-
I ran scripts/search_dist.sh for the llama_hf model on an A800 node with 8 GPUs:
```bash
export NUM_NODES=1
export NUM_GPUS_PER_NODE=8
MODEL_SIZE="llama-13b"
MEMORY=75
MODEL_ARGS="
--model_size ${MODE…
```
-
A question about the `forward` function of the `BiMultiHeadAttention` class in vlfuse_helper.py: why is `attention_mask_l` used differently here than in https://github.com/IDEA-Research/GroundingDINO?
```python
if attention_mask_l is not None:
    assert …
```
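For reference, a minimal sketch (assumed shapes, not the source of either repository) of the common pattern for applying a language padding mask in cross-attention: masked positions of the attention logits are filled with -inf before the softmax.
```python
import torch

# Assumed shapes, for illustration only.
bsz, num_heads, tgt_len, src_len = 2, 4, 5, 7
attn_weights = torch.randn(bsz * num_heads, tgt_len, src_len)

# True = attend, False = padding; pretend the last two text tokens are padding.
attention_mask_l = torch.ones(bsz, src_len, dtype=torch.bool)
attention_mask_l[:, -2:] = False

# Broadcast [bsz, src_len] over heads and query positions, then mask the logits.
mask = attention_mask_l[:, None, None, :].expand(bsz, num_heads, tgt_len, src_len)
attn_weights = attn_weights.view(bsz, num_heads, tgt_len, src_len)
attn_weights = attn_weights.masked_fill(~mask, float("-inf"))
attn_probs = attn_weights.softmax(dim=-1)
```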
-
```
f1, f2 = torch.split(features, [bsz, bsz], dim=0)
RuntimeError: start (4) + length (4) exceeds dimension size (4).
```
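For what it's worth, a minimal reproduction (with assumed shapes): `torch.split` with a list of section sizes requires the sizes to sum to the size of the split dimension, so this error means `features` has only `bsz` rows where `2 * bsz` were expected.
```python
import torch

features = torch.randn(4, 256)  # dim 0 has size 4, i.e. one batch's worth, not two
bsz = 4
try:
    # Asks for 4 + 4 = 8 rows out of a dimension of size 4.
    f1, f2 = torch.split(features, [bsz, bsz], dim=0)
except RuntimeError as e:
    # e.g. "start (4) + length (4) exceeds dimension size (4)"; the exact
    # message varies by PyTorch version, but it is a RuntimeError either way.
    print(e)

# The usual cause: the two feature sets were never stacked along dim 0
# upstream, e.g. features = torch.cat([f_a, f_b], dim=0) was skipped.
```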
-
In addition to "BIIN" for the Index Biblicus, please also take "BiBIL" into account:
BiBIL | 0575 | 935 $a[Wert]
https://github.com/ubtue/tuefind/wiki/Daten-Abzugs--und-Selektionskriterien#selektion…
-
Hello, I'm reading your paper and running your code. Could you tell me what `--bsz` means?
Looking forward to your reply.
-
### Feature request
Currently this is only possible with a 2D mask when SDPA is enabled.
```python
# modeling bert
# Expand the attention mask
if use_sdpa_attention_masks:
    …
```
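For illustration, a minimal sketch (hypothetical helper, not the transformers implementation) of what expanding a 2D padding mask into the 4D additive mask accepted by `torch.nn.functional.scaled_dot_product_attention` looks like:
```python
import torch

def expand_2d_mask(mask_2d: torch.Tensor, dtype: torch.dtype, tgt_len: int) -> torch.Tensor:
    """Turn a [bsz, src_len] padding mask (1 = keep) into an additive
    [bsz, 1, tgt_len, src_len] float mask for SDPA."""
    bsz, src_len = mask_2d.shape
    expanded = mask_2d[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
    # Kept positions become 0.0; masked positions become the dtype's minimum.
    return (1.0 - expanded) * torch.finfo(dtype).min

mask = torch.tensor([[1, 1, 1, 0]])
print(expand_2d_mask(mask, torch.float32, tgt_len=4).shape)  # torch.Size([1, 1, 4, 4])
```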
-
Hello, I'm fine-tuning Qwen with the original code on two A100 80G GPUs. GPU memory usage is only 919 MB on each card, but during data loading the host RAM usage keeps growing until it passes 180 GB, memory is exhausted, and the program terminates. How can this problem be solved?
Training log:
![image](https://github.com/TideDra/VL-RLHF/assets/36758049/09277b55-ea0a-4cfd-875b-792f457441a2…
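One common cause worth checking (an assumption, not a diagnosis of this codebase): if the dataset decodes every sample, e.g. images, eagerly in `__init__`, host RAM grows with the dataset size; decoding lazily in `__getitem__` keeps resident memory roughly bounded by the batch size. A minimal sketch with a hypothetical JSONL layout:
```python
import json
from torch.utils.data import Dataset

class LazyJsonlDataset(Dataset):
    """Keep only raw lines in memory; decode each sample on demand."""

    def __init__(self, path: str):
        with open(path) as f:
            self.lines = f.readlines()  # raw text, far smaller than decoded images

    def __len__(self):
        return len(self.lines)

    def __getitem__(self, idx):
        return json.loads(self.lines[idx])  # parse (and load any images) per item
```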
-
Thanks for this amazing work! Could you provide the command to reproduce the results on GSM8k?
-
Hi, I have an attention_mask mismatch problem in the cross-attention.
Can you please explain this line:
`requires_attention_mask = "encoder_outputs" not in model_kwargs`?
Why does it come after this:
…
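For context, a paraphrase of the intent as I read it (a sketch, not the actual transformers source): when `encoder_outputs` is already present in `model_kwargs`, the encoder pass, which is what consumes a default `attention_mask`, is skipped during generation, so no default mask needs to be synthesized.
```python
# A paraphrase of the control flow, not the transformers code itself.
def needs_default_attention_mask(model_kwargs: dict) -> bool:
    # Precomputed encoder outputs mean the encoder will not be run again,
    # so generate() does not have to build a default attention_mask for it.
    return "encoder_outputs" not in model_kwargs

assert needs_default_attention_mask({})                          # encoder will run
assert not needs_default_attention_mask({"encoder_outputs": 0})  # already computed
```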