-
I ran scripts/search_dist.sh for the llama_hf model on an A800 node with 8 GPUs:
```bash
export NUM_NODES=1
export NUM_GPUS_PER_NODE=8
MODEL_SIZE="llama-13b"
MEMORY=75
MODEL_ARGS="
--model_size ${MODE…
```
-
A question about the `forward` function of the `BiMultiHeadAttention` class in vlfuse_helper.py: why is `attention_mask_l` used differently here than in https://github.com/IDEA-Research/GroundingDINO?
```python
if attention_mask_l is not None:
    assert …
```
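For reference, a minimal sketch (assumed shapes, not the source of either repository) of the common pattern for applying a language padding mask in cross-attention: masked positions of the attention logits are filled with -inf before the softmax.
```python
import torch

# Assumed shapes, for illustration only.
bsz, num_heads, tgt_len, src_len = 2, 4, 5, 7
attn_weights = torch.randn(bsz * num_heads, tgt_len, src_len)

# True = attend, False = padding; pretend the last two text tokens are padding.
attention_mask_l = torch.ones(bsz, src_len, dtype=torch.bool)
attention_mask_l[:, -2:] = False

# Broadcast [bsz, src_len] over heads and query positions, then mask the logits.
mask = attention_mask_l[:, None, None, :].expand(bsz, num_heads, tgt_len, src_len)
attn_weights = attn_weights.view(bsz, num_heads, tgt_len, src_len)
attn_weights = attn_weights.masked_fill(~mask, float("-inf"))
attn_probs = attn_weights.softmax(dim=-1)
```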
-
```
f1, f2 = torch.split(features, [bsz, bsz], dim=0)
RuntimeError: start (4) + length (4) exceeds dimension size (4).
```
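For what it's worth, a minimal reproduction (with assumed shapes): `torch.split` with a list of section sizes requires the sizes to sum to the size of the split dimension, so this error means `features` has only `bsz` rows where `2 * bsz` were expected.
```python
import torch

features = torch.randn(4, 256)  # dim 0 has size 4, i.e. one batch's worth, not two
bsz = 4
try:
    # Asks for 4 + 4 = 8 rows out of a dimension of size 4.
    f1, f2 = torch.split(features, [bsz, bsz], dim=0)
except RuntimeError as e:
    # e.g. "start (4) + length (4) exceeds dimension size (4)"; the exact
    # message varies by PyTorch version, but it is a RuntimeError either way.
    print(e)

# The usual cause: the two feature sets were never stacked along dim 0
# upstream, e.g. features = torch.cat([f_a, f_b], dim=0) was skipped.
```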
-
In addition to "BIIN" for the Index Biblicus, please also take "BiBIL" into account:
BiBIL | 0575 | 935 $a[Wert]
https://github.com/ubtue/tuefind/wiki/Daten-Abzugs--und-Selektionskriterien#selektion…
-
Hello, I'm reading your paper and running your code. Could you tell me what `--bsz` means?
Looking forward to your reply.
-
### Feature request
Currently this is only possible with a 2D mask when SDPA is enabled.
```python
# modeling bert
# Expand the attention mask
if use_sdpa_attention_masks:
    …
```
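For illustration, a minimal sketch (hypothetical helper, not the transformers implementation) of what expanding a 2D padding mask into the 4D additive mask accepted by `torch.nn.functional.scaled_dot_product_attention` looks like:
```python
import torch

def expand_2d_mask(mask_2d: torch.Tensor, dtype: torch.dtype, tgt_len: int) -> torch.Tensor:
    """Turn a [bsz, src_len] padding mask (1 = keep) into an additive
    [bsz, 1, tgt_len, src_len] float mask for SDPA."""
    bsz, src_len = mask_2d.shape
    expanded = mask_2d[:, None, None, :].expand(bsz, 1, tgt_len, src_len).to(dtype)
    # Kept positions become 0.0; masked positions become the dtype's minimum.
    return (1.0 - expanded) * torch.finfo(dtype).min

mask = torch.tensor([[1, 1, 1, 0]])
print(expand_2d_mask(mask, torch.float32, tgt_len=4).shape)  # torch.Size([1, 1, 4, 4])
```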
-
Hello, I'm fine-tuning Qwen with the original code on two A100 80G GPUs. GPU memory usage is only 919 MB on each card, but during data loading the host RAM usage keeps growing until it passes 180 GB, memory is exhausted, and the program terminates. How can this problem be solved?
Training log:
![image](https://github.com/TideDra/VL-RLHF/assets/36758049/09277b55-ea0a-4cfd-875b-792f457441a2…
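One common cause worth checking (an assumption, not a diagnosis of this codebase): if the dataset decodes every sample, e.g. images, eagerly in `__init__`, host RAM grows with the dataset size; decoding lazily in `__getitem__` keeps resident memory roughly bounded by the batch size. A minimal sketch with a hypothetical JSONL layout:
```python
import json
from torch.utils.data import Dataset

class LazyJsonlDataset(Dataset):
    """Keep only raw lines in memory; decode each sample on demand."""

    def __init__(self, path: str):
        with open(path) as f:
            self.lines = f.readlines()  # raw text, far smaller than decoded images

    def __len__(self):
        return len(self.lines)

    def __getitem__(self, idx):
        return json.loads(self.lines[idx])  # parse (and load any images) per item
```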
-
Thanks for this amazing work! Could you provide the command to reproduce the results on GSM8k?
-
Hi, I have an attention_mask mismatch problem in the cross-attention.
Can you please explain this line:
`requires_attention_mask = "encoder_outputs" not in model_kwargs`?
Why does it come after this:
…
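For context, a paraphrase of the intent as I read it (a sketch, not the actual transformers source): when `encoder_outputs` is already present in `model_kwargs`, the encoder pass, which is what consumes a default `attention_mask`, is skipped during generation, so no default mask needs to be synthesized.
```python
# A paraphrase of the control flow, not the transformers code itself.
def needs_default_attention_mask(model_kwargs: dict) -> bool:
    # Precomputed encoder outputs mean the encoder will not be run again,
    # so generate() does not have to build a default attention_mask for it.
    return "encoder_outputs" not in model_kwargs

assert needs_default_attention_mask({})                          # encoder will run
assert not needs_default_attention_mask({"encoder_outputs": 0})  # already computed
```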