bsz Search Results - Githubissues

1000+ results
for bsz


meta-llama/llama #482

torch.distributed.elastic.multiprocessing.errors.ChildFailed…

I downloaded the llama-2-7b and ran the command as they mentioned ``` torchrun --nproc_per_node 1 example_text_completion.py \ --ckpt_dir llama-2-7b/ \ --tokenizer_path tokenizer.model \ …

MDFARHYN updated 6 months ago
20
axinc-ai/ailia-models #1306

ADD distil-whisper

https://github.com/huggingface/distil-whisper

kyakuno updated 9 months ago
4
meta-llama/llama #827

AssertError: (6, 4)

![image](https://github.com/facebookresearch/llama/assets/82858160/9320de97-54fd-4bb5-b82f-c08e96c64b87) when I run `torchrun --nproc_per_node 1 example_chat_completion.py --ckpt_dir Llama-2-7b-c…

FelexTriz updated 11 months ago
3
dvlab-research/LongLoRA #84

why forward_flashattn didn't use roll function to roll the v…

##are these code used to roll in forward_flashattn x = rearrange(qkv, "b s three h d -> b s (three h d)") x_unpad, indices, cu_q_lens, max_s = unpad_input(x, key_padding_mask) cu_q_len_tmp = torch.…

linyubupa updated 10 months ago
2
microsoft/torchscale #49

RetNet: relative position

I believe there is a difference in relative position implemented here, and what is described in the paper. The issue I see is in [theta_shift and rotate_every_two](https://github.com/microsoft/torchs…

fkodom updated 11 months ago
5
svoop/aipp #28

Quoting error for LS SHOOT

As of 2023-08-13, a quoting error in the upstream CSV causes AIPP to fail: ``` ERROR: Illegal quoting in line 488. ``` The offending line: ```csv BSZ;"4203.080";20230822;0700;2000;"A - F";…

svoop updated 9 months ago
1
karpathy/nanoGPT #167

Loss becomes nan after training ~6000 iterations

Hi, I got the nan issue (as #136), even reducing learning rate to `1e-5`, after ~6000 iters. I'm not sure if this is caused by the dtype `float16` I used in my config. Any ideas why this is happening?…

holyseven updated 5 months ago
27
showlab/UniVTG #30

Training Detail for Fine-tuning?

![1698998777982](https://github.com/showlab/UniVTG/assets/36877347/169687c6-8fbb-433d-91e6-636d4231360a) Thanks for your work, some training details are not too clearly described in the readme, so I …

yhl2018 updated 10 months ago
12
huggingface/transformers #28206

ValueError: too many values to unpack (expected 2) when Fine…

### System Info - `transformers` version: 4.35.2 - Platform: Linux-6.1.58+-x86_64-with-glibc2.35 - Python version: 3.10.12 - Huggingface_hub version: 0.19.4 - Safetensors version: 0.4.1 - Acc…

0920GX updated 7 months ago
2
dvlab-research/LongLoRA #126

Help to confirm understanding of forward_flashattn

Dear Authors and @yukang2017 , Thanks for the amazing work. I am trying to understand the following: https://github.com/dvlab-research/LongLoRA/blob/2a33f37543038877c70e9a625a61dc72a71621d0/llama_…

weicheng113 updated 10 months ago
2
