-
**Describe the bug**
I use ZeRO stage 1 to train a UNet with the following deepspeed_config. I set 10 epochs, and the output during training is as follows:
```json
{
"train_m…
```
-
**Describe the bug**
I'm trying to use DeepSpeed to fine-tune a BERT-based classification model, but when launching multi-node training, all nodes, including localhost, get errno: 110 - Connection t…
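errno 110 is a TCP connection timeout, which usually points at the rendezvous between nodes rather than at the model itself. A minimal sketch of isolating this (the NCCL backend and the `eth0` interface name are assumptions) is to bring up the process group on its own before any training code:
```python
import os
import deepspeed
import torch.distributed as dist

# Assumed NIC name; replace with the interface that is actually reachable
# between nodes. Picking the wrong interface is a common cause of errno 110.
os.environ.setdefault("NCCL_SOCKET_IFNAME", "eth0")

# Initialize the process group the same way a training script would.
deepspeed.init_distributed(dist_backend="nccl")
print(f"rank {dist.get_rank()} of {dist.get_world_size()} is reachable")
```
Launching this with the same hostfile and `deepspeed` command used for training shows whether the timeout comes from cluster networking or from the fine-tuning script itself.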
-
Hi,
I'm using the released training data on AWS and the latest main branch to train the model.
1. The directory structure of the released data is not recognized by the code.
2. After re-struc…
-
**Describe the bug**
I was trying to run inference with DeepSpeed on the Llama model, but when I ran `deepspeed --num_gpus 4 script.py`, the process terminated automatically after loading the ch…
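For context, a multi-GPU Llama inference script launched with `deepspeed --num_gpus 4` typically follows the pattern below; the checkpoint name, dtype, and use of kernel injection are assumptions for illustration, not details taken from the report:
```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # placeholder checkpoint
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)

# Shard the model across the GPUs handed to the launcher via --num_gpus.
ds_engine = deepspeed.init_inference(
    model,
    tensor_parallel={"tp_size": 4},
    dtype=torch.float16,
    replace_with_kernel_inject=True,
)
model = ds_engine.module

inputs = tokenizer("DeepSpeed is", return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```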
-
**Describe the bug**
A clear and concise description of what the bug is. Please include which training step you are using and which model you are training.
Training Step: 3-RLHF
Training model: act…
-
**Describe the bug**
When training DeepSpeed-Chat Step 3 with **ZeRO-3** (without hybrid engine), if we set `generation_batches >= 3`, or `generation_batches >= 2` and `ppo_epochs >= 2`, DeepSpeed will rai…
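For reference, `generation_batches` and `ppo_epochs` are command-line arguments of the DeepSpeed-Chat step 3 script, while the ZeRO-3 / hybrid-engine choice lives in the actor's DeepSpeed config. A rough sketch of the config shape being described (stage 3, hybrid engine disabled; all values are placeholders) is:
```python
# Sketch of an actor config with ZeRO-3 and the hybrid engine turned off.
# Batch sizes and thresholds are placeholders, not values from the report.
actor_ds_config = {
    "train_micro_batch_size_per_gpu": 4,
    "gradient_accumulation_steps": 1,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "stage3_param_persistence_threshold": 1.0e4,
    },
    "hybrid_engine": {"enabled": False},
}
```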
-
**Describe the bug**
I load the Llama 2 70B model in 4-bit (bitsandbytes) and then distribute the model by calling `deepspeed.initialize`, and get the following error:
```
------------------------…
```
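As I read it, the setup is roughly the sketch below; the checkpoint name, quantization settings, ZeRO stage, and optimizer section are assumptions added for illustration:
```python
import torch
import deepspeed
from transformers import AutoModelForCausalLM, BitsAndBytesConfig

# Load the 70B model in 4-bit via bitsandbytes (placeholder settings).
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "meta-llama/Llama-2-70b-hf",  # placeholder checkpoint
    quantization_config=bnb_config,
)

# Minimal DeepSpeed config; the stage and optimizer are assumptions.
ds_config = {
    "train_micro_batch_size_per_gpu": 1,
    "optimizer": {"type": "AdamW", "params": {"lr": 1e-5}},
    "zero_optimization": {"stage": 3},
}

# The reported error is raised when wrapping the 4-bit model here.
engine, _, _, _ = deepspeed.initialize(
    model=model,
    model_parameters=model.parameters(),
    config=ds_config,
)
```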
-
Right now openfold requires an old PyTorch + CUDA (11.2), so the latest Linux is not able to build openfold.
Would like to upgrade the supported PyTorch + CUDA and other Python packages accordingly, so …
-
**Describe the bug**
DeepSpeed ZeRO++ features aren't working:
1. On a single node, passing `zero_hpz_partition_size`, `zero_quantized_gradients`, `zero_quantized_weights` leads to forward pass err…
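For reference, the ZeRO++ flags named above live inside the `zero_optimization` section of the DeepSpeed config; a minimal sketch (the stage, batch size, and group size are placeholders) looks like:
```python
# ZeRO++ options sit under zero_optimization; values are placeholders.
ds_config = {
    "train_micro_batch_size_per_gpu": 2,
    "bf16": {"enabled": True},
    "zero_optimization": {
        "stage": 3,
        "zero_hpz_partition_size": 8,      # hpZ: hierarchical partitioning group size
        "zero_quantized_weights": True,    # qwZ: quantized weight communication
        "zero_quantized_gradients": True,  # qgZ: quantized gradient communication
    },
}
```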
-
I am seeing the following error when trying to run it on an aarch64 machine with an H100.
Linux r8-u37 6.5.0-1019-nvidia-64k #19-Ubuntu SMP PREEMPT_DYNAMIC Tue May 7 12:54:40 UTC 2024 aarch64 aarch64 …