evoformer Search Results

239 results for evoformer

microsoft/DeepSpeed #4864

[BUG] Mixtral inference OOM

**Describe the bug** I'm not sure if DeepSpeed needs to be adapted for Mixtral. When I tried using DeepSpeed inference for model inference, it didn't properly implement model parallelism. Instead, it…

ShayDuane updated 3 months ago, 3 comments

pytorch/pytorch #91653

Stochastic Illegal Memory Access error mid-epoch on AWS p4d …

### 🐛 Describe the bug While training on a large cluster of AWS p4d instances with PyTorch, PyTorch Lightning, and DeepSpeed, we observe an IllegalMemoryAccess error that happens stochastically at …

sachinkadyan7 updated 1 year ago, 9 comments

microsoft/DeepSpeed #5242

[BUG] grad_norm and loss is nan when deepspeed==0.13.5 but o…

**Describe the bug** When fine-tuning my model using deepspeed==0.13.5 and the Hugging Face Trainer, loss and grad_norm become NaN at step 2 ![image](https://github.com/microsoft/DeepSpeed/assets/29994…

Chandler-Bing updated 2 months ago, 24 comments

hpcaitech/FastFold #9

RuntimeError: CUDA error: no kernel image is available for e…

How can I fix this error? I ran the command: torchrun --nproc_per_node=1 perf.py --msa-length 128 --res-length 256. Then the following error appeared. The versions of PyTorch, Python, and CUDA are …

Wolverinerine updated 2 years ago, 1 comment

microsoft/DeepSpeed-MII #273

Unable to load ragged_device_ops op due to no compute capabi…

I get this error following the deepspeed-fastgen instructions: ```python from mii import pipeline pipe = pipeline("mistralai/Mistral-7B-v0.1") ``` The full stack trace is: ``` Loading ext…

rogerbock updated 3 months ago, 10 comments

bjing2016/alphaflow #11

Dockerfile or CUDA 12

Hi, thanks for the wonderful work. I am planning on doing some conformation sampling using this work, but unfortunately it seems like the hard requirement of CUDA 11.6 is an issue. I've tried diffe…

Chokyotager updated 1 week ago, 3 comments

microsoft/DeepSpeed #4901

[BUG] ZERO++ | AssertionError: ZeRO parameter intra parallel…

**Describe the bug** Hello. I'm an active user of DeepSpeed for multi-node training. I've always used ZeRO-3, but this time I tried enabling the hpZ feature of ZeRO++ for the first time. The issue…

dhkim0225 updated 1 month ago, 3 comments

hpcaitech/FastFold #182

RuntimeError: CUDA error: CUBLAS_STATUS_INVALID_VALUE when c…

Greetings! Following the instructions, I've completed an installation and everything seemed to work including the generation of the MSA. Specifically, I've done the recommended conda installation,…

cdsnow updated 5 months ago, 2 comments

microsoft/DeepSpeed-MII #452

inference_core_ops.so: undefined symbol: _Z19cuda_wf6af16_li…

**Environment:** Ubuntu 22.04.4 LTS Cuda compilation tools, release 12.1, V12.1.66 Build cuda_12.1.r12.1/compiler.32415258_0 ds_report added at the end of the description **Issue:** Not able to…

Andronixs updated 5 months ago, 6 comments

microsoft/DeepSpeed-MII #250

DeepSpeed bug multi-gpu in single node

I have code that runs on a g4dn.12xlarge with NVIDIA T4 GPUs (4 x 16 GB) ~ 64 GB. ## Issue My code: ``` python import mii model = "meta-llama/Llama-2-13b-chat-hf" mii_configs = { "ten…

muhammad-asn updated 10 months ago, 1 comment
