MilaNLProc / simple-generation

A Python package to run inference with HuggingFace language and vision-language checkpoints, wrapping many convenient features.

RuntimeError: CUDA error: Unspecified Launch Failure During Generation #2

Closed. donya-rooein closed this issue 1 month ago.

donya-rooein commented 6 months ago

When attempting to generate text using a CUDA-enabled model, an unspecified launch failure CUDA error occurs. This error halts the generation process, leading to incomplete or failed batches.

```python
checkpoint = "meta-llama/Llama-2-70b-chat-hf"
responses = generator(
    texts,
    apply_chat_template=True,
    skip_prompt=True,
    batch_size="auto",
    temperature=0.0,
    max_new_tokens=256,
)
```

Output:

```
Generation:   0%|          | 0/96 [00:00<?, ?it/s]
Error CUDA error: unspecified launch failure
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
Generation failed. Skipping batch.
```
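As the error message suggests, one way to get a more precise stack trace is to set `CUDA_LAUNCH_BLOCKING=1` so kernel launches run synchronously. A minimal sketch of doing this from inside the script (setting the variable from the shell before launching works equally well):

```python
import os

# Make CUDA kernel launches synchronous so the failing call is reported
# at the right point in the stack trace. The variable must be set before
# torch initializes the CUDA context, i.e. before importing torch.
os.environ["CUDA_LAUNCH_BLOCKING"] = "1"

import torch  # imported only after the environment variable is set
```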

g8a9 commented 6 months ago

Hello Donya! It seems to be an error related to DDP. How are you executing the script? Can you also show me the run command? Is it a python ..., accelerate ..., or torchrun ...?

g8a9 commented 6 months ago

Also, I'm noticing you are loading the model without specifying any quantization or dtype. That means you are loading the weights at full precision and probably running out of memory. Can you try adding, for example, torch_dtype=torch.bfloat16 to the loader?
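A minimal sketch of what that could look like, assuming the generator is created with the package's `SimpleGenerator` class and that loader keyword arguments such as `torch_dtype` are forwarded to the underlying `from_pretrained` call:

```python
import torch
from simple_generation import SimpleGenerator

checkpoint = "meta-llama/Llama-2-70b-chat-hf"

# Load the weights in bfloat16 rather than the default fp32, roughly
# halving the memory footprint (assumes torch_dtype is forwarded to
# the underlying from_pretrained call).
generator = SimpleGenerator(checkpoint, torch_dtype=torch.bfloat16)

# A small placeholder batch for illustration.
texts = ["Explain what a CUDA unspecified launch failure is."]

responses = generator(
    texts,
    apply_chat_template=True,
    skip_prompt=True,
    batch_size="auto",
    temperature=0.0,
    max_new_tokens=256,
)
```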

g8a9 commented 6 months ago

Hello @donya-rooein, did any of the suggested fixes work?

github-actions[bot] commented 1 month ago

Stale issue message