IST-DASLab / marlin

FP16xINT4 LLM inference kernel that can achieve near-ideal ~4x speedups up to medium batchsizes of 16-32 tokens.
Apache License 2.0

RuntimeError: CUDA error: an illegal instruction was encountered when running test.py #30

Open MekkCyber opened 3 months ago

MekkCyber commented 3 months ago

Hello, when running python test.py I get the following errors:

======================================================================
ERROR: test_groups (__main__.Test)

Traceback (most recent call last):
  File "/fsx/mohamed/dev/marlin/test.py", line 155, in test_groups
    self.run_problem(m, n, k, *thread_shape, groupsize)
  File "/fsx/mohamed/dev/marlin/test.py", line 66, in run_problem
    torch.cuda.synchronize()
  File "/admin/home/mohamed_mekkouri/miniconda3/envs/exp/lib/python3.10/site-packages/torch/cuda/__init__.py", line 792, in synchronize
    return torch._C._cuda_synchronize()
RuntimeError: CUDA error: an illegal instruction was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

======================================================================
ERROR: test_k_stages_divisibility (__main__.Test)

Traceback (most recent call last):
  File "/fsx/mohamed/dev/marlin/test.py", line 80, in test_k_stages_divisibility
    self.run_problem(16, 2 * 256, k, 64, 256)
  File "/fsx/mohamed/dev/marlin/test.py", line 60, in run_problem
    A = torch.randn((m, k), dtype=torch.half, device=DEV)
RuntimeError: CUDA error: an illegal instruction was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

======================================================================
ERROR: test_tiles (__main__.Test)

Traceback (most recent call last):
  File "/fsx/mohamed/dev/marlin/test.py", line 75, in test_tiles
    self.run_problem(m, 2 * 256, 1024, thread_k, thread_n)
  File "/fsx/mohamed/dev/marlin/test.py", line 60, in run_problem
    A = torch.randn((m, k), dtype=torch.half, device=DEV)
RuntimeError: CUDA error: an illegal instruction was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.

======================================================================
ERROR: test_very_few_stages (__main__.Test)

Traceback (most recent call last):
  File "/fsx/mohamed/dev/marlin/test.py", line 85, in test_very_few_stages
    self.run_problem(16, 2 * 256, k, 64, 256)
  File "/fsx/mohamed/dev/marlin/test.py", line 60, in run_problem
    A = torch.randn((m, k), dtype=torch.half, device=DEV)
RuntimeError: CUDA error: an illegal instruction was encountered
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.


Ran 6 tests in 0.794s

FAILED (errors=4)

The stack I am using: Python 3.10.14, torch 2.3.1, CUDA 12.1 (cuda_12.1.r12.1), compute_cap 9.0.

mgoin commented 2 months ago

It looks like you are on Hopper, since compute_cap 9.0 is the Hopper compute capability. There is a known issue with Marlin on Hopper GPUs.
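
For anyone who wants to check this on their own machine before running test.py, here is a minimal sketch. It assumes PyTorch is installed and relies on torch.cuda.get_device_capability(); the helper names is_hopper and current_capability are made up for illustration and are not part of Marlin.

```python
def is_hopper(capability):
    """Return True if a (major, minor) compute capability is Hopper (9.x)."""
    major, _minor = capability
    return major == 9


def current_capability():
    """Return (major, minor) for the current CUDA device, or None if unavailable."""
    try:
        import torch  # imported lazily so the helper above works without torch
    except ImportError:
        return None
    if not torch.cuda.is_available():
        return None
    return torch.cuda.get_device_capability()


if __name__ == "__main__":
    cap = current_capability()
    if cap is None:
        print("No CUDA device visible (or torch is not installed).")
    elif is_hopper(cap):
        print(f"compute capability {cap[0]}.{cap[1]}: Hopper, the error above may apply")
    else:
        print(f"compute capability {cap[0]}.{cap[1]}: not Hopper")
```

This only identifies the GPU generation; it does not work around the issue itself.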

MekkCyber commented 2 months ago

Yes, it's Hopper, thank you!