I'm trying to load the 7B quantized model (which I quantized using the script in this repository) on an NVIDIA TITAN Xp, but I get the following errors.
This one is with triton==2.1.0:
CUDA extension not installed.
Loading model ...
QuantLinear Warmup: Found 4 unique KN values.
FusedMLP Warmup: Found 0 unique K values.
Warming up autotune cache ...
0%| | 0/12 [00:00<?, ?it/s]
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: invalid element type in packLLEElements. Expected 'f32' but got 'f16'
loc(fused["/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:17, "/home/guest1/anaconda3/envs/exllama/lib/python3.9/site-packages/gptq_triton/quant_linear.py":172:27]): error: 'llvm.intr.fmuladd' op requires the same type for all operands and results
Pass execution failed
LLVM ERROR: Failed to translate TritonGPU to LLVM IR.
test.sh: line 3: 751380 Aborted (core dumped) ./benchmark_generate.py --model save --quant --average 1
And this one is with triton==3.0.0:
CUDA extension not installed.
Loading model ...
QuantLinear Warmup: Found 4 unique KN values.
FusedMLP Warmup: Found 0 unique K values.
Warming up autotune cache ...
0%| | 0/12 [00:00<?, ?it/s]
Unsupported conversion from f16 to f16
LLVM ERROR: Unsupported rounding mode for conversion.
test.sh: line 2: 749267 Aborted (core dumped) ./benchmark_generate.py --model save --quant --average 1
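In case it helps with triage: the TITAN Xp is a Pascal card (compute capability 6.1), and both failures above happen while compiling the fp16 Triton kernels, so I suspect the older architecture may be the problem. Here is the environment check I ran (a minimal sketch, assuming torch and triton are importable in the same environment):

import torch
import triton

# Print the GPU and toolchain info relevant to the fp16 compilation failures above.
print("GPU:", torch.cuda.get_device_name(0))
print("Compute capability:", torch.cuda.get_device_capability(0))  # TITAN Xp reports (6, 1)
print("torch:", torch.__version__, "built with CUDA:", torch.version.cuda)
print("triton:", triton.__version__)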
What should I do to run the script? Or do you need more information? Thank you for your effort on this repo.