Open · gloritygithub11 opened 3 weeks ago
Hi @gloritygithub11, it looks like an issue during checkpoint conversion. Did you convert Qwen-7B-Instruct using the same command as the 72B one? If not, could you please share the command you used? It would help me locate the issue. In the meantime, since you have 4 GPUs, could you try

python3 ./convert_checkpoint.py --model_dir /your_model_dir --output_dir /your_output_dir --dtype float16 --smoothquant 0.5 --tp_size 4

instead? This should work for your case.
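For reference, a minimal sketch of the full 4-GPU flow. The paths and model directory are placeholders, and the build and run steps follow the usual TensorRT-LLM example workflow (trtllm-build plus examples/run.py), so the exact flags may need adjusting for your version:

```bash
# 1. Convert the HF checkpoint to a TensorRT-LLM checkpoint with int8 SmoothQuant, TP=4
python3 ./convert_checkpoint.py \
    --model_dir /path/to/Qwen2-72B-Instruct \
    --output_dir ./tllm_ckpt_72b_sq_tp4 \
    --dtype float16 \
    --smoothquant 0.5 \
    --tp_size 4

# 2. Build one engine per rank
trtllm-build \
    --checkpoint_dir ./tllm_ckpt_72b_sq_tp4 \
    --output_dir ./engine_72b_sq_tp4 \
    --gemm_plugin float16

# 3. Run with 4 MPI ranks, one per GPU
mpirun -n 4 --allow-run-as-root \
    python3 ../run.py \
        --engine_dir ./engine_72b_sq_tp4 \
        --tokenizer_dir /path/to/Qwen2-72B-Instruct \
        --max_output_len 64 \
        --input_text "Hello, how are you?"
```

With tp_size 4, each rank holds roughly a quarter of the weights, which is what makes the 72B model fit across your four A100-80G GPUs.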
Hi @jershi425,
I used the same command for the 7B and the 72B. Since I will run the model on a single-GPU node, I can't build with tp_size 4.
Hi @gloritygithub11, sorry, but currently we don't support single-GPU deployment for the 72B model, even with int8 SQ. It is hardly feasible: the int8 weights alone take roughly 72 GB, and on top of that come activations, KV caches, and other buffers, which will easily OOM a single GPU.
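To make the budget concrete, here is a rough back-of-the-envelope estimate (illustrative numbers only, not measurements):

```bash
# Rough memory budget for a 72B model with int8 (SmoothQuant) weights
# on a single 80 GB A100. All numbers are approximate and for illustration.
WEIGHTS_GB=72        # ~72e9 parameters * 1 byte per int8 weight
KV_CACHE_GB=2        # order of 1-2 GB of fp16 KV cache per few-thousand-token sequence
GPU_GB=80
echo "weights + one KV cache: $((WEIGHTS_GB + KV_CACHE_GB)) GB"
echo "left for activations, workspace and batching: $((GPU_GB - WEIGHTS_GB - KV_CACHE_GB)) GB"
```

With only a few GB of headroom left, activations and runtime buffers push it over the limit, which is why the weights need to be split across multiple GPUs with tensor parallelism.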
System Info
tensorrt 10.2.0
tensorrt_llm 0.12.0.dev2024072301
A100-80G * 4
Who can help?
@Tracin
Information
Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
Expected behavior
The output is not empty and makes sense.
Actual behavior
The output is empty.
Additional notes
I tried the same command with Qwen2-7B-Instruct, which works properly.