TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
Phi-3-mini-128k error #2313

Closed scuizhibin closed 1 week ago

scuizhibin commented 2 weeks ago

Environment:
Hardware: RTX 4090
Driver Version: 550.107.02
Software: CUDA release 12.4, V12.4.131
When I quantize the Phi-3-mini-128k model, I use two commands.

Command 1:
python3 ../TensorRT-LLM/examples/quantization/quantize.py --model_dir ./Phi-3-mini-128k-instruct/ --output_dir ./phi_out/ --dtype float16 --qformat fp8 --kv_cache_dtype fp8

Terminal output:
Special tokens have been added in the vocabulary, make sure the associated word embeddings are fine-tuned or trained.
/usr/local/lib/python3.10/dist-packages/datasets/table.py:1421: FutureWarning: promote has been superseded by promote_options='default'.
  table = cls._concat_blocks(blocks, axis=0)
Inserted 387 quantizers
/usr/local/lib/python3.10/dist-packages/modelopt/torch/quantization/model_quant.py:131: DeprecationWarning: forward_loop should take model as argument, but got forward_loop without any arguments. This usage will be deprecated in future versions.
  return calibrate(model, config["algorithm"], forward_loop=forward_loop)
[10/10/2024-10:11:33] You are not running the flash-attention implementation, expect numerical differences.
current rank: 0, tp rank: 0, pp rank: 0
/usr/lib/python3.10/tempfile.py:1008: ResourceWarning: Implicitly cleaning up <TemporaryDirectory '/tmp/tmp481ehvj0'>
  _warnings.warn(warn_message, ResourceWarning)

Command 2:
trtllm-build --checkpoint_dir ./phi_out/ --output_dir ./phi_engine/ --gemm_plugin auto --max_batch_size 8 --max_input_len 1024 --max_seq_len 2048

Terminal output:
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/functional.py", line 1223, in slice
    input_ndim = input.ndim()
AttributeError: 'NoneType' object has no attribute 'ndim'

How can I solve this error?

nv-guomingz commented 2 weeks ago

Thanks @scuizhibin for reporting this issue. I can reproduce it locally.

Here is a quick workaround for this issue: update your ./phi_out/config.json by changing the value of the position_embedding_type field from rope_gpt_neox to long_rope.
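If you would rather not edit the file by hand, the change can be scripted. The following is only a minimal sketch, not an official TensorRT-LLM tool: the helper names `_replace_field` and `patch_position_embedding` are hypothetical, and because the exact location of position_embedding_type inside config.json is not stated here, the sketch searches the whole JSON tree instead of assuming a fixed position.

```python
import json
from pathlib import Path


def _replace_field(node, key, old, new):
    """Recursively set node[key] to `new` wherever it equals `old`.

    Returns the number of replacements made.
    """
    changed = 0
    if isinstance(node, dict):
        for k, v in node.items():
            if k == key and v == old:
                node[k] = new  # overwriting an existing key is safe while iterating
                changed += 1
            else:
                changed += _replace_field(v, key, old, new)
    elif isinstance(node, list):
        for item in node:
            changed += _replace_field(item, key, old, new)
    return changed


def patch_position_embedding(config_path, old="rope_gpt_neox", new="long_rope"):
    """Hypothetical helper: rewrite position_embedding_type in a checkpoint config.

    Assumption: the file is plain JSON. The file is rewritten only when at
    least one field was actually changed.
    """
    path = Path(config_path)
    config = json.loads(path.read_text(encoding="utf-8"))
    changed = _replace_field(config, "position_embedding_type", old, new)
    if changed:
        path.write_text(json.dumps(config, indent=4) + "\n", encoding="utf-8")
    return changed
```

Calling `patch_position_embedding("./phi_out/config.json")` returns the number of fields it changed. A return value of 0 means no field with the value rope_gpt_neox was found, so the file is left untouched.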


Superjomn commented 1 week ago

Closing since there has been no recent update. Please feel free to reopen it later.