NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

llama 3.2 checkpoint conversion fails #2302

Open stas00 opened 2 weeks ago

stas00 commented 2 weeks ago

edit: both 3.1 and 3.2 fail


the 3.1-specific repro:

git clone https://github.com/NVIDIA/TensorRT-LLM/
cd TensorRT-LLM/examples/llama
pip install -r requirements.txt
git clone https://huggingface.co/meta-llama/Llama-3.1-8B-Instruct
python convert_checkpoint.py --model_dir Llama-3.1-8B-Instruct \
                             --output_dir Llama-3.1-8B-Instruct_2gpu_tp2 \
                             --dtype float16 \
                             --tp_size 2
[TensorRT-LLM] TensorRT-LLM version: 0.14.0.dev2024100800
0.14.0.dev2024100800
[10/09/2024-02:45:28] [TRT-LLM] [W] AutoConfig cannot load the huggingface config.
Traceback (most recent call last):
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 518, in <module>
    main()
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 510, in main
    convert_and_save_hf(args)
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 452, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 459, in execute
    f(args, rank)
  File "/data/stas/faster-ttft/core/dawn/exp/infer/faster-ttft/trt-llm/TensorRT-LLM/examples/llama/convert_checkpoint.py", line 438, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/tensorrt_llm/models/llama/model.py", line 322, in from_hugging_face
    config = LLaMAConfig.from_hugging_face(hf_config_or_dir,
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/tensorrt_llm/models/llama/config.py", line 101, in from_hugging_face
    hf_config = transformers.AutoConfig.from_pretrained(
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/transformers/models/auto/configuration_auto.py", line 989, in from_pretrained
    return config_class.from_dict(config_dict, **unused_kwargs)
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/transformers/configuration_utils.py", line 772, in from_dict
    config = cls(**config_dict)
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 161, in __init__
    self._rope_scaling_validation()
  File "/env/lib/conda/ctx-shared-trt-stable/lib/python3.10/site-packages/transformers/models/llama/configuration_llama.py", line 182, in _rope_scaling_validation
    raise ValueError(
ValueError: `rope_scaling` must be a dictionary with two fields, `type` and `factor`, 
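
For reference, the check that raises here is transformers' older `_rope_scaling_validation`, which only accepts a `rope_scaling` dict with exactly the two fields `type` and `factor`, whereas the Llama 3.1/3.2 config.json ships the newer llama3-style dict. A minimal sketch (assuming the Llama-3.1-8B-Instruct clone from the repro above and whatever transformers version requirements.txt installed) to see whether the installed transformers can parse the checkpoint's config:

import json
import transformers

# The Llama 3.1/3.2 config.json carries a llama3-style rope_scaling dict
# (rope_type, factor, low_freq_factor, high_freq_factor, ...); older
# transformers releases only accept {"type", "factor"} and raise the
# ValueError shown above.
with open("Llama-3.1-8B-Instruct/config.json") as f:
    print(json.load(f)["rope_scaling"])

try:
    transformers.AutoConfig.from_pretrained("Llama-3.1-8B-Instruct")
    print(f"transformers {transformers.__version__} accepts this config")
except ValueError as err:
    print(f"transformers {transformers.__version__} is too old: {err}")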

This doc (https://github.com/NVIDIA/TensorRT-LLM/blob/main/examples/llama/README.md#llama-v3-updates) says that v3 is supported, but clearly it's not. Perhaps it meant 3.0 rather than 3.x? Not sure.

If you copied the HF llama modeling code, you need to update it to the latest version for this to work.

aabbhishekksr commented 2 weeks ago

I am also facing the same issue.

jinxiangshi commented 2 weeks ago

pip install transformers==4.43.2

Superjomn commented 1 week ago

Closing since there has been no recent update; please feel free to reopen this issue if needed.

Superjomn commented 1 week ago

Support for Llama 3.2 is not ready yet; please wait for the upcoming releases.

stas00 commented 1 week ago

As the OP communicates, your documentation says that v3 is supported, so you probably need to make it more specific, i.e. that only v3.0 is supported and not v3.x.

stas00 commented 1 week ago

And I also don't understand why you closed this.

"Closing since there has been no recent update; please feel free to reopen this issue if needed."

Update from whom? You are sweeping the issue under the carpet.

The user can't reopen the issue, so your suggestion can't work.

laikhtewari commented 1 week ago

Hi @stas00, thank you for raising this issue!

TensorRT-LLM doesn't support Llama 3.2 yet (coming soon!), though I suspect from the code snippet shared that the question is about Llama 3.1, which is supported.

To run Llama 3.1, you can manually upgrade the transformers version after installing TensorRT-LLM: pip install transformers==4.43.2 (thanks for sharing the workaround @Superjomn)
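
For example, a small pre-flight guard along these lines (just a sketch; it assumes 4.43 is the first transformers release that accepts the llama3-style rope_scaling) can be run before convert_checkpoint.py to fail fast with the pip hint:

from packaging import version
import transformers

# Sketch: refuse to run the conversion with a transformers release that
# predates llama3-style rope_scaling support (assumed to be 4.43.0).
MIN_TRANSFORMERS = "4.43.0"

if version.parse(transformers.__version__) < version.parse(MIN_TRANSFORMERS):
    raise RuntimeError(
        f"transformers {transformers.__version__} cannot parse Llama 3.1 configs; "
        f"run `pip install transformers=={MIN_TRANSFORMERS}` (or newer) first"
    )
print(f"transformers {transformers.__version__} looks recent enough")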

Please let me know if you have further issues. We are working on upgrading the default transformers dependency to remove this manual step.

I will update the documentation to specify which Llama 3.x versions are supported, and I'll figure out why you don't have permissions to re-open an issue.

stas00 commented 1 week ago

Thank you for the follow up, @laikhtewari

I see the confusion now: I think I tested with both 3.1 and 3.2 and both were failing, but the issue I created said 3.2 while the repro I listed was for 3.1. My bad!

And as @jinxiangshi suggested (not @Superjomn), the 3.1 issue is fixable with a manual transformers update. You said that v3.2 isn't supported yet and that you will update the documentation; now it feels like you care. I appreciate that.

I will update the OP.

laikhtewari commented 1 week ago

Oops copied the wrong username, thanks @jinxiangshi !

Superjomn commented 1 week ago

Sorry for closing the issue; we will amend the dependency requirements and update the documentation for Llama 3 and Llama 3.1. @stas00

stas00 commented 1 week ago

Thank you, @Superjomn