NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Can't convert-checkpoint Mistral 7B v0.3: safetensors_rust.SafetensorError: File does not contain tensor model.embed_tokens.weight #1732

Open Ace-RR opened 2 months ago

Ace-RR commented 2 months ago

System Info

on H100 Nvidia

Who can help?

No response

Information

Tasks

Reproduction

1. git -C /workspace clone https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3

2. python tensorrt_llm/examples/llama/convert_checkpoint.py --model_dir /workspace/Mistral-7B-Instruct-v0.3 --output_dir /workspace/trt_ckpt/mistral3/fp16 --dtype bfloat16

Expected behavior

0.11.0.dev2024060400 Total time of converting checkpoints: xx:xx:xx

actual behavior

[TensorRT-LLM] TensorRT-LLM version: 0.11.0.dev2024060400
0.11.0.dev2024060400
Traceback (most recent call last):
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 439, in <module>
    main()
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 431, in main
    convert_and_save_hf(args)
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 366, in convert_and_save_hf
    execute(args.workers, [convert_and_save_rank] * world_size, args)
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 390, in execute
    f(args, rank)
  File "/app/tensorrt_llm/examples/llama/convert_checkpoint.py", line 355, in convert_and_save_rank
    llama = LLaMAForCausalLM.from_hugging_face(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 292, in from_hugging_face
    weights = load_weights_from_hf_safetensors(hf_model_dir, config)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1577, in load_weights_from_hf_safetensors
    weights['transformer.vocab_embedding.weight'] = load(
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 1555, in load
    res = safetensors_ptrs[ptr_idx].get_tensor(key)
safetensors_rust.SafetensorError: File does not contain tensor model.embed_tokens.weight
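For context, the error means the loader opened a .safetensors file that does not contain a tensor under the Hugging Face name model.embed_tokens.weight. A minimal stdlib-only diagnostic sketch (function names here are hypothetical and not part of TensorRT-LLM) that lists the tensor names in a .safetensors file by reading its header, and flags any consolidated.safetensors sitting next to the HF shards:

```python
import json
import struct
from pathlib import Path

def safetensors_keys(path):
    """List tensor names stored in a .safetensors file by reading only its
    header: an 8-byte little-endian length prefix followed by a JSON blob."""
    with open(path, "rb") as f:
        (header_len,) = struct.unpack("<Q", f.read(8))
        header = json.loads(f.read(header_len))
    return [k for k in header if k != "__metadata__"]

def find_consolidated_files(model_dir):
    """Flag any consolidated*.safetensors alongside the sharded
    model-*.safetensors files; the converter may open one of those and then
    fail to find HF-style names such as model.embed_tokens.weight."""
    return sorted(p.name for p in Path(model_dir).glob("consolidated*.safetensors"))
```

Running safetensors_keys over each file in the model directory shows which files use HF-style names and which do not.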

additional notes

Mistral 7B v0.3 requires transformers 4.42.0.dev0

version of transformers with tensorrtllm_backend: 4.40.2

But the command still fails even after transformers is upgraded to 4.42.0.dev0.

hijkzzz commented 2 months ago

@nv-guomingz This is a new feature due to changes in the weight format and tokenizer of mistral v0.3

nv-guomingz commented 2 months ago

hi @Ace-RR , a quick workaround is to delete consolidated.safetensors from your local model directory. Alternatively, you can apply the changes below on your side.

[screenshot of the suggested code changes]
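A sketch of that workaround (the path comes from the reproduction steps above; the function name is hypothetical, and it moves the file aside rather than deleting it, in case it is needed later):

```shell
#!/bin/sh
# Sideline consolidated*.safetensors so the TensorRT-LLM converter only
# sees the HF-style sharded model-*.safetensors files.
sideline_consolidated() {
    model_dir="$1"
    mkdir -p "$model_dir/_backup"
    for f in "$model_dir"/consolidated*.safetensors; do
        # Guard against the glob not matching anything.
        [ -e "$f" ] && mv "$f" "$model_dir/_backup/"
    done
}

# Example, using the directory from the reproduction steps:
# sideline_consolidated /workspace/Mistral-7B-Instruct-v0.3
```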
nv-guomingz commented 2 months ago

Hi @Ace-RR, we've merged the fix into the internal repo; please fetch the next weekly update for your further work.

github-actions[bot] commented 1 month ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.