amir1m opened this issue 10 months ago (status: Open)
Hi, could you give a script to reproduce the model in "./merged/"? The training step 3 can be skipped.
Hi @syuoni, thanks for your response! I am unable to share the model. However, I meant to say in step #3 that we fine-tuned the model.
I tried again today and still get the same result.
Hi @amir1m, sorry, it seems my question was a little unclear. I meant that you could provide a script for steps 1, 2, and 4, which should produce a model in "./merged/", so that I can quickly reproduce the issue you encountered.
You don't need to share the model itself, which is why I said "the fine-tuning step 3 can be skipped" (as it relates to your custom dataset). But the model in "./merged/" should have exactly the same structure as the model you have.
I see. The base model in "./merged/" is quantized, so its checkpoint does not have the weight layout that convert_checkpoint.py expects.
You can reload falcon-7b in bf16 precision, merge the LoRA weights, and then save the result to "./merged-bf16/". Then, run:
python3 convert_checkpoint.py --model_dir ./merged-bf16/ --dtype bfloat16 --output_dir ./trt_ckpt/bf16/1-gpu/
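For reference, a minimal sketch of that reload-and-merge step; the adapter path "./falcon-7b-peft/" is a hypothetical placeholder, so substitute your own:

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

# Reload the base model in bf16 precision (not quantized).
base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b", torch_dtype=torch.bfloat16, trust_remote_code=True)
# Attach the LoRA adapter, then fold its weights into the base model.
model = PeftModel.from_pretrained(base, "./falcon-7b-peft/")  # hypothetical adapter path
model = model.merge_and_unload()
# Save the merged bf16 checkpoint for convert_checkpoint.py to consume.
model.save_pretrained("./merged-bf16/")
AutoTokenizer.from_pretrained("tiiuae/falcon-7b").save_pretrained("./merged-bf16/")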
Hi, do you still have any further issues or questions? If not, we'll close this soon.
System Info
CPU: x86_64
GPU: A10
OS: Ubuntu 22.04
Who can help?
@Tracin @byshiue please help.
Information

Tasks

An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
I have PEFT fine-tuned the Falcon instruct model (the training script is not included here; a hypothetical sketch of such a setup is below).
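A minimal, hypothetical sketch of a QLoRA-style fine-tune that would leave the base weights quantized, consistent with the diagnosis above; the model name, hyperparameters, and paths are illustrative assumptions, not taken from the original report:

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the base model 4-bit quantized (QLoRA-style). Merging an adapter into a
# quantized base is what later trips up convert_checkpoint.py.
bnb = BitsAndBytesConfig(load_in_4bit=True, bnb_4bit_compute_dtype=torch.bfloat16)
base = AutoModelForCausalLM.from_pretrained(
    "tiiuae/falcon-7b-instruct", quantization_config=bnb, trust_remote_code=True)
# Attach LoRA adapters to the fused QKV projection (illustrative hyperparameters).
lora = LoraConfig(r=16, lora_alpha=32, target_modules=["query_key_value"],
                  task_type="CAUSAL_LM")
model = get_peft_model(base, lora)
# ... train on the custom dataset, then save the adapter:
model.save_pretrained("./falcon-7b-peft/")  # hypothetical adapter path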
Then I took the merged PEFT model ("./merged/") to the GPU server and ran:
python3 convert_checkpoint.py --model_dir ./merged/ --dtype bfloat16 --output_dir ./trt_ckpt/bf16/1-gpu/
TensorRT-LLM version: 0.7.1

[01/17/2024-13:10:40] WARNING: You are currently loading Falcon using legacy code contained in the model repository. Falcon has now been fully ported into the Hugging Face transformers library. For the most up-to-date and high-performance version of the Falcon model code, please update to the latest version of transformers and then load the model without the trust_remote_code=True argument.
Loading checkpoint shards: 100%|██████████| 2/2 [00:01<00:00, 1.98it/s]
Traceback (most recent call last):
  File "/code/tensorrt_llm/examples/falcon/convert_checkpoint.py", line 1120, in <module>
    covert_and_save(rank)
  File "/code/tensorrt_llm/examples/falcon/convert_checkpoint.py", line 1097, in covert_and_save
    weights = convert_hf_falcon(
  File "/code/tensorrt_llm/examples/falcon/convert_checkpoint.py", line 379, in convert_hf_falcon
    qkv_w = split_qkv_weight(qkv_weight,
  File "/code/tensorrt_llm/examples/falcon/convert_checkpoint.py", line 265, in split_qkv_weight
    weight = reorder_qkv_weight_or_bias(weight,
  File "/code/tensorrt_llm/examples/falcon/convert_checkpoint.py", line 214, in reorder_qkv_weight_or_bias
    assert weight.shape[0] == num_kv_heads * num_group_heads * head_dim, \
AssertionError: 4672 != 71 * 3 * 64
root@myhostname-release:/code/tensorrt_llm/examples/falcon# trtllm-build --checkpoint_dir ./trt_ckpt/bf16/1-gpu/ --use_gemm_plugin bfloat16 --remove_input_padding --use_gpt_attention_plugin bfloat16 --enable_context_fmha --output_dir ./trt_engines/bf16/1-gpu/
[01/17/2024-13:29:18] [TRT-LLM] [I] Context FMHA Enabled
[01/17/2024-13:29:18] [TRT-LLM] [I] Remove Padding Enabled
Traceback (most recent call last):
  File "/usr/local/bin/trtllm-build", line 8, in <module>
    sys.exit(main())
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 354, in main
    parallel_build(source, build_config, args.output_dir, workers,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 284, in parallel_build
    build_and_save(rank, rank % workers, ckpt_dir, build_config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 258, in build_and_save
    engine = build(build_config,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/commands/build.py", line 243, in build
    model = model_cls.from_checkpoint(ckpt_dir,
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 347, in from_checkpoint
    model.load(weights)
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/modeling_utils.py", line 363, in load
    param.value = weights[name]
  File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/parameter.py", line 113, in value
    assert v.shape == self._shape, \
AssertionError: The value updated is not the same shape as the original. Updated: (4672, 4544), original: (13632, 4544)
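The two numbers in the assertions check out against falcon-7b's multi-query attention layout; a quick arithmetic sketch, assuming the standard falcon-7b configuration (71 attention heads, head dimension 64, hidden size 4544):

num_heads, head_dim = 71, 64           # standard falcon-7b config (hidden size 4544 = 71 * 64)
print((num_heads + 2) * head_dim)      # 4672:  fused QKV rows under multi-query attention (71 Q + 1 K + 1 V)
print(num_heads * 3 * head_dim)        # 13632: fused QKV rows the converter expected (one K and V per head)

That is, the weights on disk have the multi-query layout (4672 rows) while the converter expected one K and V per head (13632 rows), matching the shape mismatch in both tracebacks.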