NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

Does the chatglm2-6B example support building and running codegeex2-6b? #93

Closed · thendwk closed this issue 10 months ago

thendwk commented 10 months ago

I tried building and running codegeex2-6b with the chatglm2-6B example, but the result was incorrect. The output is listed below:

```
root@***:/app/tensorrt_llm/examples/chatglm2-6b# python3 run.py --input_text '# language: Python\n# write a bubble sort function\n'
[10/24/2023-08:24:41] [TRT] [I] Loaded engine size: 11921 MiB
[10/24/2023-08:24:43] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 12584, GPU 12164 (MiB)
[10/24/2023-08:24:43] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +2, GPU +10, now: CPU 12586, GPU 12174 (MiB)
[10/24/2023-08:24:43] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in engine deserialization: CPU +0, GPU +11912, now: CPU 0, GPU 11912 (MiB)
[10/24/2023-08:24:43] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 12585, GPU 15910 (MiB)
[10/24/2023-08:24:43] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +1, GPU +8, now: CPU 12586, GPU 15918 (MiB)
[10/24/2023-08:24:44] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 11912 (MiB)
[10/24/2023-08:24:44] [TRT] [I] [MemUsageChange] Init cuBLAS/cuBLASLt: CPU +0, GPU +8, now: CPU 12680, GPU 15940 (MiB)
[10/24/2023-08:24:44] [TRT] [I] [MemUsageChange] Init cuDNN: CPU +0, GPU +10, now: CPU 12680, GPU 15950 (MiB)
[10/24/2023-08:24:44] [TRT] [I] [MemUsageChange] TensorRT-managed allocation in IExecutionContext creation: CPU +0, GPU +0, now: CPU 0, GPU 11912 (MiB)
```


Input --->

# language: Python\n# write a bubble sort function\n

Output ---> \n#bubblesort\n#takesalistofnumbers\n#returnsasortedlist\n\ndefbubble_sort(numbers):\nforiinrange(len(numbers)):\nforjinrange(len(numbers)-1):\nifnumbers[j]>numbers[j+1]:\nnumbers[j],numbers[j+1]=numbers[j+1],numbers[j]\nreturnnumbers\n\nprint(bubble_sort([5,2,1,8,4]))\n\n#bubblesort\n#takesalistofnumbers\n#returnsasortedlist\n\ndefbubble_sort(numbers):\nforiinrange(len(numbers)):\nforjinrange(len(numbers)-1):\nifnumbers[j]>numbers[j+1]:\nnumbers[j],numbers[j+1]=numbers[j+1],numbers[j]\nreturnnumbers\n\nprint(bubble_sort([5,2,1,8,4]))\n\n#bubblesort\n#takesalistofnumbers\n#returnsasortedlist\n\ndefbubble_sort(numbers):\nforiinrange(len(numbers)):\nforjinrange(len(numbers)-1):\nifnumbers[j]>numbers[j+1]:\nnumbers[j],numbers[j+1]=numbers[j+1],numbers[j]\nreturnnumbers\n\nprint(bubble_sort([5,2,1,8,4]))\n\n#bubblesort\n#takesalistofnumbers\n#returnsasortedlist\n\ndefbubble_sort(numbers):\nforiinrange(len(numbers)):\nforjinrange(len(numbers)-1):\nifnumbers[j]>numbers[j+1]:\nnumbers[j],numbers[j+1]=numbers[j+1],numbers[j]\nreturnnumbers\n\nprint(bubble_sort([5,2,1,8,4]))\n\n#bubblesort\n#takesalistofnumbers\n#returnsasortedlist\n\ndefbubble_sort(numbers):\nforiinrange(len(numbers)):\nforjinrange(len(numbers)-1):\nifnumbers[j]>numbers[j+1]:\nnumbers[j],numbers[j+1]=numbers[j+1],numbers[j]\nreturnnumbers\n\nprint(bubble_sort([5,2,1,8,4]))\n\n#bubblesort\n#takesalistofnumbers\n#returnsasortedlist\n\ndefbubble_sort(numbers):\nforiinrange(len(numbers)):\nforjinrange(len(numbers)-1):\nifnumbers[j]>numbers[j+1]:\nnumbers[j],numbers[j+1]=numbers[j+1],numbers[j]\nreturnnumbers\n\nprint(bubble_sort([5,2,1,8,4]))\n\n#bubblesort\n#takesalistofnumbers\n#returnsasortedlist\n\ndefbubble_sort(numbers):\nforiinrange(len(numbers)):\nforjinrange(len(numbers)-1):\nifnumbers[j]>numbers[j+1]:\nnumbers[j],numbers[j+1]=numbers[j+1],numbers[j]\nreturnnumbers\n\nprint(bubble_sort([5,2,1,8,4]))\n\n#bubblesort\n#takesalistofnumbers\n#returnsasortedlist\n\ndefbubble_sort(numbers):\nforiinrange(len(numbers)):\nforjinrange(len(numbers)-1):\nifnumbers[j]>numbers[j+1]:\n


Finished!

byshiue commented 10 months ago

Can you share the expected results and the steps you used to generate the TRT-LLM results?

thendwk commented 10 months ago

> Can you share the expected results and the steps you used to generate the TRT-LLM results?

The expected result is something like this:

```python
def bubble_sort(list):
    for i in range(len(list) - 1):
        for j in range(len(list) - 1):
            if list[j] > list[j + 1]:
                list[j], list[j + 1] = list[j + 1], list[j]
    return list

print(bubble_sort([5, 2, 1, 8, 4]))
```
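
(For context, that expected output comes from running the original Hugging Face checkpoint. A minimal reference run might look like the sketch below; it follows the standard transformers workflow, is not a TRT-LLM script, and the generation settings such as max_new_tokens are illustrative assumptions.)

```python
# Hypothetical reference run with the Hugging Face checkpoint (not a TRT-LLM script),
# useful for producing an "expected result" to compare against the engine's output.
# The generation length below is an illustrative assumption.
import torch
import transformers

path = "THUDM/codegeex2-6b"  # or the local clone used in the steps below
tokenizer = transformers.AutoTokenizer.from_pretrained(path, trust_remote_code=True)
model = transformers.AutoModel.from_pretrained(
    path, trust_remote_code=True, torch_dtype=torch.float16).cuda().eval()

prompt = "# language: Python\n# write a bubble sort function\n"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
with torch.no_grad():
    outputs = model.generate(**inputs, max_new_tokens=256)
print(tokenizer.decode(outputs[0]))
```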

My steps are as follows:

1. Clone the model:

   ```
   git clone https://huggingface.co/THUDM/codegeex2-6b
   ```

2. Run build.py:

   ```
   python3 build.py --model_dir=/docker_storage/codegeex/codegeex2-6b/ \
       --dtype float16 \
       --use_gpt_attention_plugin float16 \
       --use_gemm_plugin float16 \
       --max_input_len 2048 \
       --max_output_len 1024
   ```

   Doing the above throws an exception, so I modified the build.py source code as follows (maybe codegeex2-6b is trained in bf16):

   ```python
   hf_model = transformers.AutoModel.from_pretrained(
       args.model_dir, trust_remote_code=True,
       torch_dtype=torch.float16).cpu()
   ```

3. Run prediction:

   ```
   python3 run.py --input_text '# language: Python\n# write a bubble sort function\n'
   ```

byshiue commented 10 months ago

The TRT-LLM results you shared at the beginning are not very different from the HF result. Can you first try building the engine with FP32 (chatglm2 does not support BF16 yet)?

thendwk commented 10 months ago

> The TRT-LLM results you shared at the beginning are not very different from the HF result. Can you first try building the engine with FP32 (chatglm2 does not support BF16 yet)?

Building in the default mode throws an exception as follows (see the attached screenshot). Does this indicate that codegeex2-6b is trained in bf16 and that TensorRT-LLM cannot build codegeex2-6b?
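
(One way to check this is to read the dtype recorded in the checkpoint's Hugging Face config; the snippet below is a hypothetical check, not part of the TRT-LLM examples.)

```python
# Hypothetical check (not part of build.py): the dtype recorded in config.json
# tells you what precision the checkpoint was saved in.
import transformers

config = transformers.AutoConfig.from_pretrained(
    "THUDM/codegeex2-6b", trust_remote_code=True)
print(config.torch_dtype)  # torch.bfloat16 would confirm a BF16 checkpoint
```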

byshiue commented 10 months ago

> The TRT-LLM results you shared at the beginning are not very different from the HF result. Can you first try building the engine with FP32 (chatglm2 does not support BF16 yet)?

> Building in the default mode throws an exception as follows (see the attached screenshot). Does this indicate that codegeex2-6b is trained in bf16 and that TensorRT-LLM cannot build codegeex2-6b?

The TensorRT-LLM chatglm2 example does not support BF16 weights yet; adding that support requires some extra flags and a data-type converter. That is why I suggested trying FP32 first.
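
(For illustration only: such a converter could be as simple as casting the BF16 tensors before the existing FP16/FP32 path consumes them. The helper below is a hypothetical sketch, not the actual TRT-LLM change.)

```python
# Hypothetical sketch of a BF16 weight cast (not the actual TRT-LLM fix): convert
# every floating-point tensor in the HF state dict so the existing FP16/FP32
# conversion path can consume it.
import torch

def cast_state_dict(state_dict, target_dtype=torch.float16):
    """Return a copy of the state dict with floating-point tensors cast to target_dtype."""
    return {
        name: tensor.to(target_dtype) if tensor.is_floating_point() else tensor
        for name, tensor in state_dict.items()
    }
```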

thendwk commented 10 months ago

> The TRT-LLM results you shared at the beginning are not very different from the HF result. Can you first try building the engine with FP32 (chatglm2 does not support BF16 yet)?

> Building in the default mode throws an exception as follows (see the attached screenshot). Does this indicate that codegeex2-6b is trained in bf16 and that TensorRT-LLM cannot build codegeex2-6b?

> The TensorRT-LLM chatglm2 example does not support BF16 weights yet; adding that support requires some extra flags and a data-type converter. That is why I suggested trying FP32 first.

OK, got it, thanks.

byshiue commented 10 months ago

> `hf_model = transformers.AutoModel.from_pretrained(args.model_dir, trust_remote_code=True, torch_dtype=torch.float16).cpu()`

I gave it a try and found that chatglm2 only supports FP16 for now, so you cannot run it in FP32. We will fix this soon.

thendwk commented 10 months ago

> `hf_model = transformers.AutoModel.from_pretrained(args.model_dir, trust_remote_code=True, torch_dtype=torch.float16).cpu()`

> I gave it a try and found that chatglm2 only supports FP16 for now, so you cannot run it in FP32. We will fix this soon.

OK, thanks.

byshiue commented 10 months ago

This issue is fixed by this MR: https://github.com/NVIDIA/TensorRT-LLM/pull/148. You can try the latest main branch. Closing this bug; feel free to reopen if needed.