Hi,
please try pip install tensorrt_llm==0.11.0.dev2024061800
and use the container: nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3
Or build the TRT-LLM container using make -C docker release_build (recommended).
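For reference, the pip route end to end would look roughly like the sketch below; the NVIDIA PyPI extra index and the interactive docker run flags are my assumptions, not part of the original instructions:

# Start from the Triton container, which already ships a matching CUDA stack (assumption).
docker run --rm -it --gpus all nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3 bash
# Inside the container, install the dev wheel; the extra index URL is assumed here.
pip install tensorrt_llm==0.11.0.dev2024061800 --extra-index-url https://pypi.nvidia.com
# Quick sanity check that the package imports and reports its version.
python3 -c "import tensorrt_llm; print(tensorrt_llm.__version__)"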
It works well on my side:
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 550.54.14              Driver Version: 550.54.14      CUDA Version: 12.4     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA L40S                    On  |   00000000:01:00.0 Off |                    0 |
| N/A   26C    P8             32W /  350W |       0MiB /  46068MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
jianh@cd56de89319f:/tensorrtllm$ ./cpp/build/benchmarks/gptSessionBenchmark --engine_dir ./tmp/llama3-8b-awq-engine/ --batch_size "1" --input_output_len "60,20"
Benchmarking done. Iteration: 10, duration: 1.50 sec.
Latencies: [149.56, 149.60, 149.64, 149.48, 151.35, 149.47, 149.45, 149.42, 149.47, 149.59]
[BENCHMARK] batch_size 1 input_length 60 output_length 20 latency(ms) 149.70 tokensPerSec 133.60 generation_time(ms) 139.96 generationTokensPerSec 142.89 gpu_peak_mem(gb) 43.61
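To compare more configurations, the same binary can be looped over batch sizes; a minimal sketch, assuming the engine directory and flags shown above:

# Sweep a few batch sizes with the same engine; the values here are illustrative.
for bs in 1 2 4; do
  ./cpp/build/benchmarks/gptSessionBenchmark --engine_dir ./tmp/llama3-8b-awq-engine/ \
    --batch_size "$bs" --input_output_len "60,20"
done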
Thanks.

please try
pip install tensorrt_llm==0.11.0.dev2024061800
and use the container: nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3

How can I reproduce your result? Just running this docker image and pip installing tensorrt_llm was not enough for me to successfully build an engine.
@hijkzzz It didn't work for me; did I miss anything? Please take a look. Thanks!
docker run -it --shm-size 200G --gpus all --network=host --cap-add=SYS_ADMIN --name nv_fp8 -v ${PWD}:/target nvcr.io/nvidia/tritonserver:24.05-trtllm-python-py3
pip install tensorrt_llm==0.11.0.dev2024061800
sh install_tensorrt.sh
python3 -c "import tensorrt_llm"
I got errors:
python3 -c "import tensorrt_llm"
Traceback (most recent call last):
File "/usr/local/lib/python3.10/dist-packages/transformers/utils/import_utils.py", line 1535, in _get_module
return importlib.import_module("." + module_name, self.__name__)
File "/usr/lib/python3.10/importlib/__init__.py", line 126, in import_module
return _bootstrap._gcd_import(name[level:], package, level)
File "<frozen importlib._bootstrap>", line 1050, in _gcd_import
File "<frozen importlib._bootstrap>", line 1027, in _find_and_load
File "<frozen importlib._bootstrap>", line 1006, in _find_and_load_unlocked
File "<frozen importlib._bootstrap>", line 688, in _load_unlocked
File "<frozen importlib._bootstrap_external>", line 883, in exec_module
File "<frozen importlib._bootstrap>", line 241, in _call_with_frames_removed
File "/usr/local/lib/python3.10/dist-packages/transformers/generation/utils.py", line 97, in <module>
from accelerate.hooks import AlignDevicesHook, add_hook_to_module
File "/usr/local/lib/python3.10/dist-packages/accelerate/__init__.py", line 16, in <module>
from .accelerator import Accelerator
File "/usr/local/lib/python3.10/dist-packages/accelerate/accelerator.py", line 35, in <module>
from .checkpointing import load_accelerator_state, load_custom_state, save_accelerator_state, save_custom_state
File "/usr/local/lib/python3.10/dist-packages/accelerate/checkpointing.py", line 24, in <module>
from .utils import (
File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/__init__.py", line 182, in <module>
from .bnb import has_4bit_bnb_layers, load_and_quantize_model
File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/bnb.py", line 29, in <module>
from ..big_modeling import dispatch_model, init_empty_weights
File "/usr/local/lib/python3.10/dist-packages/accelerate/big_modeling.py", line 24, in <module>
from .hooks import (
File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 30, in <module>
from .utils.other import recursive_getattr
File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/other.py", line 36, in <module>
from .transformer_engine import convert_model
File "/usr/local/lib/python3.10/dist-packages/accelerate/utils/transformer_engine.py", line 21, in <module>
import transformer_engine.pytorch as te
File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/__init__.py", line 6, in <module>
from .module import LayerNormLinear
File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/__init__.py", line 6, in <module>
from .layernorm_linear import LayerNormLinear
File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/module/layernorm_linear.py", line 13, in <module>
from .. import cpp_extensions as tex
File "/usr/local/lib/python3.10/dist-packages/transformer_engine/pytorch/cpp_extensions/__init__.py", line 6, in <module>
from transformer_engine_extensions import *
ImportError: /usr/local/lib/python3.10/dist-packages/transformer_engine_extensions.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN5torch3jit17parseSchemaOrNameERKNSt7__cxx1112basic_stringIcSt11char_traitsIcESaIcEEE
I tried uninstalling transformer_engine, following https://github.com/chenfei-wu/TaskMatrix/issues/116,
but then got a different error:
Traceback (most recent call last):
File "<string>", line 1, in <module>
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/__init__.py", line 33, in <module>
import tensorrt_llm.models as models
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/__init__.py", line 34, in <module>
from .llama.model import LLaMAForCausalLM, LLaMAModel
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/model.py", line 32, in <module>
from .convert import (load_hf_llama, load_weights_from_hf_by_shard,
File "/usr/local/lib/python3.10/dist-packages/tensorrt_llm/models/llama/convert.py", line 31, in <module>
from transformers.models.llama.modeling_llama import LlamaDecoderLayer
File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 54, in <module>
from flash_attn import flash_attn_func, flash_attn_varlen_func
File "/usr/local/lib/python3.10/dist-packages/flash_attn/__init__.py", line 3, in <module>
from flash_attn.flash_attn_interface import (
File "/usr/local/lib/python3.10/dist-packages/flash_attn/flash_attn_interface.py", line 10, in <module>
import flash_attn_2_cuda as flash_attn_cuda
ImportError: /usr/local/lib/python3.10/dist-packages/flash_attn_2_cuda.cpython-310-x86_64-linux-gnu.so: undefined symbol: _ZN3c104cuda9SetDeviceEi
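Both tracebacks end in undefined-symbol errors from prebuilt C++ extensions (transformer_engine_extensions, flash_attn_2_cuda) that were compiled against a different torch/CUDA ABI than the torch the pip install pulled in. A quick way to inspect the versions involved (a sketch, not taken from the thread):

# Check which torch and CUDA the environment actually has.
python3 -c "import torch; print(torch.__version__, torch.version.cuda)"
# List the packages whose compiled extensions are failing to load.
pip list | grep -Ei "torch|transformer-engine|flash"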
Hi, please try make -C docker release_build.
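For a fresh checkout, that route would look roughly like the sketch below (the clone and submodule steps are my assumptions about starting from scratch):

git clone https://github.com/NVIDIA/TensorRT-LLM.git
cd TensorRT-LLM
git submodule update --init --recursive
# Build the release image, then start a container from it.
make -C docker release_build
make -C docker release_run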
System Info
ubuntu 20.04
tensorrt 10.0.1
tensorrt-cu12 10.0.1
tensorrt-cu12-bindings 10.0.1
tensorrt-cu12-libs 10.0.1
tensorrt-llm 0.11.0.dev2024061100
NVIDIA L40S
Who can help?
@byshiue @hijkzzz
Information

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)

Reproduction
2. trtllm build
3. benchmark (a sketch of steps 2 and 3 follows below)
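A hedged sketch of what steps 2 and 3 might look like for this model; the checkpoint directory name and the gemm_plugin value are placeholders, not taken from the issue:

# Step 2: build the engine from a previously quantized (AWQ) checkpoint.
trtllm-build --checkpoint_dir ./tmp/llama3-8b-awq-ckpt \
             --output_dir ./tmp/llama3-8b-awq-engine \
             --gemm_plugin float16
# Step 3: benchmark the resulting engine.
./cpp/build/benchmarks/gptSessionBenchmark --engine_dir ./tmp/llama3-8b-awq-engine/ \
    --batch_size "1" --input_output_len "60,20"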
Expected behavior
get benchmark result
actual behavior
got errors:
additional notes
run
result:
It seems that the engine and docker env are ok.