TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
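For illustration (not part of the original report), a minimal sketch of that high-level Python API, assuming a recent tensorrt_llm release that exposes the LLM entry point; the model path and prompt are placeholders:

    # Minimal sketch of the high-level API (assumes a recent tensorrt_llm
    # release with the LLM entry point; model path and prompt are placeholders).
    from tensorrt_llm import LLM, SamplingParams

    llm = LLM(model="./bloom/560M/")        # build/load an engine for the model
    params = SamplingParams(max_tokens=32)  # cap the generated length
    for output in llm.generate(["Hello, my name is"], params):
        print(output.outputs[0].text)

Converting the BLOOM 560M checkpoint with the bundled example script, however, fails with an import error: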
TensorRT-LLM/examples/bloom# python3 convert_checkpoint.py --model_dir ./bloom/560M/ \
--dtype float16 \
--output_dir ./bloom/560M/trt_ckpt/fp16/1-gpu/
Running /TensorRT-LLM/examples/bloom/convert_checkpoint.py
Traceback (most recent call last):
File "/TensorRT-LLM/examples/bloom/convert_checkpoint.py", line 26, in <module>
from tensorrt_llm.models.llama.utils import iterate_shard_files, load_state_dict #TODO: move the utils to common dir shared by models
ModuleNotFoundError: No module named 'tensorrt_llm.models.llama.utils'
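This usually indicates a version mismatch: the TODO in the failing import line says these helpers were slated to move out of tensorrt_llm.models.llama, so the installed wheel and the examples checkout disagree about where iterate_shard_files and load_state_dict live. A minimal diagnostic sketch, assuming only the standard library and an importable tensorrt_llm (the second probed path is an assumption about later releases):

    # Diagnostic sketch: check whether the installed wheel matches the
    # examples checkout. The first module path comes from the traceback;
    # the second is an assumed later location for the same helpers.
    import importlib.util
    import tensorrt_llm

    print("installed tensorrt_llm:", tensorrt_llm.__version__)
    # None here means the wheel does not ship the module the example imports.
    print(importlib.util.find_spec("tensorrt_llm.models.llama.utils"))
    # Assumed location of the same helpers in later releases.
    print(importlib.util.find_spec("tensorrt_llm.models.convert_utils"))

If the first probe prints None, reinstalling a wheel built from the same commit as the examples checkout (or checking out the examples at the tag matching tensorrt_llm.__version__) should align the two.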
The command follows the "Single GPU on BLOOM 560M" step in the example README:
python convert_checkpoint.py --model_dir ./bloom/560M/ \
    --dtype float16 \
    --output_dir ./bloom/560M/trt_ckpt/fp16/1-gpu/