NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

When using convert_checkpoint.py to convert a Gemma HF checkpoint, it prints "Killed" #2344

Open imilli opened 1 month ago

imilli commented 1 month ago

System Info

- CPU architecture: x86_64
- CPU/Host memory size: 64 GB
- GPU properties
  - GPU name: NVIDIA RTX 4090
  - GPU memory size: 24 GB
- Libraries
  - TensorRT-LLM branch or tag: v0.13.0
  - Container used: nvcr.io/nvidia/tritonserver:24.09-trtllm-python-py3
- NVIDIA driver version: 12.6
- OS: Windows 11 Pro

Who can help?

No response

Information

- The official example scripts
- My own modified scripts

Tasks

- An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
- My own task or dataset (give details below)

Reproduction

1. Start the container:

```
docker run --rm -it --net host --shm-size=8g --memory="64g" --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -v d:/llm/tensorrtllm_backend:/tensorrtllm_backend -v d:/llm/engines:/engines nvcr.io/nvidia/tritonserver:24.09-trtllm-python-py3
```

2. Inside the container, run `pip show tensorrt_llm`:

```
Name: tensorrt-llm
Version: 0.13.0
Summary: TensorRT-LLM: A TensorRT Toolbox for Large Language Models
Home-page: https://github.com/NVIDIA/TensorRT-LLM
Author: NVIDIA Corporation
Author-email:
License: Apache License 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: accelerate, aenum, build, click, click-option-group, colored, cuda-python, diffusers, evaluate, h5py, lark, mpi4py, mpmath, numpy, nvidia-modelopt, onnx, openai, optimum, pandas, pillow, polygraphy, psutil, pulp, pynvml, sentencepiece, StrEnum, tensorrt, torch, transformers, wheel
Required-by:
```

3. `git clone https://github.com/NVIDIA/TensorRT-LLM.git` (branch main), then run the Gemma conversion script:

```
root@docker-desktop:/tensorrtllm_backend/src/TensorRT-LLM/examples/gemma# python3 convert_checkpoint.py --model-dir "/tensorrtllm_backend/models/gemma-2-9b-chat" --output-model-dir "/tensorrtllm_backend/trt-model" --dtype float16 --ckpt-type hf --world-size 1 --use-weight-only-with-precision int8 --load_model_on_cpu
[TensorRT-LLM] TensorRT-LLM version: 0.13.0
You are using a model of type gemma2 to instantiate a model of type gemma. This is not supported for all configurations of models and can yield errors.
Determined TensorRT-LLM configuration {'architecture': 'Gemma2ForCausalLM', 'dtype': 'float16', 'vocab_size': 256000, 'hidden_size': 3584, 'num_hidden_layers': 42, 'num_attention_heads': 16, 'hidden_act': 'gelu_pytorch_tanh', 'logits_dtype': 'float32', 'norm_epsilon': 1e-06, 'position_embedding_type': 'rope_gpt_neox', 'max_position_embeddings': 8192, 'num_key_value_heads': 8, 'intermediate_size': 14336, 'mapping': {'world_size': 1, 'gpus_per_node': 8, 'cp_size': 1, 'tp_size': 1, 'pp_size': 1, 'moe_tp_size': 1, 'moe_ep_size': 1}, 'quantization': {'quant_algo': <QuantAlgo.W8A16: 'W8A16'>, 'kv_cache_quant_algo': None, 'group_size': 128, 'smoothquant_val': None, 'clamp_val': None, 'has_zero_point': False, 'pre_quant_scale': True, 'exclude_modules': None}, 'use_parallel_embedding': False, 'embedding_sharding_dim': 0, 'share_embedding_table': True, 'head_size': 256, 'qk_layernorm': False, 'rotary_base': 10000.0, 'attn_bias': False, 'mlp_bias': False, 'rotary_scaling': None, 'inter_layernorms': True, 'query_pre_attn_scalar': 224, 'final_logit_softcapping': 30.0, 'attn_logit_softcapping': 50.0}
Loading weights...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 2.70it/s]
Killed
root@docker-desktop:/tensorrtllm_backend/src/TensorRT-LLM/examples/gemma#
```
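A bare `Killed` with no Python traceback usually means the Linux OOM killer terminated the process rather than TensorRT-LLM itself failing. As a first diagnostic (a sketch not taken from the original report, assuming a WSL2-backed Docker Desktop setup), the kernel log and live memory readings can confirm this:

```
# Check the kernel log for an OOM-kill record after "Killed" appears
# (may need to be run from the WSL2 distro or a privileged container).
dmesg | grep -iE "out of memory|oom-kill|killed process"

# In a second terminal, sample memory every 2 seconds while re-running the
# conversion; if "available" collapses as weights load, host RAM is the cause.
free -h -s 2
```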

Additional notes

My computer has a lot of free CPU memory, but the command prompt only shows "Killed" with no other information.
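One likely explanation (an assumption based on the `docker-desktop` hostname, not something confirmed in this thread) is the WSL2 VM's memory cap rather than physical RAM. gemma-2-9b has roughly 9B parameters, so the bf16 HF weights alone take about 18 GB, and running `convert_checkpoint.py` with `--load_model_on_cpu` additionally materializes converted fp16 and int8 tensors in host memory, which can push the peak well past 30 GB. Docker Desktop's WSL2 VM defaults to only about half of host RAM (roughly 32 GB on this 64 GB machine), and the `--memory="64g"` flag on `docker run` cannot raise that VM-level cap. A minimal sketch of the usual workaround:

```
# %UserProfile%\.wslconfig on the Windows host. Illustrative values, not
# from the original report; run `wsl --shutdown` and restart Docker Desktop
# after editing so the new limits take effect.
[wsl2]
memory=56GB   # raise the VM cap above the conversion's peak working set
swap=32GB     # headroom so transient spikes swap instead of triggering the OOM killer
```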

github-actions[bot] commented 2 days ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.