System Info
CPU architecture (x86_64)
CPU/Host memory size (64GB)
GPU properties
GPU name (NVIDIA RTX 4090)
GPU memory size (24GB)
Libraries
TensorRT-LLM branch or tag (v0.13.0)
Versions of TensorRT, CUDA
Container used (nvcr.io/nvidia/tritonserver:24.09-trtllm-python-py3)
NVIDIA driver version (12.6)
OS (Windows 11 Pro)
Who can help?
No response

Information
The official example scripts
My own modified scripts

Tasks
An officially supported task in the examples folder (such as GLUE/SQuAD, ...)
My own task or dataset (give details below)

Reproduction
1. docker run --rm -it --net host --shm-size=8g --memory="64g" --ulimit memlock=-1 --ulimit stack=67108864 --gpus all -v d:/llm/tensorrtllm_backend:/tensorrtllm_backend -v d:/llm/engines:/engines nvcr.io/nvidia/tritonserver:24.09-trtllm-python-py3
2. Inside the running container, execute pip show tensorrt_llm:
Name: tensorrt-llm
Version: 0.13.0
Summary: TensorRT-LLM: A TensorRT Toolbox for Large Language Models
Home-page: https://github.com/NVIDIA/TensorRT-LLM
Author: NVIDIA Corporation
Author-email:
License: Apache License 2.0
Location: /usr/local/lib/python3.10/dist-packages
Requires: accelerate, aenum, build, click, click-option-group, colored, cuda-python, diffusers, evaluate, h5py, lark, mpi4py, mpmath, numpy, nvidia-modelopt, onnx, openai, optimum, pandas, pillow, polygraphy, psutil, pulp, pynvml, sentencepiece, StrEnum, tensorrt, torch, transformers, wheel
Required-by:
3. git clone https://github.com/NVIDIA/TensorRT-LLM.git (branch main), then run the Gemma checkpoint conversion:
root@docker-desktop:/tensorrtllm_backend/src/TensorRT-LLM/examples/gemma# python3 convert_checkpoint.py --model-dir "/tensorrtllm_backend/models/gemma-2-9b-chat" --output-model-dir "/tensorrtllm_backend/trt-model" --dtype float16 --ckpt-type hf --world-size 1 --use-weight-only-with-precision int8 --load_model_on_cpu
[TensorRT-LLM] TensorRT-LLM version: 0.13.0
You are using a model of type gemma2 to instantiate a model of type gemma. This is not supported for all configurations of models and can yield errors.
Determined TensorRT-LLM configuration {'architecture': 'Gemma2ForCausalLM', 'dtype': 'float16', 'vocab_size': 256000, 'hidden_size': 3584, 'num_hidden_layers': 42, 'num_attention_heads': 16, 'hidden_act': 'gelu_pytorch_tanh', 'logits_dtype': 'float32', 'norm_epsilon': 1e-06, 'position_embedding_type': 'rope_gpt_neox', 'max_position_embeddings': 8192, 'num_key_value_heads': 8, 'intermediate_size': 14336, 'mapping': {'world_size': 1, 'gpus_per_node': 8, 'cp_size': 1, 'tp_size': 1, 'pp_size': 1, 'moe_tp_size': 1, 'moe_ep_size': 1}, 'quantization': {'quant_algo': <QuantAlgo.W8A16: 'W8A16'>, 'kv_cache_quant_algo': None, 'group_size': 128, 'smoothquant_val': None, 'clamp_val': None, 'has_zero_point': False, 'pre_quant_scale': True, 'exclude_modules': None}, 'use_parallel_embedding': False, 'embedding_sharding_dim': 0, 'share_embedding_table': True, 'head_size': 256, 'qk_layernorm': False, 'rotary_base': 10000.0, 'attn_bias': False, 'mlp_bias': False, 'rotary_scaling': None, 'inter_layernorms': True, 'query_pre_attn_scalar': 224, 'final_logit_softcapping': 30.0, 'attn_logit_softcapping': 50.0}
Loading weights...
Loading checkpoint shards: 100%|████████████████████████████████████████████████████████████████████████████| 4/4 [00:01<00:00, 2.70it/s]
Killed
root@docker-desktop:/tensorrtllm_backend/src/TensorRT-LLM/examples/gemma#
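A minimal diagnostic sketch I plan to use to check whether the process is being OOM-killed rather than failing inside convert_checkpoint.py (assumptions: cgroup v2 paths, and dmesg may have to be read on the Docker Desktop/WSL host instead of inside the container):

# memory actually visible inside the container
free -h
cat /sys/fs/cgroup/memory.max   # cgroup v2 only; shows the container's memory limit

# kernel OOM-killer messages (may need host/WSL access or a privileged container)
dmesg | grep -iE 'out of memory|killed process'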
additional notes
My machine has plenty of free CPU memory, but the process is killed at this point with no further output or error message.
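One assumption worth checking (not verified on my setup): Docker Desktop on Windows runs containers inside a WSL 2 VM, so the memory the container can actually use may be capped by the WSL 2 limit rather than by the 64GB of host RAM or the --memory="64g" flag. A minimal sketch of raising that cap, with illustrative values only:

# %UserProfile%\.wslconfig on the Windows host (example values, not recommendations)
[wsl2]
memory=48GB
swap=16GB

Afterwards, running wsl --shutdown and restarting Docker Desktop is needed for the new limit to take effect.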