Fixed it with
export LD_LIBRARY_PATH=/usr/lib/x86_64-linux-gnu:$LD_LIBRARY_PATH
but now I am getting a "TensorRT-LLM not supported" error:
python3 run_server.py --port 9090 --backend tensorrt --trt_model_path "/root/TensorRT-LLM-examples/whisper/whisper_small"
[05/06/2024-09:07:24] TensorRT-LLM not supported: [TensorRT-LLM][ERROR] CUDA runtime error in cub::DeviceSegmentedRadixSort::SortPairsDescending(nullptr, cubTempStorageSize, logProbs, (T*) nullptr, idVals, (int*) nullptr, vocabSize * batchSize, batchSize, beginOffsetBuf, offsetBuf + 1, 0, sizeof(T) * 8, stream): no kernel image is available for execution on the device (/root/TensorRT-LLM/cpp/tensorrt_llm/kernels/samplingTopPKernels.cu:322)
1 0x7f4b9c74b825 void tensorrt_llm::common::check<cudaError>(cudaError, char const*, char const*, int) + 149
2 0x7f4b9c837858 void tensorrt_llm::kernels::invokeBatchTopPSampling<__half>(void*, unsigned long&, unsigned long&, int**, int*, tensorrt_llm::kernels::FinishedState const*, tensorrt_llm::kernels::FinishedState*, float*, float*, __half const*, int const*, int*, int*, curandStateXORWOW*, int, unsigned long, int const*, float, float const*, CUstream_st*, bool const*) + 2200
no kernel image is available for execution on the device (/root/TensorRT-LLM/cpp/tensorrt_llm/kernels/samplingTopPKernels.cu:322)
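For reference, "no kernel image is available for execution on the device" usually means the kernels in the build do not include the GPU's SM architecture. A quick, generic way to check the device's compute capability (nothing project-specific; a Tesla T4 should report sm_75):

```python
import torch

# Print the compute capability of each visible GPU. The kernels baked into the
# TensorRT-LLM build must include this architecture (e.g. SM75 for a Tesla T4),
# otherwise "no kernel image is available" is raised at runtime.
for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"GPU {i}: {torch.cuda.get_device_name(i)} -> sm_{major}{minor}")
```

If the reported architecture is not among the CUDA_ARCHS the build was compiled for, that mismatch alone produces this error.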
If I try to build the TensorRT-LLM container manually, I eventually get:
python3 run_server.py --port 9090 --backend tensorrt --trt_model_path "/app/tensorrt_llm/examples/whisper/whisper_small"
[TensorRT-LLM] TensorRT-LLM version: 0.10.0.dev2024050700
[05/08/2024-09:19:41] TensorRT-LLM not supported: Trying to create tensor with negative dimension -1: [-1, 1500, 768]
GPU: Tesla T4.
I built TensorRT-LLM with:
make -C docker release_build CUDA_ARCHS="75"
Note: it also throws this exception:
[05/08/2024-09:11:46] TensorRT-LLM not supported: ModelConfig.__init__() missing 2 required positional arguments: 'max_batch_size' and 'max_beam_width'
I fixed it by adding:
decoder_model_config = ModelConfig(
    max_batch_size=self.decoder_config['max_batch_size'],
    max_beam_width=self.decoder_config['max_beam_width'],
    ...
)
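For anyone else applying the same workaround, here is a rough sketch of how those two values might be pulled from the decoder engine's build config. This is an assumption on my part; the helper, the file name, and the key layout are hypothetical and differ between TensorRT-LLM versions:

```python
import json
from pathlib import Path

def load_decoder_config(engine_dir: str) -> dict:
    # Hypothetical helper: read max_batch_size / max_beam_width from the
    # decoder engine's config.json. Key names are assumptions and vary
    # between TensorRT-LLM versions.
    cfg = json.loads((Path(engine_dir) / "config.json").read_text())
    builder = cfg.get("builder_config", cfg)  # some layouts nest under "builder_config"
    return {
        "max_batch_size": builder["max_batch_size"],
        "max_beam_width": builder.get("max_beam_width", 1),
    }

# usage (path is only a placeholder):
# decoder_config = load_decoder_config("<decoder_engine_dir>")
```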
Thanks for reporting and tracking the issue; we are looking into this on our end as well.
I also ran into those issues.
When you stick to TensorRT-LLM 0.7.1, you get neither the model config error (I applied the same fix as you) nor the negative dimension error (I didn't have time to look deeper into that).
I have a working build in #221, feel free to give it a try.
Closed by #227
Here is my nvidia-smi result:
python -c "import torch; import tensorrt; import tensorrt_llm"
is working well.

When a client is connected, the server gets a core dump related to the libcudnn_cnn_infer library. Here is the relevant part of the log:
What could be the reason?
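Not sure if it helps, but here is a small check you can run inside the container to see whether cuDNN and its lazily loaded sublibraries resolve (a sketch; the .so version numbers may differ in your image):

```python
import ctypes
import torch

print("torch cuDNN version:", torch.backends.cudnn.version())

# cuDNN 8 dlopen()s sublibraries such as libcudnn_cnn_infer lazily, so a plain
# `import torch` can succeed while the process still aborts later when the
# sublibrary is not on the loader path.
for lib in ("libcudnn.so.8", "libcudnn_cnn_infer.so.8"):
    try:
        ctypes.CDLL(lib)
        print(lib, "loads OK")
    except OSError as exc:
        print(lib, "failed to load:", exc)
```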
My Ubuntu version is 22.04, while your Docker image is running Ubuntu 20.04.
Could the core dump be related to Ubuntu 22.04?
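If it is useful for comparing the two environments, here is a tiny script that prints the distribution release; run it once on the host and once inside the container (it just reads /etc/os-release, nothing project-specific):

```python
# Print the distribution name, e.g. "Ubuntu 22.04.4 LTS"; run on the host and
# inside the container to compare the two environments.
def os_release() -> dict:
    info = {}
    with open("/etc/os-release") as f:
        for line in f:
            key, sep, value = line.strip().partition("=")
            if sep:
                info[key] = value.strip('"')
    return info

print(os_release().get("PRETTY_NAME"))
```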