BasicCoder opened this issue 8 months ago
Could you try the latest main branch? The issue is fixed there by no longer using shared memory in the penalty kernel.
Thanks for your help. Has this issue also been fixed in TRT-LLM v0.8.0?
Yes, the issue is also fixed in v0.8.0.
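The fix makes sense once you do the shared-memory arithmetic. The sketch below is a back-of-the-envelope check, not TRT-LLM code: it assumes the old penalty kernel cached the previous token IDs (4 bytes each) in per-block shared memory, whose default budget is 48 KB (with an opt-in maximum of roughly 164 KB on an A100).

```python
def fits_in_smem(seq_len: int, smem_bytes: int = 48 * 1024,
                 bytes_per_token_id: int = 4) -> bool:
    """Return True if seq_len token IDs fit in smem_bytes of shared memory.

    Hypothetical helper for illustration; the sizes are assumptions,
    not values read out of the TRT-LLM kernel.
    """
    return seq_len * bytes_per_token_id <= smem_bytes

# 100k previous tokens need ~400 KB, far beyond either shared-memory limit,
# while a typical short context fits comfortably.
print(fits_in_smem(2048))     # True
print(fits_in_smem(100_000))  # False
```

This is why removing the shared-memory cache (reading the token IDs from global memory instead) lifts the input-length limitation entirely rather than just raising it.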
System Info
CPU: x86_64
GPU: 4× A100 80 GB
TensorRT-LLM: 0.6.1
Who can help?
@kaiyux @byshiue
Reproduction
Using TRT-LLM v0.6.1.
Expected behavior
The correct generation result is returned.
Actual behavior
Additional notes
I checked the code, and the same kernel is present in versions 0.6.1–0.7.1: https://github.com/NVIDIA/TensorRT-LLM/blob/v0.6.1/cpp/tensorrt_llm/kernels/samplingPenaltyKernels.cu#L271 .
This error only occurs when repetition_penalty=1.1; there is no error when repetition_penalty=1.0.
This may be because the input length the model needs to process is 100k tokens, which exceeds smemSize. How can I work around this length limitation?
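For reference, a repetition penalty only needs the set of previously seen token IDs, which is why the kernel wants the whole input history available. Below is a minimal host-side NumPy sketch of the standard CTRL-style formulation (divide positive logits by the penalty, multiply negative ones); the function name and shapes are hypothetical, and TRT-LLM performs this step inside a CUDA kernel rather than on the host.

```python
import numpy as np

def apply_repetition_penalty(logits: np.ndarray,
                             prev_token_ids: np.ndarray,
                             penalty: float) -> np.ndarray:
    """Dampen the logits of tokens that already appeared in the sequence."""
    out = logits.copy()
    seen = np.unique(prev_token_ids)
    # Penalize in whichever direction lowers the token's probability:
    # shrink positive logits, push negative logits further down.
    out[seen] = np.where(out[seen] > 0,
                         out[seen] / penalty,
                         out[seen] * penalty)
    return out

logits = np.array([2.0, -1.0, 0.5])
penalized = apply_repetition_penalty(logits, np.array([0, 1]), penalty=1.1)
# token 0: 2.0 / 1.1; token 1: -1.0 * 1.1; token 2 untouched
```

Note that with penalty=1.0 this transform is the identity, so a sampling implementation may skip the penalty kernel entirely in that case. That would be consistent with the symptom above: the crash only appears at repetition_penalty=1.1, when the kernel actually runs against the 100k-token history.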