NVIDIA / TensorRT-LLM

TensorRT-LLM provides users with an easy-to-use Python API to define Large Language Models (LLMs) and build TensorRT engines that contain state-of-the-art optimizations to perform inference efficiently on NVIDIA GPUs. TensorRT-LLM also contains components to create Python and C++ runtimes that execute those TensorRT engines.
https://nvidia.github.io/TensorRT-LLM
Apache License 2.0

How to deal with unsupported operations in TensorRT-LLM? #2218

Open EpiSlim opened 2 months ago

EpiSlim commented 2 months ago

Hi team, I am trying to add a custom model to TRT-LLM. The original model in PyTorch uses torch.complex for a few tensors, in addition to FFT ops like torch.fft.fft, torch.fft.ifft, and torch.fft.rfft. From the documentation, it seems that such ops are not directly supported in TRT-LLM. Is there a workaround for this? For example, is it possible to mix TRT-LLM modules with PyTorch modules?
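For concreteness, a hypothetical stand-in for the kind of block described above (the names `spectral_mix` and `weight` are illustrative, not from the actual model): a real-valued input, a real-to-complex FFT, a complex multiply, and an inverse FFT back to real, which is the pattern the documentation says TRT-LLM does not cover directly.

```python
import torch

def spectral_mix(x: torch.Tensor, weight: torch.Tensor) -> torch.Tensor:
    # x: real-valued (batch, seq_len); weight: complex64 of length seq_len // 2 + 1
    spec = torch.fft.rfft(x, dim=-1)                      # real -> complex spectrum
    spec = spec * weight                                  # complex multiply (torch.complex dtype)
    return torch.fft.irfft(spec, n=x.shape[-1], dim=-1)  # complex -> real, same shape as x

x = torch.randn(2, 16)
weight = torch.randn(16 // 2 + 1, dtype=torch.complex64)
y = spectral_mix(x, weight)  # shape (2, 16)
```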

litaotju commented 1 month ago

@EpiSlim currently, mixing torch and TRT ops is not implemented.

However, it would be possible if you wrap the torch operations into a TRT plugin using the TRT Python plugin interface.

Would you mind sharing your model with more context, so that this feature can be considered in future planning?
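For reference, a minimal sketch of the approach litaotju describes, loosely following the `python_plugin` samples that ship with TensorRT (Python-defined plugins require TensorRT 9.2 or newer, and the exact method signatures vary between versions). The class name `TorchSpectralMixPlugin` and the `filt` attribute are hypothetical; `enqueue` borrows TRT's raw device pointers via CuPy, hands them to torch as zero-copy tensors, and runs the FFT ops there.

```python
import pickle

import cupy as cp
import numpy as np
import tensorrt as trt
import torch


class TorchSpectralMixPlugin(trt.IPluginV2DynamicExt):
    """Hypothetical plugin computing y = irfft(rfft(x) * filt) with torch ops."""

    def __init__(self, filt=None):
        trt.IPluginV2DynamicExt.__init__(self)
        self.filt = filt  # np.complex64 filter of length seq_len // 2 + 1
        self.num_outputs = 1
        self.plugin_namespace = ""
        self.plugin_type = "TorchSpectralMixPlugin"
        self.plugin_version = "1"

    def get_output_datatype(self, index, input_types):
        return input_types[0]

    def get_output_dimensions(self, output_index, inputs, expr_builder):
        # irfft(rfft(x) * filt) preserves the input shape.
        return trt.DimsExprs(inputs[0])

    def supports_format_combination(self, pos, in_out, num_inputs):
        desc = in_out[pos]
        return desc.format == trt.TensorFormat.LINEAR and desc.type == trt.DataType.FLOAT

    def configure_plugin(self, inp, out):
        pass

    def serialize(self):
        return pickle.dumps({"filt": self.filt})

    def clone(self):
        cloned = TorchSpectralMixPlugin()
        cloned.__dict__.update(self.__dict__)
        return cloned

    def enqueue(self, input_desc, output_desc, inputs, outputs, workspace, stream):
        dtype = trt.nptype(input_desc[0].type)
        n_elem = int(np.prod(tuple(input_desc[0].dims)))
        nbytes = n_elem * np.dtype(dtype).itemsize

        # Borrow TRT's device pointers without copying, then view them from torch.
        x_mem = cp.cuda.UnownedMemory(inputs[0], nbytes, self)
        y_mem = cp.cuda.UnownedMemory(outputs[0], nbytes, self)
        x_d = cp.ndarray(tuple(input_desc[0].dims), dtype=dtype,
                         memptr=cp.cuda.MemoryPointer(x_mem, 0))
        y_d = cp.ndarray((n_elem,), dtype=dtype,
                         memptr=cp.cuda.MemoryPointer(y_mem, 0))

        x_t = torch.as_tensor(x_d, device="cuda")
        w = torch.as_tensor(self.filt, device="cuda")  # a real impl would cache this

        # NOTE: this sketch runs on torch's current stream and ignores TRT's
        # stream argument; a production plugin should honor the given stream.
        out = torch.fft.irfft(torch.fft.rfft(x_t, dim=-1) * w,
                              n=x_t.shape[-1], dim=-1)
        cp.copyto(y_d, cp.reshape(cp.asarray(out), (-1,)))
```

At build time, such a plugin would be inserted with the standard `network.add_plugin_v2([x], TorchSpectralMixPlugin(filt))` call on the underlying `INetworkDefinition`; deserializing an engine that contains it additionally requires registering a matching `trt.IPluginCreator`, as the TensorRT samples do. Whether this composes cleanly with TRT-LLM's build flow is exactly what is being asked above, so treat it as a starting point rather than a supported path.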

github-actions[bot] commented 4 weeks ago

This issue is stale because it has been open 30 days with no activity. Remove stale label or comment or this will be closed in 15 days.