Open jiahy0825 opened 1 month ago
TensorRT-LLM supports custom layer plugins, but plugins defined through the Python API can fail in ways such as the segmentation fault you're experiencing. Python-based custom layers are less common in production than C++ plugins built as shared objects (*.so), which are usually preferred for performance and for tighter control over memory management.
Here is a minimal example of a Python custom layer in TensorRT, intended as a starting point rather than a complete plugin:
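import cupy as cp  # assumption: CuPy wraps the device buffers; another GPU array library would also work
import tensorrt as trt

# Define a custom ReLU layer in Python (sketch: float32 only)
class CustomReLU(trt.IPluginV2DynamicExt):
    def __init__(self):
        trt.IPluginV2DynamicExt.__init__(self)  # the base __init__ must be called explicitly
        self.plugin_type = "CustomReLU"
        self.plugin_version = "1"
        self.plugin_namespace = ""
        self.num_outputs = 1
    def get_output_datatype(self, index, input_types):
        return input_types[0]
    def get_output_dimensions(self, output_index, inputs, expr_builder):
        return trt.DimsExprs(inputs[0])  # Same shape as the input
    def enqueue(self, input_desc, output_desc, inputs, outputs, workspace, stream):
        n = trt.volume(input_desc[0].dims)
        def as_array(ptr):  # inputs/outputs are raw device addresses, not NumPy arrays
            mem = cp.cuda.UnownedMemory(ptr, n * 4, self)  # 4 bytes per float32 element
            return cp.ndarray((n,), dtype=cp.float32, memptr=cp.cuda.MemoryPointer(mem, 0))
        with cp.cuda.ExternalStream(stream):
            cp.maximum(as_array(inputs[0]), 0, out=as_array(outputs[0]))
    # A complete plugin also needs configure_plugin, supports_format_combination, serialize, and clone.

# Register the plugin through a creator; deserialize_plugin is called when an engine is loaded
class CustomReLUCreator(trt.IPluginCreator):
    def __init__(self):
        trt.IPluginCreator.__init__(self)
        self.name = "CustomReLU"
        self.plugin_version = "1"
        self.plugin_namespace = ""
        self.field_names = trt.PluginFieldCollection([])
    def create_plugin(self, name, fc):
        return CustomReLU()
    def deserialize_plugin(self, name, data):
        return CustomReLU()  # stateless, so there is nothing to restore from data

trt.get_plugin_registry().register_creator(CustomReLUCreator(), "")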
@jiahy0825 this is on our roadmap, and likely TRT-LLM v0.15 will have it.
The segmentation fault is fixed in TRT 10.4.
Thank you for your response. I've resolved the segmentation fault issue. There was something wrong with my usage of the Python plugin, but unfortunately I didn't receive any warnings or error messages, which made it difficult to figure out what went wrong.
Do you have any suggestions on how to effectively debug Python plugins?
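One approach that may help with silent failures, offered as a sketch and not as an official TensorRT facility: wrap each plugin method in a decorator that logs every call and any exception before re-raising it. The names `trace_plugin_call`, `FakePlugin`, and the `plugin_debug` logger are hypothetical and used only for illustration; `FakePlugin` stands in for a real plugin class so the example runs without TensorRT installed. Creating the builder and runtime with `trt.Logger(trt.Logger.VERBOSE)` is a useful complement, since it surfaces TensorRT's own messages.

```python
import functools
import logging
import traceback

logger = logging.getLogger("plugin_debug")  # hypothetical logger name


def trace_plugin_call(method):
    """Log each call to a plugin callback and any exception it raises."""
    @functools.wraps(method)
    def wrapper(self, *args, **kwargs):
        logger.debug("%s.%s called with args=%r kwargs=%r",
                     type(self).__name__, method.__name__, args, kwargs)
        try:
            return method(self, *args, **kwargs)
        except Exception:
            # Record the full traceback in case the caller swallows the error.
            logger.error("%s.%s raised:\n%s",
                         type(self).__name__, method.__name__,
                         traceback.format_exc())
            raise
    return wrapper


class FakePlugin:
    """Stand-in for a real plugin class (assumption, for illustration only)."""

    @trace_plugin_call
    def get_output_datatype(self, index, input_types):
        return input_types[index]  # raises IndexError if index is out of range
```

With `logging.basicConfig(level=logging.DEBUG)` set before building or deserializing, each decorated callback leaves a trace, so a method that is never called, or one that raises, becomes visible in the log.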
I can find C++ custom layer plugin and Triton plugin support in TensorRT-LLM, but both methods require C++ code or a *.so file.
As TensorRT already supports Python custom layers (https://docs.nvidia.com/deeplearning/tensorrt/developer-guide/index.html#add_custom_layer_python), does TensorRT-LLM support this feature?
I have tried registering a Python custom layer plugin. The engine builds successfully, but when I deserialize it, a segmentation fault occurs and I don't know how to debug it.
Can you provide an example of inference with a Python custom layer plugin?
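For a segmentation fault at deserialization like the one described above, one possible way to narrow it down is to run the deserialization step in a child interpreter with Python's `faulthandler` enabled, so a crash produces a Python-level traceback on stderr and does not take down the parent process. The sketch below is only an assumption about how this could be set up: `run_isolated`, `my_plugin`, and `model.engine` are hypothetical names, and the snippet assumes that importing `my_plugin` registers the plugin creator.

```python
import subprocess
import sys


def run_isolated(code, timeout=300):
    """Run a Python snippet in a child interpreter with faulthandler enabled.

    Returns (returncode, stderr). On POSIX, a negative return code means the
    child was terminated by a signal, which is what a segmentation fault
    looks like from the parent.
    """
    proc = subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    return proc.returncode, proc.stderr


# Hypothetical snippet: "my_plugin" and "model.engine" are placeholder names.
DESERIALIZE_SNIPPET = """
import tensorrt as trt
import my_plugin  # assumed to register the plugin creator on import
runtime = trt.Runtime(trt.Logger(trt.Logger.VERBOSE))
with open("model.engine", "rb") as f:
    engine = runtime.deserialize_cuda_engine(f.read())
print("deserialized:", engine is not None)
"""
```

Calling `run_isolated(DESERIALIZE_SNIPPET)` and printing the returned stderr should show which Python frame was active when the crash happened, provided the fault occurs while a Python callback such as `deserialize_plugin` is on the stack.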