Open Shaquille-Wu opened 3 months ago
Can you upload build.log, generated by
trtexec --onnx=spec --verbose --plugins=spec 2>&1 | tee build.log
?
Thanks for your help, I've uploaded the log:
please check it
Checking.
Please use the following command to capture the additional info:
trtexec --onnx=spec.onnx --verbose --saveEngine=spec.plan \
--dumpProfile --dumpLayerInfo --separateProfileRun \
--noDataTransfers --useCudaGraph --useSpinWait | tee log
Thanks for checking. I've re-generated the log, please check it:
Because you only implemented part of the ops in your plugin, it breaks the fusion. The TRT native build places those ops inside foreign nodes, which performs better than your plugin + native ops.
Obviously, there are a large number of unfused layers on the left side.
How can I enable that "fusion" if I add a custom op? Do you mean I should add my custom op into the TRT source code and recompile TRT? Could you give me more details? For example, what is a foreign node? How do I add a custom op into foreign nodes?
You can first use the onnx-simplifier or polygraphy tools to optimize your ONNX, then try to expand the scope of your custom plugin and just compile it as a custom plugin .so. For example, the samples in
https://github.com/NVIDIA/TensorRT/tree/release/10.2/plugin can build such a lib.
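As a point of reference, the two optimization tools might be invoked roughly like this (a sketch only; spec.onnx and the output names are placeholders matching the trtexec commands above, and the exact options depend on your tool versions):
python3 -m onnxsim spec.onnx spec_sim.onnx
polygraphy surgeon sanitize spec.onnx --fold-constants -o spec_folded.onnx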
I executed onnx-simplifier before onnx2trt, and I added my custom op into the ONNX after onnx-simplifier and before onnx2trt. So I think my ONNX graph is already a simplified graph. I didn't find any outstanding difference between my custom op plugin and TRT's official plugins. I still cannot understand why TRT's official plugins can enable the "fusion". Why? Do you mean I must add my custom op plugin into TRT's source code and recompile it?
I still cannot understand why TRT's official plugins can enable the "fusion". Why?
The TRT native ops are built by Myelin (TRT's graph compiler), which is what performs those fusions inside the foreign nodes.
Do you mean I must add my custom op plugin into TRT's source code and recompile it?
You can build (compile) a custom_plugin.so by following the TRT OSS samples.
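As a rough, hypothetical sketch of that workflow (my_plugin.cu and libmy_plugin.so are made-up names; it assumes the plugin source registers its creator, e.g. with REGISTER_TENSORRT_PLUGIN, as the OSS samples do, and the exact compile flags depend on your environment):
nvcc -shared -Xcompiler -fPIC my_plugin.cu -o libmy_plugin.so -lnvinfer
trtexec --onnx=spec.onnx --plugins=libmy_plugin.so --verbose 2>&1 | tee build.log
The second command simply reuses the --plugins option from the earlier trtexec commands to load the compiled library when building the engine.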
Hi, TRT experts:
I have a custom op which is not supported by TensorRT, so I added it as a plugin. I found the whole cost time increases by about 10 ms. My test is as follows:
int MyPluginDynamic::enqueue(const nvinfer1::PluginTensorDesc* inputDesc,
                             const nvinfer1::PluginTensorDesc* outputDesc,
                             const void* const* inputs, void* const* outputs,
                             void* workspace, cudaStream_t stream) TRT_NOEXCEPT
{
    return 0;  // return directly
}
I don't know why TRT's performance is so poor after adding such a small custom op. My guesses are: