Closed: demuxin closed this issue 2 months ago
How about using the following function for timing?
std::chrono::high_resolution_clock::now();
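A minimal sketch of that suggestion, assuming a small helper named elapsedMs (a hypothetical name, not part of the plugin source). It reads the clock before and after a callable and returns the difference in milliseconds. Note that this is host wall-clock time: CUDA kernel launches are asynchronous, so the number only covers GPU work if the stream is synchronized inside the timed region.

```cpp
#include <chrono>
#include <utility>

// Hypothetical helper: runs `work` once and returns the host wall-clock
// time it took, in milliseconds.
template <typename Fn>
double elapsedMs(Fn&& work)
{
    auto const start = std::chrono::high_resolution_clock::now();
    std::forward<Fn>(work)();
    auto const stop = std::chrono::high_resolution_clock::now();
    // duration<double, std::milli> converts the tick count to fractional ms.
    return std::chrono::duration<double, std::milli>(stop - start).count();
}
```

Usage would look like `double ms = elapsedMs([&] { /* code to time */ });`. If the timed code only enqueues GPU work, the result reflects launch overhead rather than execution time.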
Hi @lix19937, the results are the same, so the timing method shouldn't be the problem.
Do you have any other suggestions?
I can also provide the plugin code.
NmsdetaIPluginV2DynamicExt.h.txt NmsdetaIPluginV2DynamicExt.cpp.txt
From your plugin.cpp,
Thanks @lix19937, I know cudaStreamSynchronize is not necessary; I just want to measure how long this plugin takes.
According to your statement, the 165ms is actually the elapsed time of the node before NmsDeta, right?
And how can I measure the time from the network input node to the NmsDeta node, or the elapsed time of every node in the model?
Thank you again for your prompt reply.
Replace your enqueue implementation with the following code.
int32_t NmsdetaIPluginV2DynamicExt::enqueue(PluginTensorDesc const* inputDesc, PluginTensorDesc const* outputDesc,
    void const* const* inputs, void* const* outputs, void* workspace, cudaStream_t stream) IS_NOEXCEPT
{
    // Stub: launches no work and reports success.
    return 0;
}
Then run the following trtexec command and upload build.log.
./trtexec --onnx=$ONNX_filename \
--saveEngine=$ONNX_filename.plan \
--verbose \
--dumpProfile \
--noDataTransfers \
--useCudaGraph \
--useSpinWait \
--separateProfileRun \
2>&1 | tee -a build.log
Thanks.
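For the per-node question, when the engine is driven from the C++ API rather than trtexec, TensorRT's nvinfer1::IProfiler interface reports one time per layer through its reportLayerTime(layerName, ms) callback. The sketch below shows only the bookkeeping side, as a standalone class with the same method shape so that it compiles without NvInfer.h. Deriving it from nvinfer1::IProfiler and attaching it with setProfiler on the execution context is an assumption about how it would be wired in, and the class name LayerTimeRecorder is hypothetical.

```cpp
#include <algorithm>
#include <cstdio>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Stand-in with the same method shape as nvinfer1::IProfiler::reportLayerTime.
// Assumption: in a real build this would derive from nvinfer1::IProfiler and
// be registered via context->setProfiler(&recorder).
class LayerTimeRecorder
{
public:
    // Called once per layer per inference; accumulates the reported time.
    void reportLayerTime(char const* layerName, float ms) noexcept
    {
        mTotalMs[layerName] += ms;
    }

    // Returns (layer name, total ms) pairs, slowest layer first.
    std::vector<std::pair<std::string, float>> sortedBySlowest() const
    {
        std::vector<std::pair<std::string, float>> rows(mTotalMs.begin(), mTotalMs.end());
        std::sort(rows.begin(), rows.end(),
            [](auto const& a, auto const& b) { return a.second > b.second; });
        return rows;
    }

    // Prints one line per layer, slowest first.
    void print() const
    {
        for (auto const& row : sortedBySlowest())
        {
            std::printf("%-40s %8.3f ms\n", row.first.c_str(), row.second);
        }
    }

private:
    std::map<std::string, float> mTotalMs; // layer name -> accumulated ms
};
```

Sorting by total time puts the slowest layer at the top, which makes it easy to see whether the plugin itself or a layer before it accounts for the time.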
Hi @lix19937, can polygraphy run load a custom plugin?
Yes, it is supported via --plugins.
Description
I implemented a TensorRT plugin and found the plugin to be particularly time-consuming.
I compile the plugin as a separate library and then call it through the C++ API.
I called cudaStreamSynchronize at the beginning of the enqueue function and measured it to take about 165ms.
How can I solve this issue? Please offer me some advice.
Environment
TensorRT Version: 9.3
NVIDIA GPU: GeForce RTX 3090
NVIDIA Driver Version: 535.183.01
CUDA Version: 12.2
cuDNN Version: 8.9.6
Operating System: Ubuntu 22.04