dusty-nv / jetson-inference

Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
https://developer.nvidia.com/embedded/twodaystoademo
MIT License

Detectnet postprocessing and device synchronization #1782

Closed · andrejlevkovitch closed this 8 months ago

andrejlevkovitch commented 8 months ago

I want to clarify: do we need device synchronization (cudaDeviceSynchronize) before the detectNet postprocessing, or not?

https://github.com/dusty-nv/jetson-inference/blob/fe8b42c8da75c1c353dc59fa1fd079820024b89d/c/detectNet.cpp#L518-L537

I see that the call was commented out some time ago, but it sits after the detectNet postprocessing, which doesn't make much sense to me: if I understand the logic correctly, at postprocessing time we should already have access to the results on the CPU. If so, shouldn't we synchronize before postprocessing? If not, why not? I don't see any other synchronization calls, neither in the detectNet class, nor in the detectnet example, nor in the base tensor class. Is it a bug?

dusty-nv commented 8 months ago

@andrejlevkovitch by default, the call to tensorNet::ProcessNetwork() automatically synchronizes the CUDA stream. Hence detectNet::postProcess() is able to cluster/filter the results on the CPU.

IIRC, the commented-out call to cudaDeviceSynchronize() pertained to detectNet::Overlay(), because that draws on the overlay image with CUDA kernels. So if you need to access the overlay image from the CPU, a sync would be needed.
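
To illustrate both points, here is a minimal single-image sketch (not from this thread). It assumes an installed jetson-inference / jetson-utils with the usual detectNet::Create()/Detect(), loadImage()/saveImage(), and CUDA() helpers; the file names are placeholders, and the exact overloads should be checked against your version of the library. The detections returned by Detect() are already valid on the CPU because ProcessNetwork() synchronizes internally, whereas reading back the overlaid image calls for an explicit sync:

```cpp
// Minimal sketch (assumption, not the project's example code) of a
// single-image detectNet pipeline. Header paths assume an installed
// jetson-inference / jetson-utils.
#include <jetson-inference/detectNet.h>
#include <jetson-utils/loadImage.h>
#include <jetson-utils/cudaUtility.h>   // CUDA() error-checking macro

#include <cstdio>

int main( int argc, char** argv )
{
	detectNet* net = detectNet::Create("ssd-mobilenet-v2");

	if( !net )
		return 1;

	uchar3* image = NULL;
	int width = 0, height = 0;

	if( !loadImage("input.jpg", &image, &width, &height) )
		return 1;

	detectNet::Detection* detections = NULL;

	// Detect() -> ProcessNetwork() synchronizes the CUDA stream internally,
	// so the returned detections are already valid on the CPU here without
	// any extra cudaDeviceSynchronize().
	const int numDetections = net->Detect(image, width, height, &detections,
	                                      detectNet::OVERLAY_BOX | detectNet::OVERLAY_LABEL);

	for( int i = 0; i < numDetections; i++ )
		printf("class %u, confidence %f\n", detections[i].ClassID, detections[i].Confidence);

	// The box/label overlay, however, is drawn by CUDA kernels that Detect()
	// does not wait for, so synchronize before reading the overlaid image
	// back on the CPU (e.g. before saving or inspecting its pixels).
	CUDA(cudaDeviceSynchronize());

	saveImage("output.jpg", image, width, height);

	CUDA(cudaFreeHost(image));
	delete net;

	return 0;
}
```

Whether saveImage() also synchronizes internally is version dependent, so the explicit CUDA(cudaDeviceSynchronize()) before touching the overlaid pixels is the conservative choice.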

andrejlevkovitch commented 8 months ago

Oh, I see now, thanks!