andrejlevkovitch closed this issue 8 months ago
@andrejlevkovitch by default, the call to tensorNet::ProcessNetwork() automatically synchronizes the CUDA stream. Hence detectNet::postProcess() is able to cluster/filter the results on the CPU.
IIRC the commented-out call to cudaDeviceSynchronize() pertained to detectNet::Overlay(), because that uses CUDA kernels on the overlay image. So if you need to access the overlay image, then a sync would be needed.
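To illustrate the ordering described above, here is a minimal host-only sketch in C++. It is not code from jetson-inference and it does not call CUDA: ModelStream, enqueueInference and countDetections are hypothetical names, and the "stream" is modeled as a simple queue that only runs its work when synchronize() is called. In real code the blocking step would be something like cudaStreamSynchronize() or cudaDeviceSynchronize(). The point is only that CPU code must not read results until the sync has happened.

```cpp
#include <cstddef>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical stand-in for a CUDA stream (assumption, not a real API):
// work is queued and does not run until synchronize() is called.
// A real stream executes asynchronously on the GPU instead.
class ModelStream {
public:
    // Returns immediately, like a kernel launch or an async copy.
    void enqueue(std::function<void()> work) { pending_.push(std::move(work)); }

    // Returns only once all queued work has finished (here: runs it in order).
    void synchronize() {
        while (!pending_.empty()) {
            pending_.front()();
            pending_.pop();
        }
    }

private:
    std::queue<std::function<void()>> pending_;
};

// Hypothetical "inference": writes one score per slot into the output buffer.
void enqueueInference(ModelStream& stream, std::vector<float>& output) {
    stream.enqueue([&output] {
        for (std::size_t i = 0; i < output.size(); ++i)
            output[i] = 0.25f * static_cast<float>(i + 1);
    });
}

// Hypothetical CPU post-processing: counts scores at or above a threshold.
std::size_t countDetections(const std::vector<float>& output, float threshold) {
    std::size_t n = 0;
    for (float score : output)
        if (score >= threshold) ++n;
    return n;
}
```

With a zero-initialized output buffer, calling countDetections() right after enqueueInference() sees only the stale zeros. Calling it after stream.synchronize() sees the written scores. That matches the explanation above: if the sync already happens inside the inference call, post-processing on the CPU is safe without a further sync.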
Oh, I see now, thanks!
I want to clarify: do we need device synchronization (cudaDeviceSynchronize) before detectnet postprocessing, or not?
https://github.com/dusty-nv/jetson-inference/blob/fe8b42c8da75c1c353dc59fa1fd079820024b89d/c/detectNet.cpp#L518-L537
I see that the call was commented out some time ago, but it sits after the detectnet postprocessing, which doesn't make much sense: if I understand the logic correctly, by postprocessing time we should already have access to the results on the CPU. If synchronization is needed, shouldn't it happen before postprocessing? If it isn't needed, why not? I don't see any other synchronization calls, neither in the detectNet class, nor in the detectnet example, nor in the base tensor class. Is it a bug?