The detector code seems to run pretty slowly on the Nvidia board. One possible solution is to port the code to use the CUDA cores on the board, although this might require rebuilding OpenCV.
We ran some timing tests and found that the total processing time for each detector loop is around 30 milliseconds. That is much lower than the slowdown we're seeing, so the detector code is probably not the cause.
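
For anyone who wants to repeat this kind of measurement, here is a minimal sketch of a per-loop timing harness. It is not the code we actually ran: `fake_detector`, `time_loop` and `summarize` are hypothetical names made up for illustration, and the real detector call would replace the stand-in.

```python
import statistics
import time


def fake_detector(frame):
    # Hypothetical stand-in for the real detector call (assumption).
    time.sleep(0.001)
    return []


def time_loop(step, frames, clock=time.perf_counter):
    """Call step(frame) for each frame and return per-iteration times in ms."""
    durations_ms = []
    for frame in frames:
        start = clock()
        step(frame)
        durations_ms.append((clock() - start) * 1000.0)
    return durations_ms


def summarize(durations_ms):
    """Return the mean and worst-case loop time in milliseconds."""
    return {
        "mean_ms": statistics.mean(durations_ms),
        "max_ms": max(durations_ms),
    }


if __name__ == "__main__":
    stats = summarize(time_loop(fake_detector, range(100)))
    print("mean %.2f ms, max %.2f ms" % (stats["mean_ms"], stats["max_ms"]))
```

Timing the same way around the capture and display steps would show where the rest of the time goes, since the detector loop alone does not seem to account for it.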