alicevision / CCTag

Detection of CCTag markers made up of concentric circles.
https://cctag.readthedocs.io
Mozilla Public License 2.0

[bug] Failed to copy to symbol host-to-device: invalid device symbol #192

Closed anand97 closed 1 month ago

anand97 commented 2 years ago

I can compile and run the CCTag detection application on the CPU of a Jetson Nano with no issues, using the sample images in the repository. However, when I pass the --use-cuda flag, I get this error:

```
You called ./build/Linux-aarch64/detection with:
--input sample/02.png --nbrings 3 --bank
--params
--output
--parallel 1 --use-cuda

*** Image mode **
Creating TagPipe 0
Initializing TagPipe 0
/home/dozer/git/CCTag/src/./cctag/cuda/frame_02_gaussian.cu:144 Failed to copy to symbol host-to-device: invalid device symbol
src ptr=7faa0113a0 dst ptr=7faa4b73e8
```

CCTag is built with CUDA support, and CMake was able to find the appropriate CUDA libraries on my Nano. I am running CUDA 10.2.300. Please let me know if you need any additional information to debug this issue!

simogasp commented 2 years ago

I think the problem might come from the missing architecture flags for Jetson Nano. It requires arch=compute_53,code=sm_53 (see https://forums.developer.nvidia.com/t/jetson-nano-running-openpose-example-gives-a-cuda-check-failed/77196/3) but they are not in our list https://github.com/alicevision/CCTag/blob/develop/CMakeLists.txt#L166

You can try adding them in the CMake configuration, but I'm not sure the code is compatible (just try adding 5.3 to the list).

@griwodz can say more. We should also update the list of compatible architectures with respect to the CUDA version.
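For reference, here is a minimal sketch of what adding the architecture could look like. The variable names are illustrative assumptions, not the actual contents of CCTag's CMakeLists.txt:

```cmake
# Sketch only: CCTAG_CUDA_CC_LIST is a hypothetical name for the list of
# supported compute capabilities. Adding 5.3 makes nvcc emit device code
# for the Jetson Nano's Maxwell GPU.
set(CCTAG_CUDA_CC_LIST 3.5 3.7 5.0 5.2 5.3)   # 5.3 = Jetson Nano

foreach(CC ${CCTAG_CUDA_CC_LIST})
  string(REPLACE "." "" CC_NODOT ${CC})
  list(APPEND CUDA_NVCC_FLAGS
       "-gencode=arch=compute_${CC_NODOT},code=sm_${CC_NODOT}")
endforeach()
```

Without a matching `-gencode` entry, the binary contains no device code for sm_53, which is consistent with the "invalid device symbol" error at load time.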

anand97 commented 2 years ago

Thanks for your reply. I tried adding the 5.3 compute capability specifier to the list and did a clean rebuild of the repository. I now get a different error that seems related to running out of memory; I'm not sure whether this is the kind of error we would expect if the code did not support this architecture:

```
You called ./Linux-aarch64/detection with:
--input ../sample/01.png --nbrings 3 --bank
--params
--output
--parallel 1 --use-cuda

*** Image mode **
Creating TagPipe 0
Initializing TagPipe 0
Loading image 0 into TagPipe 0
terminate called after throwing an instance of 'thrust::system::system_error'
  what():  copy_if failed on 2nd step: cudaErrorLaunchOutOfResources: too many resources requested for launch
Aborted (core dumped)
```

I was running tegrastats on the side to watch RAM utilisation; I had about 2.5 GB of available system memory (which is shared with the GPU on a Nano) throughout the execution. Please let me know if you have any other ideas I can try out, thanks!

simogasp commented 2 years ago

Memory could very well be the issue. You can try cropping the image around one of the CCTags and processing that. A smaller image requires less memory. I suggest cropping rather than scaling, because if you scale the image down too much the CCTag won't be detected because it becomes too small.

anand97 commented 2 years ago

Hello! Thanks again for your suggestion and reply. I tried cropping the sample image down to 260x260 px with one tag fully visible in the center, but I still get a similar error:

```
You called ./Linux-aarch64/detection with:
--input ../samples/02_crop.png --nbrings 3 --bank
--params
--output
--parallel 1 --use-cuda

*** Image mode **
Creating TagPipe 0
Initializing TagPipe 0
Loading image 0 into TagPipe 0
terminate called after throwing an instance of 'terminate called recursively
terminate called recursively
Aborted (core dumped)
```

For an image size of 640x640, I get the same cudaErrorLaunchOutOfResources: too many resources requested for launch error. I'm not familiar with CUDA programming, but I'm fairly well versed in C++ if you have any suggestions for code changes. Happy to try anything else as well.

anand97 commented 2 years ago

Some more information: when running with the --sync option, I sometimes get this message:

```
~/git/CCTag/src/cctag/cuda/debug_macros.cpp:27 called from ~/git/CCTag/src/./cctag/cuda/frame_07c_eval.cu:243
cudaGetLastError failed: invalid configuration argument
terminate called recursively
```

simogasp commented 2 years ago

I'm not an expert in CUDA either. The CUDA code was optimized to run in real time on GTX cards, so it is possible that some of the configurations used for blocks, number of threads, etc. are not supported on the Nano. Looking at e.g. https://stackoverflow.com/questions/16125389/invalid-configuration-argument-error-for-the-call-of-cuda-kernel, it seems that the error you are getting with the --sync option might refer to exactly that (an unsupported launch configuration). We have to wait for @griwodz for confirmation.
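To make the "invalid configuration argument" failure concrete: a kernel launch fails this way when the requested block shape exceeds the device limits reported by cudaGetDeviceProperties(). A minimal host-side sketch (plain C++, with the Nano's sm_53 limits hard-coded as an assumption, and names that are purely illustrative) of clamping a block shape so the launch stays legal:

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical per-device limits, as cudaGetDeviceProperties() would report
// them (maxThreadsPerBlock, maxThreadsDim). The values used in the check
// below are the usual Maxwell sm_53 limits; treat them as an assumption.
struct DeviceLimits {
    int maxThreadsPerBlock;  // e.g. 1024 on sm_53
    int maxThreadsDimX;      // e.g. 1024
    int maxThreadsDimY;      // e.g. 1024
};

// Clamp a requested 2-D block shape. A kernel launched with
// block.x * block.y > maxThreadsPerBlock fails with
// "invalid configuration argument" (cudaErrorInvalidConfiguration).
void clampBlock(int& bx, int& by, const DeviceLimits& lim) {
    bx = std::min(bx, lim.maxThreadsDimX);
    by = std::min(by, lim.maxThreadsDimY);
    while (bx * by > lim.maxThreadsPerBlock) {
        if (by > 1) by /= 2;  // shrink the slower-varying dimension first
        else        bx /= 2;
    }
}
```

A fixed block shape tuned for a desktop GPU can exceed these limits (or the per-SM register budget) on a smaller device, which would explain why the same code runs on GTX cards but not on the Nano.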

anand97 commented 2 years ago

Hello @simogasp, thanks for your reply. I'm happy to report that I seem to have solved the problem. Just as you suspected, the block size was the issue. Based on another issue (https://github.com/alicevision/CCTag/issues/170#issuecomment-903901699) and https://www.wikiwand.com/en/CUDA#/Version_features_and_specifications, it seems the Jetson Nano supports only 16 resident grids per device, so I reduced the block size everywhere in frame_07c_eval.cu. It compiled and ran perfectly at about 0.9 s per frame, even at the full resolution of the sample files. I can attach a diff for posterity if you'd like to incorporate it into the framework. Thanks again for all your help!

Note: I figure this change has to be made in all the other CUDA files as well, but changing just this one fixed it for me.

simogasp commented 2 years ago

Thanks for testing that. I don't know if it is doable, but it would be nice if we could parametrize these values according to the architecture. It's nice to hear that it can work even on the Jetson Nano with decent performance. We should definitely find a way to enable that at compile time.
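One possible direction, sketched here with illustrative names (this is not CCTag's actual code): nvcc defines `__CUDA_ARCH__` during device-code compilation, so a conservative block size could be selected per architecture at compile time without touching the host-side launch logic:

```cpp
// Illustrative sketch, not CCTag code: pick a smaller block size when
// the device code is compiled for the Jetson Nano's sm_53.
// __CUDA_ARCH__ is only defined during nvcc device compilation; in a
// plain host build (as here) the #else branch is taken.
#if defined(__CUDA_ARCH__) && (__CUDA_ARCH__ <= 530)
constexpr int kEvalBlockSize = 16;   // conservative choice for sm_53
#else
constexpr int kEvalBlockSize = 32;   // default tuned for desktop GTX cards
#endif

// Host build sanity check: the default branch is selected.
static_assert(kEvalBlockSize == 32, "host build uses the default block size");
```

The drawback of a compile-time switch is that a binary built for several architectures still picks the value per `-gencode` target, not per the device actually present at runtime; querying cudaGetDeviceProperties() once at startup would be the runtime alternative.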

github-actions[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.
