Closed thedodd closed 2 years ago
Beyond the codegen making invalid PTX (please post the PTX if you have it so I can check), this is not an issue I can help much with, especially since I've never used Jetson GPUs or CUDA on aarch64. You are probably better off making a test case in CUDA C++ or another tool and posting it to the NVIDIA forums.
Updated the above with the PTX. Yeah, I was going to try compiling the code directly on the device before building a C++ test case, but the device only has CUDA 10.2, so I don't think that will actually work (according to the Getting Started guide, anyway).
Thanks boss.
The PTX looks correct, so this is a bit outside the project's scope. I would suggest making a C++ test case and opening a forum post on the NVIDIA website; the people there are very helpful :)
Platform: Jetson Nano 2Gi
Arch: aarch64/arm64
OS: Linux Ubuntu 18.04 / Tegra
EDIT (added the PTX):
An important note: this is all compiled in an Ubuntu 18.04 arm64 container with CUDA 11.4, but the binary is then moved to the L4T-runtime container (which is needed for the Jetson device), which only supports CUDA 10.2. The docs in the Getting Started section of this repo seem to indicate that such a setup should be fine, though I may have misinterpreted that statement.
Any ideas on what is causing this issue?
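
One thing that might be worth ruling out, given the CUDA 11.4 build container and the CUDA 10.2 runtime container described above, is whether the `.version` directive in the generated PTX is newer than what the runtime side understands. As far as the PTX ISA release notes go, CUDA 10.2 corresponds to PTX ISA 6.5 and CUDA 11.4 to PTX ISA 7.4. The following is only a rough sketch in plain C++ (no CUDA needed) that reads the `.version` line out of a PTX string and compares it against a maximum. The names `parse_ptx_version`, `driver_accepts`, and `kCuda102MaxPtx` are made up for illustration, and nothing here is confirmed to be the cause of this issue.

```cpp
#include <sstream>
#include <string>
#include <utility>

// PTX ISA version as (major, minor), e.g. (7, 4) for ".version 7.4".
using PtxVersion = std::pair<int, int>;

// Assumption taken from the PTX ISA release notes: CUDA 10.2 pairs with PTX ISA 6.5.
const PtxVersion kCuda102MaxPtx(6, 5);

// Hypothetical helper: find the first ".version X.Y" directive in PTX text.
// Returns true and fills `out` on success, false if no directive is found.
bool parse_ptx_version(const std::string& ptx, PtxVersion& out) {
    std::istringstream lines(ptx);
    std::string line;
    while (std::getline(lines, line)) {
        std::istringstream fields(line);
        std::string directive;
        int major = 0;
        int minor = 0;
        char dot = 0;
        if (fields >> directive && directive == ".version" &&
            fields >> major >> dot >> minor && dot == '.') {
            out = PtxVersion(major, minor);
            return true;
        }
    }
    return false;
}

// Hypothetical helper: true if PTX declaring `found` is no newer than the
// newest PTX ISA the driver is assumed to support.
bool driver_accepts(const PtxVersion& found, const PtxVersion& max_supported) {
    return found <= max_supported;  // std::pair compares lexicographically
}
```

To use it, read the emitted `.ptx` file into a `std::string`, call `parse_ptx_version`, and pass the result to `driver_accepts` together with `kCuda102MaxPtx`. If the declared version turns out to be above 6.5, that would at least be a concrete detail to include in a forum post.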