NVIDIA / gvdb-voxels

Sparse volume compute and rendering on NVIDIA GPUs

PTX JIT compilation fails #102

Closed tsvilans closed 3 years ago

tsvilans commented 3 years ago

Hi,

I compiled GVDB from source using CMake + Visual Studio 2019, x64.

Running the gDepthMap example throws an error:

GVDB CUDA ERROR:
  Launch status: a PTX JIT compilation failed
  Kernel status: no error
  Caller: VolumeGVDB::LoadFunction
  Call:   cuModuleLoad
  Args:   cuda_gvdb_module.ptx
Error. Application will exit.

I've tried this on two Windows 10 machines - with a GTX 660 Ti and a Quadro K620M - with the same result. I have updated my CUDA installation to the latest version (v11.0).

The pre-compiled examples from this Git repo work fine on the Quadro K620M.

If I'm not overlooking something obvious, more documentation on this aspect would be a good addition!

Thanks!

NBickford-NV commented 3 years ago

Hi tsvilans!

That error means that GVDB wasn't able to find the cuda_gvdb_module.ptx file containing compiled CUDA code on disk. (It should usually be in the same directory as the executable.)

My initial guess is that the gvdbPTX project might not have been built (the CMake system is supposed to make sure these dependencies are handled correctly, but there could be a bug there). Does a cuda_gvdb_module.ptx file exist on disk anywhere?

Another possibility is that CMake might have failed to copy cuda_gvdb_module.ptx - if the file exists, there should be a step in the CMake file that copies cuda_gvdb_module.ptx to the same folder as gDepthMap.exe.

Hope this helps!

tsvilans commented 3 years ago

Hi Neil, thanks for getting back so quickly!

There is indeed a cuda_gvdb_module.ptx in the executable folder...

I checked the build folder, and in gvdb_library, there is a gvdbPTX.vcxproj project file and in the same Release folder there is also the (presumably original) cuda_gvdb_module.ptx. I copied this manually to the executable folder, but still get the same error...

I rebuilt from scratch with the default CMake configuration, and it's the same 🤷‍♂️

NBickford-NV commented 3 years ago

Oh, I just realized I misread the original error! I know the issue now: PTX JIT compilation is failing because CUDA 11 removed support for Kepler GPUs with compute capability (CC) 3.0 and 3.2, and deprecated CC 3.5, 3.7, and 5.0 (for reference, Ampere is CC 8.x). Both of the GPUs listed in the original message are fairly old: the GTX 660 Ti is CC 3.0, while the Quadro K620M is CC 5.0.
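
If it's useful, here's a minimal driver-API sketch (not part of GVDB) that prints a device's compute capability, so you can check your GPU against the list at https://developer.nvidia.com/cuda-gpus:

    #include <cuda.h>
    #include <cstdio>

    // Minimal sketch: query the compute capability of the first CUDA device.
    int main() {
        CUdevice dev;
        int major = 0, minor = 0;
        cuInit(0);
        cuDeviceGet(&dev, 0);
        cuDeviceGetAttribute(&major, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MAJOR, dev);
        cuDeviceGetAttribute(&minor, CU_DEVICE_ATTRIBUTE_COMPUTE_CAPABILITY_MINOR, dev);
        printf("Compute capability: %d.%d\n", major, minor); // e.g. 3.0 for a GTX 660 Ti, 5.0 for a K620M
        return 0;
    }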

Compiled PTX files have a minimum CC that they're compiled for; the default when using CUDA 11 is 5.2, since 5.0 is deprecated (see the explanation below). The solution is to compile using a version of CUDA before 11 (e.g. 10.2), to use a GPU with the Maxwell architecture or newer (or at least compute capability 3.5 - see https://developer.nvidia.com/cuda-gpus), or to modify the CMake file to target architecture compute_50.

Here's what's going on internally, in case it's interesting! GPUs have many different microarchitectures, so compiled PTX is essentially an intermediate representation of the code; the first time an application loads a PTX file, CUDA compiles and optimizes the PTX just-in-time for the GPU's architecture. This means an application only needs to ship PTX (although, if desired, it can also include pre-built assembly for specific GPUs in a .cubin file so that it loads faster the first time).
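
For concreteness, here's a rough sketch (not GVDB's actual code; it assumes cuInit has been called and a CUDA context is current) of what that first load looks like with the driver API:

    #include <cuda.h>
    #include <cstdio>

    // Rough sketch: load a PTX module, which triggers the JIT compile for the current GPU.
    bool loadPtxModule(const char* path, CUmodule* module) {
        CUresult result = cuModuleLoad(module, path);
        if (result != CUDA_SUCCESS) {
            const char* name = nullptr;
            cuGetErrorName(result, &name); // e.g. CUDA_ERROR_FILE_NOT_FOUND vs. a PTX/JIT error
            printf("cuModuleLoad(%s) failed: %s\n", path, name ? name : "unknown error");
            return false;
        }
        return true;
    }

A missing file shows up as CUDA_ERROR_FILE_NOT_FOUND, while a PTX file that the driver can't JIT-compile for the GPU reports a PTX/JIT error like the one in the original message.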

Now, CUDA adds features over time, like support for Tensor Cores, so although old PTX files can be JIT compiled for newer GPUs, PTX files generated for newer compute architectures can include things (like Tensor Core operations) that older compute architectures don't have. (It's analogous to how only newer CPUs support AVX vector instructions.) This means that each block of PTX has a minimum compute capability that it requires. Since CUDA 11 deprecated CC 5.0, its default is to require the next minimum compute capability, 5.2. (If a file contains code for only a single PTX version, one can check the version by e.g. opening the file in a text editor, or by using cuobjdump -ptx.) It's possible to tell nvcc to compile for an older architecture by modifying the CMake script to add flags to the CUDA command line, like this:

set(CMAKE_CUDA_FLAGS "${CMAKE_CUDA_FLAGS} -gencode=arch=compute_35,code=compute_35")

(note that this has not been tested). This will compile PTX for compute architecture 3.5, which can be JIT-compiled for the K620M (since CC 5.0 >= CC 3.5). However, in order to generate a PTX file with a low enough compute capability requirement that it can be JIT-compiled for the GTX 660 Ti (which is now just over eight years old), one will need to use an older version of CUDA, such as 10.2.
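
Incidentally, if you want to double-check what a given PTX file requires, here's a quick sketch (nothing GVDB-specific; the default file name is just an example) that prints the .version and .target directives near the top of the file:

    #include <fstream>
    #include <iostream>
    #include <string>

    // Quick sketch: print the ".version" and ".target" directives of a PTX file.
    // A ".target sm_52" line, for example, means the PTX can't be JIT-compiled for a CC 3.x or 5.0 GPU.
    int main(int argc, char** argv) {
        const char* path = (argc > 1) ? argv[1] : "cuda_gvdb_module.ptx";
        std::ifstream in(path);
        std::string line;
        while (std::getline(in, line)) {
            if (line.rfind(".version", 0) == 0 || line.rfind(".target", 0) == 0) {
                std::cout << line << "\n";
            }
        }
        return 0;
    }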

Hope this helps!

tsvilans commented 3 years ago

Neil, you legend! Thanks so much for the detailed explanation. This looks very clear now.

I tried adding -gencode=arch=compute_35 to the CMAKE_CUDA_FLAGS in CMake, but to no avail. It lets me compile the projects, and if I then copy cuda_gvdb_module.ptx from the pre-built binaries in the bin folder, it seems to work.

I will try installing CUDA 10 and see how that goes. I'm long overdue for new graphics cards anyway, so eventually I will have to get something more current :P

A useful option in the CMake build files would be to set the CUDA version, though I suppose I can just do that by changing the environment variable.

Thanks so much for the help.

jacquesvaneeden commented 3 years ago

Neil, I am having a similar issue: the call to gvdb.SetCudaDevice(GVDB_DEV_FIRST) fails. After debugging, I get error 301 (file not found), and cuda_gvdb_module.ptx seems to be the culprit.

In my case all the samples work fine, with no problems compiling and running. I started a new Windows MFC application (not console), but I just cannot get past this issue. cuda_gvdb_module.ptx and cuda_gvdb_copydata.ptx are in the new application directory, so it should work. Any advice, please!

A couple of other questions, if I may.

  1. I am not familiar with the NVIDIA toolkits or the tools used for the samples (nv_gui and main_win.cpp). Where can I find documentation about these tools?
  2. Apart from the PDFs included with the GVDB download, is there some place where I can find more detailed information or help?

NBickford-NV commented 3 years ago

Hi Jacques!

My first thought is that maybe there's a difference in the working directory that's causing this to fail - in the samples' CMakeLists.txt files (e.g. here), we set the Visual Studio debugger working directory to the executable directory, instead of the default (which I think is $(ProjectDir) if I remember correctly). Since you mention using the Windows MFC application template, is it possible this is set to the default, and maybe changing it to something like $(TargetDir) might fix this? (If so, it might be a good idea for me to configure the library so that it uses the DLL's path!)
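
For reference, here's a rough Windows-only sketch (not what GVDB currently does) of resolving the PTX path relative to the running module instead of the working directory:

    #include <windows.h>
    #include <string>

    // Rough sketch: build an absolute path to a .ptx file next to the executable.
    // Passing the DLL's HMODULE instead of NULL would resolve next to the DLL instead.
    std::string PtxPathNextToExe(const std::string& ptxName) {
        char exePath[MAX_PATH] = {};
        GetModuleFileNameA(NULL, exePath, MAX_PATH);          // full path of the running .exe
        std::string dir(exePath);
        size_t slash = dir.find_last_of("\\/");
        if (slash != std::string::npos) dir.erase(slash + 1); // keep the trailing separator
        return dir + ptxName;
    }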

Here are some answers for your other questions:

  1. nv_gui and main_win are custom for GVDB by Rama Hoetzlein, so they aren't used in many of the other NVIDIA projects - I usually wind up looking at the source code to determine how they work.
  2. There is - although the PDFs are the best general-purpose overview, you'll find comments for many of the functions above their definitions in gvdb_volume_gvdb.cpp, like this:
    // Takes `inbuf`, a buffer of `cnt` Vector3DF objects (i.e. `3*cnt` floats),
    // and writes the length of each vector into `outbuf`, a buffer of `cnt` floats.
    // Sets vmin and vmax to the minimum and maximum lengths.
    float* ConvertToScalar ( int cnt, float* inbuf, float* outbuf, float& vmin, float& vmax )

    as well as some more recent comments in gvdb_volume_gvdb.h. (I've been working on moving these function descriptions when I can from gvdb_volume_gvdb.cpp to gvdb_volume_gvdb.h.)

At the moment, probably the best way to get started with GVDB is by modifying the samples, such as g3DPrint - that way, one can get a sense of what a full application with GVDB looks like, and have a working codebase to start from. Please feel free to create new issues as you find them, by the way, or email me at nbickford@nvidia.com!

jacquesvaneeden commented 3 years ago

Neil, thanks for the feedback. Pointing the Working Directory and Output Directory to the same location, as well as copying all the shader and .ptx files there, solved the issue ($(SolutionDir)$(Configuration)\ in my case).

On to the next issue: I get a debug assertion error when loading a model (I am using Luch.obj). Everything loads fine, but return mModels.size() throws an exception when size in vector is called; size calls some delete function in delete_scalar.cpp, which throws the exception.

I can draw the topology, change the voxel size, scale the model, etc., but the model itself doesn't draw, only the topology. I am assuming the model is not loaded correctly because of the above issue, or am I missing something?

    size_t Scene::AddModel ( std::string filestr, float scale, float tx, float ty, float tz )
    {
        Model* m = AddModel ();
        LoadModel ( m, filestr, scale, tx, ty, tz );
        return mModels.size()-1;
    }

    _CRT_SECURITYCRITICAL_ATTRIBUTE
    void __CRTDECL operator delete(void* const block) noexcept
    {
    #ifdef _DEBUG
        _free_dbg(block, _UNKNOWN_BLOCK);
    #else
        free(block);
    #endif
    }

NBickford-NV commented 3 years ago

Hi Jacques,

If you happen to have sample code I could look at, that would help, since I think this might not be an issue with std::vector::size - MSVC 2019's implementation in include\vector doesn't call a delete function:

    _NODISCARD size_type size() const noexcept {
        auto& _My_data = _Mypair._Myval2;
        return static_cast<size_type>(_My_data._Mylast - _My_data._Myfirst);
    }

So my guess is that the issue is somewhere else (so probably the model isn't being loaded correctly).

Also, let's create a new issue for this if it persists, since this one is about PTX JIT compilation.

Thanks!