SveSop / nvcuda

Standalone version of nvcuda from Wine-Staging

CUDA SDK 11.6 and newer not supported (fully) #1

Closed SveSop closed 10 months ago

SveSop commented 2 years ago

A part of the CUDA SDK is the "CUDA runtime API". This is an API used by various software such as OptiX. https://docs.nvidia.com/cuda/cuda-runtime-api/driver-vs-runtime-api.html#driver-vs-runtime-api

This is fairly straightforward to use, but it creates some rather nasty side effects for the nvcuda implementation, since the apps are not compiled against the Linux version of CUDART.

This is worked around in various places in the internal.c part of the nvcuda codebase, and up until SDK 11.5 it has been rather trivial (for me) to keep supporting. However, as of SDK 11.6 (and now 11.7) it fails miserably. I have been doing my best to figure this out, but I have come up short. Since the code uses void pointers to implement a "dump-the-needed-data-here" kind of function to get around this, it is not easy to figure out which data is missing or wrong, or how it is being relayed.

Sample code compiled with nvcc and Visual Studio on Windows (simple test.cu):

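#include <cuda_runtime.h>  // implicit under nvcc, listed here for clarity
#include <iostream>

int main() {
  int n = 0;
  // Query the number of CUDA devices through the runtime API.
  cudaError_t error = cudaGetDeviceCount(&n);
  if (error != cudaSuccess) {
    std::cerr << "Error:  " << cudaGetErrorName(error) << std::endl;
    std::cerr << "String: " << cudaGetErrorString(error) << std::endl;
  }
  // Printed even when the call above failed.
  std::cout << "Number of devices: " << n << std::endl;
}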

This uses the CUDA runtime API to run various checks and, if they pass, reports how many CUDA devices are available.

Result compiled with SDK 11.5:

0128:trace:nvcuda:Unknown7_func0_relay (11050, 0x62f2ad9c, 0x11fc28)
Number of devices: 1
0128:trace:nvcuda:Unknown1_func6_relay (0x14006d018, 0xb803f0)
0128:trace:nvcuda:DllMain (0x7f71288f0000, 0, 0x1)

Same code compiled with SDK 11.6:

0128:trace:nvcuda:Unknown7_func0_relay (11060, 0x62f2b1cb, 0x240370)
0128:trace:nvcuda:Unknown7_func0_relay (11061, 0x62f2b1cb, 0x240380)
0128:trace:nvcuda:Unknown7_func0_relay (11062, 0x62f2b1cb, 0x240390)
Error:  cudaErrorSoftwareValidityNotEstablished
String: integrity checks failed
Number of devices: 0
0128:trace:nvcuda:Unknown1_func6_relay (0x14006e018, 0xb80400)
0128:trace:nvcuda:DllMain (0x7f4da2c80000, 0, 0x1)
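
The first argument in the Unknown7_func0_relay lines looks like a CUDA version number. CUDA encodes versions as 1000 * major + 10 * minor, so 11050 reads as 11.5 and 11060 as 11.6. What the last digit in 11061 and 11062 means is not known here; the sketch below only exposes it as a separate field so it is easier to log. decode_cuda_version and CudaVersion are hypothetical names and not part of nvcuda.

```cpp
#include <cassert>

// Hypothetical helper, not part of nvcuda.
// CUDA encodes versions as 1000 * major + 10 * minor.
// Treating the last digit as its own "revision" field is an assumption.
struct CudaVersion {
    int major;
    int minor;
    int revision;
};

CudaVersion decode_cuda_version(int v) {
    assert(v >= 0);  // negative values are not valid version numbers
    return CudaVersion{v / 1000, (v % 1000) / 10, v % 10};
}
```

With this, 11050 decodes to major 11, minor 5, revision 0, while the three 11.6 calls decode to the same 11.6 with revision 0, 1 and 2.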

nvcuda source lines of interest:

https://github.com/SveSop/nvcuda/blob/devel/dlls/nvcuda/internal.c#L218-L233

and

https://github.com/SveSop/nvcuda/blob/devel/dlls/nvcuda/internal.c#L523-L527

static void* WINAPI Unknown7_func0_relay(int cudaVersion, void *param1, void *param2)

The first one, cudaVersion, I am fairly sure is what I figured it to be: an int giving the CUDA SDK version (11050 = 11.5). Changing it has no effect.

The second parameter, void *param1, is I think some data providing info TO the CUDA driver. Changing or otherwise fiddling with it ends in various Wine page faults.

The third parameter, void *param2, does seem to be the result coming back from the driver/call. I have experimented with various structs, char arrays, int arrays and so on here, and I do get data that can be manipulated AND that actually causes the 11.5-compiled test to fail with the same cudaErrorSoftwareValidityNotEstablished as the 11.6 one. However, when it fails that way, it does NOT cause 3 calls with increasing "cudaVersion" like the failure with the 11.6 version does.
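
One way to see what the call actually writes into param2 might be to copy the buffer before and after the call and log only the bytes that changed. The following is a minimal sketch of that idea, not code from nvcuda: snapshot and diff_snapshots are hypothetical helpers, and the buffer length has to be guessed. The param2 addresses in the 11.6 trace are 0x10 apart (0x240370, 0x240380, 0x240390), so 16 bytes is a plausible first guess, but reading more bytes than the caller allocated could itself fault.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Hypothetical debugging helpers, not part of nvcuda.

// Copy `len` bytes from an opaque pointer so they can be compared later.
// `len` is a guess; the real size of the buffer is unknown.
std::vector<unsigned char> snapshot(const void *p, std::size_t len) {
    assert(p != nullptr);
    const unsigned char *b = static_cast<const unsigned char *>(p);
    return std::vector<unsigned char>(b, b + len);
}

// Return one line per changed byte, formatted as "+0x04: 00 -> 2a".
std::string diff_snapshots(const std::vector<unsigned char> &before,
                           const std::vector<unsigned char> &after) {
    assert(before.size() == after.size());
    std::string out;
    char line[48];
    for (std::size_t i = 0; i < before.size(); ++i) {
        if (before[i] != after[i]) {
            std::snprintf(line, sizeof line, "+0x%02zx: %02x -> %02x\n", i,
                          static_cast<unsigned>(before[i]),
                          static_cast<unsigned>(after[i]));
            out += line;
        }
    }
    return out;
}
```

In the relay this would mean taking one snapshot of param2 before forwarding the call and one after, then tracing the diff. Comparing the diff for the 11.5 binary with the diffs for the three 11.6 calls might show which bytes differ.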

So - I am basically at a complete loss here.

Why would I want 11.6, and specifically 11.7, to work with nvcuda? Well, it seems the default OptiX SDK 7.5 is meant to be compiled against CUDA SDK 11.7. The samples I have tested so far were compiled against CUDA 11.4 and have worked, but things like IRAY or other software using OptiX will probably be compiled "as-it-is-supposed-to", and thus will probably require a CUDA SDK 11.7 compliant driver and nvcuda.

Attaching a pre-compiled version of the CUDA test sample above: cuda_samples.zip