When I start 3x 3090 cloud instances, I keep getting this error:
RuntimeError: Unexpected error from cudaGetDeviceCount(). Did you run some cuda
functions before calling NumCudaDevices() that might have already set an error?
Error 804: forward compatibility was attempted on non supported HW
This seems to happen almost all of the time now (at least on "cloud" instances), so getting a working GPU instance is a nightmare. All I've found from searching is "I rebooted and the problem went away", which isn't much help.
Just thought I'd see if anyone knows of a workaround. It sucks, since 3x 3090 on RunPod's cloud instances seems to be the most economical way to get Goliath running (about $0.75/hr). Instead, I keep having to switch to an A100 instance, which costs twice as much...