Closed by haifengl 5 months ago
That is a Python-only function. From the presets, you can check for the device compute capability, or just try to create a small BF16 tensor.
Thanks! How to check device compute capability? torch_cuda.getDeviceProperties() returns a plain Pointer.
Also how to get CUDA runtime version such as cudaRuntimeGetVersion()? torch.C10_CUDA_VERSION_MAJOR seems compile time version.
Thanks! How to check device compute capability? torch_cuda.getDeviceProperties() returns a plain Pointer.
Right. That's something I'm currently working on. The next version of the PyTorch presets will depend on the CUDA presets, and this kind of function will return the proper type. In the meantime, you could use the CUDA presets directly.
Also how to get CUDA runtime version such as cudaRuntimeGetVersion()? torch.C10_CUDA_VERSION_MAJOR seems compile time version.
I'm not sure. Maybe there is a way using the CUDA presets.
If the final objective is the one in your top post, I guess you'd better try to create a BF16 GPU tensor and catch the exception.
Thanks. BTW, torch.C10_CUDA_VERSION_MAJOR and torch.C10_CUDA_VERSION are always 0, which is not correct.
Just creating a BF16 tensor is not the right check. On pre-Ampere hardware BF16 works, but it provides no speed-up over FP32 matmul operations, and some matmul operations fail outright. So I would like to check the CUDA version and the device compute capability.
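As a rough illustration of the check described here, the sketch below combines a CUDA runtime version with a device's compute capability major number. It assumes that native BF16 needs a CUDA 11 or newer runtime and compute capability 8.0 or higher (Ampere or newer). The class and method names are hypothetical and not part of any preset. The inputs are assumed to come from the CUDA runtime calls cudaRuntimeGetVersion() and cudaGetDeviceProperties(), however your bindings expose them.

```java
// Sketch only: Bf16Support and its methods are hypothetical helpers,
// not part of the PyTorch or CUDA presets.
final class Bf16Support {
    private Bf16Support() {}

    // cudaRuntimeGetVersion() encodes the version as 1000 * major + 10 * minor,
    // for example 11080 for CUDA 11.8.
    static int cudaMajor(int runtimeVersion) {
        return runtimeVersion / 1000;
    }

    static int cudaMinor(int runtimeVersion) {
        return (runtimeVersion % 1000) / 10;
    }

    // Assumption: native BF16 requires a CUDA 11+ runtime and a device with
    // compute capability major >= 8 (Ampere or newer).
    static boolean hasNativeBf16(int runtimeVersion, int computeCapabilityMajor) {
        boolean runtimeOk = cudaMajor(runtimeVersion) >= 11;
        boolean deviceOk = computeCapabilityMajor >= 8;
        return runtimeOk && deviceOk;
    }
}
```

With this sketch, a version value of 0 (like the constants reported above) is treated as unsupported, so a wrong compile-time constant would not be mistaken for a usable runtime.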
I think those are just wrappers for CUDA functions anyway, so I'd try to just use these directly:
Thanks!
Although these methods work fine on a single GPU box, they hang on a multi-GPU box.
I cannot find it anywhere.