Open jperez999 opened 8 months ago
Although this is tripping that block, I would suggest always using PyNVML to query GPU information. Specifically, what I mention in https://github.com/NVIDIA-Merlin/core/issues/363#issuecomment-1888595036 can be dangerous with Dask if for some reason the cuda = None assignment is removed in the future.
This is not ready. The failures during writing occur when you write a file with a client available. Will continue investigating.
Investigated: it seems that the logic for int_slice_size was not foolproof. Because of the floor divide, you can end up in a scenario where the df has fewer records than the int_slice_size, and that can result in a zero. Then when you go to mod by zero, the thread raises an exception. I do wonder how we hit this now and not before.
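For anyone following along, here is a minimal sketch of the failure mode described above. It is not the actual code in this PR: the function names and the max(1, ...) guard are assumptions made only to illustrate how a floor divide can yield zero and then break a modulo.

```python
def unsafe_slice_size(num_rows, num_slices):
    # Hypothetical stand-in for the int_slice_size computation.
    # Floor division returns 0 whenever num_rows < num_slices.
    return num_rows // num_slices


def safe_slice_size(num_rows, num_slices):
    # One possible guard (an assumption, not necessarily the fix used here):
    # never let the slice size drop below 1.
    return max(1, num_rows // num_slices)


def position_in_slice(row_index, slice_size):
    # Modulo by the slice size; raises ZeroDivisionError if slice_size is 0.
    return row_index % slice_size


if __name__ == "__main__":
    size = unsafe_slice_size(3, 8)  # 3 rows split 8 ways
    try:
        position_in_slice(2, size)
    except ZeroDivisionError as exc:
        print("unsafe slice size failed:", exc)
    print("safe slice size:", safe_slice_size(3, 8))
```

Clamping to 1 is only one option; returning early for small frames would work as well, depending on what the writer expects.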
/ok to test
> I do wonder how we hit this now and not before.
I agree that this is strange - I wonder if I was wrong about pynvml_mem_size
be "the same".
/ok to test
/ok to test
This PR changes how we determine whether cuda is available on the system. We move from numba to HAS_GPU, which uses the NVML device count. If there are no devices, cuda is not available. Otherwise, cuda is available.
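A rough sketch of the device-count check described above, assuming the pynvml package. The helper name detect_gpu_count is hypothetical, and the HAS_GPU defined here is a local illustration rather than the project's actual definition. Missing bindings or a missing driver are treated as "no devices".

```python
def detect_gpu_count():
    """Return the number of NVIDIA devices NVML reports, or 0 if NVML is unusable."""
    try:
        import pynvml
    except ImportError:
        # pynvml is not installed: treat as no GPU.
        return 0
    try:
        pynvml.nvmlInit()
    except pynvml.NVMLError:
        # Driver or NVML library not found: treat as no GPU.
        return 0
    try:
        return int(pynvml.nvmlDeviceGetCount())
    except pynvml.NVMLError:
        return 0
    finally:
        pynvml.nvmlShutdown()


# Illustrative flag: cuda is considered available only if NVML sees a device.
HAS_GPU = detect_gpu_count() > 0

if __name__ == "__main__":
    print("HAS_GPU:", HAS_GPU)
```

Because this only talks to NVML, it does not create a CUDA context in the calling process, which is the usual reason to prefer it over a numba-based check when Dask workers are involved.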