quasiben closed this 4 months ago
Installing pytorch-cuda=12.1 confuses the CUDA Toolkit (CTK) setup when building Numbast; the build can no longer find curand.h:
/opt/conda/envs/test/bin/x86_64-conda-linux-gnu-c++ -DVERSION_INFO=0.0.1 -Dcurand_host_EXPORTS -isystem /opt/conda/envs/test/include/python3.10 -isystem /opt/conda/envs/test/lib/python3.10/site-packages/pybind11/include -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /opt/conda/envs/test/include -O3 -DNDEBUG -fPIC -MD -MT CMakeFiles/curand_host.dir/curand_host_api.cpp.o -MF CMakeFiles/curand_host.dir/curand_host_api.cpp.o.d -o CMakeFiles/curand_host.dir/curand_host_api.cpp.o -c /__w/numbast/numbast/numba_extensions/curand_host/curand_host_api.cpp
/__w/numbast/numbast/numba_extensions/curand_host/curand_host_api.cpp:9:10: fatal error: curand.h: No such file or directory
9 | #include <curand.h>
| ^~~~~~~~~~
compilation terminated.
ninja: build stopped: subcommand failed.
*** CMake build failed
[end of output]
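To narrow down which toolkit headers the environment actually exposes, a quick check like the one below may help. It is only a sketch: the `targets/x86_64-linux/include` location is an assumption about how the conda CUDA 12 packages lay out headers, and `find_header` and `candidate_include_dirs` are hypothetical helper names, not part of Numbast.

```python
import os
from pathlib import Path


def find_header(name, roots):
    """Return every existing path to `name` directly under the given include dirs."""
    hits = []
    for root in roots:
        candidate = Path(root) / name
        if candidate.is_file():
            hits.append(candidate)
    return hits


def candidate_include_dirs(env=os.environ):
    """List include directories worth checking, based on environment variables."""
    dirs = []
    prefix = env.get("CONDA_PREFIX")
    if prefix:
        dirs.append(Path(prefix) / "include")
        # Assumption: conda CUDA 12 packages put toolkit headers here on x86_64 Linux.
        dirs.append(Path(prefix) / "targets" / "x86_64-linux" / "include")
    cuda_home = env.get("CUDA_HOME")
    if cuda_home:
        dirs.append(Path(cuda_home) / "include")
    return dirs


if __name__ == "__main__":
    dirs = candidate_include_dirs()
    found = find_header("curand.h", dirs)
    if found:
        print("curand.h found at:", [str(p) for p in found])
    else:
        print("curand.h not found in:", [str(d) for d in dirs])
```

If the header only shows up outside the directories passed via `-isystem` in the compile command above, that would suggest the build is not being pointed at the toolkit include path.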
A bigger problem is that we might need to support CTK 12.1 in general if we want to deliver this feature today.
The conda package is probably going to force CUDA 12.1. If we install from pip, we might be OK mixing everything into a CUDA 12.4 env.
In the V100 run the bfloat16 tests are skipped, while the A100 tests fail with:
> raise AssertionError("Torch not compiled with CUDA enabled")
E AssertionError: Torch not compiled with CUDA enabled
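That assertion usually means a CPU-only PyTorch build ended up in the environment. A small diagnostic along these lines could confirm it before the tests run. This is a sketch: `torch_cuda_report` and `has_native_bfloat16` are hypothetical helpers, and tying the V100 skip to compute capability is my assumption about why the bfloat16 tests are skipped there.

```python
import importlib.util


def torch_cuda_report():
    """Summarize whether the installed PyTorch build can use CUDA."""
    if importlib.util.find_spec("torch") is None:
        return {"installed": False}
    import torch

    report = {
        "installed": True,
        "torch_version": torch.__version__,
        # torch.version.cuda is None for CPU-only builds.
        "built_with_cuda": torch.version.cuda,
        "cuda_available": torch.cuda.is_available(),
    }
    if report["cuda_available"]:
        # (major, minor) tuple for device 0.
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report


def has_native_bfloat16(compute_capability):
    """Assumed rule: bfloat16 needs compute capability 8.0+ (V100 is 7.0, A100 is 8.0)."""
    return tuple(compute_capability) >= (8, 0)


if __name__ == "__main__":
    print(torch_cuda_report())
```

If `built_with_cuda` comes back as `None` on the A100 instance, the solver most likely picked a CPU build of PyTorch rather than the CUDA 12.1 one.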
Do you think we should set up a new instance with CTK 12.1 and only test bfloat16 + pytorch integration?
@isVoid this is now passing and ready for review
Potential solution for #34