NVIDIA / numbast

Numbast is a tool to build an automated pipeline that converts CUDA APIs into Numba bindings.
Apache License 2.0

Pytorch Test and Proxy #41

Closed quasiben closed 4 months ago

quasiben commented 5 months ago

Potential solution for #34

isVoid commented 5 months ago

After installing pytorch-cuda=12.1, there's a CTK (CUDA Toolkit) version confusion when building Numbast:

      /opt/conda/envs/test/bin/x86_64-conda-linux-gnu-c++ -DVERSION_INFO=0.0.1 -Dcurand_host_EXPORTS -isystem /opt/conda/envs/test/include/python3.10 -isystem /opt/conda/envs/test/lib/python3.10/site-packages/pybind11/include -fvisibility-inlines-hidden -fmessage-length=0 -march=nocona -mtune=haswell -ftree-vectorize -fPIC -fstack-protector-strong -fno-plt -O2 -ffunction-sections -pipe -isystem /opt/conda/envs/test/include -O3 -DNDEBUG -fPIC -MD -MT CMakeFiles/curand_host.dir/curand_host_api.cpp.o -MF CMakeFiles/curand_host.dir/curand_host_api.cpp.o.d -o CMakeFiles/curand_host.dir/curand_host_api.cpp.o -c /__w/numbast/numbast/numba_extensions/curand_host/curand_host_api.cpp
      /__w/numbast/numbast/numba_extensions/curand_host/curand_host_api.cpp:9:10: fatal error: curand.h: No such file or directory
          9 | #include <curand.h>
            |          ^~~~~~~~~~
      compilation terminated.
      ninja: build stopped: subcommand failed.

      *** CMake build failed
      [end of output]

A bigger problem is that we might need to support CTK 12.1 in general if we want to deliver this feature today.

isVoid commented 5 months ago

a94e415 and 37a6f78 are two commits I pushed to leverage the CI to run the bfloat16 and pytorch-integration tests on CTK 12.1; they skip the fp16 and curand tests.

quasiben commented 5 months ago

The conda package is probably going to force CUDA 12.1. If we install from pip, we might be OK mixing everything into a CUDA 12.4 env.

isVoid commented 5 months ago

In the V100 run the bfloat16 tests are skipped, while the A100 tests fail with:

 >               raise AssertionError("Torch not compiled with CUDA enabled")
E               AssertionError: Torch not compiled with CUDA enabled
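The `Torch not compiled with CUDA enabled` failure above comes from calling into `torch.cuda` on a CPU-only Torch build. A hedged sketch of a guard that would let such tests skip cleanly instead of erroring (the helper name is an assumption, but `torch.cuda.is_available()` is Torch's real API for this check):

```python
# Sketch: report whether CUDA-backed Torch tests can run at all.
# Returns False both when torch is not installed and when the
# installed torch was built without CUDA support.
import importlib.util


def cuda_torch_available() -> bool:
    """True only if torch is importable AND compiled with CUDA."""
    if importlib.util.find_spec("torch") is None:
        return False
    import torch

    return torch.cuda.is_available()
```

A pytest suite could wrap this in `pytest.mark.skipif(not cuda_torch_available(), reason=...)` so the A100/V100 matrix degrades to skips rather than `AssertionError`s.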

Do you think we should set up a new instance with CTK 12.1 and only test bfloat16 + the pytorch integration?

quasiben commented 5 months ago

@isVoid this is now passing and ready for review