Open mrakgr opened 5 days ago
If you want to try installing CuPy, use the `pip install cupy-cuda12x` command. Trying to install just `cupy` won't work.
Here is a minimal reproducer that does not depend on CuPy:
#include "cutlass/gemm/device/gemm_universal_adapter.h"
extern "C" __global__ void my_func() {}
- commands
```sh
$ nvcc -I"[CUTLASS_PATH]/include" -I"[CUTLASS_PATH]/tools/util/include" --std=c++17 --device-c -o sample.o sample.cu
$ nvcc -I"[CUTLASS_PATH]/include" -I"[CUTLASS_PATH]/tools/util/include" --cubin --device-link sample.o -o sample.cubin
$ cuobjdump -symbols sample.o
Fatbin elf code:
================
arch = sm_52
code version = [1,7]
host = linux
compile_size = 64bit
compressed
symbols:
STT_CUDA_OBJECT STB_LOCAL STO_GLOBAL __nv_static_38__ef1904b4_9_sample_cu_b232a47d_2970605__ZN47_INTERNAL_ef1904b4_9_sample_cu_b232a47d_29706054cute1_E
STT_CUDA_OBJECT STB_LOCAL STO_GLOBAL __nv_static_38__ef1904b4_9_sample_cu_b232a47d_2970605__ZN47_INTERNAL_ef1904b4_9_sample_cu_b232a47d_29706054cute7productE
STT_CUDA_OBJECT STB_LOCAL STO_? _SREG
STT_FUNC STB_GLOBAL STO_ENTRY my_func
Fatbin ptx code:
================
arch = sm_52
code version = [8,4]
host = linux
compile_size = 64bit
compressed
ptxasOptions = --compile-only
$ cuobjdump -symbols sample.cubin
Fatbin elf code:
================
arch = sm_52
code version = [1,7]
host = linux
compile_size = 64bit
symbols:
STT_OBJECT STB_LOCAL STV_DEFAULT __nv_static_38__ef1904b4_9_sample_cu_b232a47d_2970605__ZN47_INTERNAL_ef1904b4_9_sample_cu_b232a47d_29706054cute1_E
STT_OBJECT STB_LOCAL STV_DEFAULT __nv_static_38__ef1904b4_9_sample_cu_b232a47d_2970605__ZN47_INTERNAL_ef1904b4_9_sample_cu_b232a47d_29706054cute7productE
```
The file `sample.o` contains a symbol named `my_func`, but `sample.cubin` does not.
I'll go ahead and also open an issue with Nvidia for this. I am not a C++ expert, but it's hard for me to imagine that the library itself is doing something to drop the `extern`s. Most likely, this is an NVCC compiler bug.
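The comparison above can also be automated. The following is a minimal sketch, assuming the `cuobjdump -symbols` row layout shown in the listings (type, binding, a third column, then the name); the helper names `function_symbols`, `dump_symbols`, and `missing_after_link` are hypothetical and not part of any CUDA or CuPy API.

```python
import subprocess


def function_symbols(cuobjdump_output: str) -> set:
    """Return the names of STT_FUNC symbols in `cuobjdump -symbols` output."""
    names = set()
    for line in cuobjdump_output.splitlines():
        fields = line.split()
        # Assumed row layout: <type> <binding> <other> <name>
        if len(fields) == 4 and fields[0] == "STT_FUNC":
            names.add(fields[3])
    return names


def dump_symbols(path: str) -> str:
    """Run `cuobjdump -symbols` on a file (needs the CUDA toolkit on PATH)."""
    result = subprocess.run(
        ["cuobjdump", "-symbols", path],
        check=True,
        capture_output=True,
        text=True,
    )
    return result.stdout


def missing_after_link(object_dump: str, cubin_dump: str) -> set:
    """Function symbols present in the object dump but absent from the cubin dump."""
    return function_symbols(object_dump) - function_symbols(cubin_dump)
```

Calling `missing_after_link(dump_symbols("sample.o"), dump_symbols("sample.cubin"))` on the files built above should report `my_func`, since the object listing has an `STT_FUNC` row for it and the cubin listing has none.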
Thanks, @mrakgr! One more question: I assume you tried NVRTC first and hit the same error, which is why you switched to NVCC?
No, I haven't tried NVRTC. The trouble with NVRTC is that it cannot compile recursive types properly.
https://developer.nvidia.com/bugs/4704632 https://github.com/mrakgr/Spiral-s-ML-Library/blob/9e030d00d50ca9fe6ddcd9bcb39cce0dab2b9b81/tests/test2.py#L182
This example wouldn't compile with NVRTC, but it does with NVCC. As far as I can tell, it's impossible to define recursive union types in NVRTC, so I've been using NVCC since then. The Nvidia rep said they'd fix it.
Describe the bug
In the following example, including the `#include "cutlass/gemm/device/gemm_universal_adapter.h"` line is causing CuPy to be unable to find the `qwert_entry0` function. Could including that header be affecting the function names in the compiled program?
Steps/Code to reproduce bug
Expected behavior
Thread 0 should print `hello`.
Environment details (please complete the following information):
Additional context
Here is what happens when I run the script.
In order to actually run the script, you'll have to install CuPy.