Closed nmynol closed 2 years ago
Sorry to bother you again. I get this error when running MultiKMeans:
Traceback (most recent call last):
File "/data/oyaooya/anaconda3/envs/nmy/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 654, in compile
nvrtc.compileProgram(self.ptr, options)
File "cupy_backends/cuda/libs/nvrtc.pyx", line 133, in cupy_backends.cuda.libs.nvrtc.compileProgram
File "cupy_backends/cuda/libs/nvrtc.pyx", line 145, in cupy_backends.cuda.libs.nvrtc.compileProgram
File "cupy_backends/cuda/libs/nvrtc.pyx", line 64, in cupy_backends.cuda.libs.nvrtc.check_status
cupy_backends.cuda.libs.nvrtc.NVRTCError: NVRTC_ERROR_COMPILATION (6)
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "model/kmeansenc.py", line 121, in <module>
coltran = AutoEncoder().cuda()
File "model/kmeansenc.py", line 70, in __init__
self.kmeans = MultiKMeans(n_clusters=256, distance="euclidean")
File "/data/oyaooya/anaconda3/envs/nmy/lib/python3.8/site-packages/torchpq/clustering/MultiKMeans.py", line 115, in __init__
self.warmup_kernels()
File "/data/oyaooya/anaconda3/envs/nmy/lib/python3.8/site-packages/torchpq/clustering/MultiKMeans.py", line 229, in warmup_kernels
self.topk_sim_cuda(a, b, dim=1, k=128)
File "/data/oyaooya/anaconda3/envs/nmy/lib/python3.8/site-packages/torchpq/kernels/TopkBMMCuda.py", line 144, in __call__
kernel_fn(
File "cupy/_core/raw.pyx", line 89, in cupy._core.raw.RawKernel.__call__
File "cupy/_core/raw.pyx", line 96, in cupy._core.raw.RawKernel.kernel.__get__
File "cupy/_core/raw.pyx", line 113, in cupy._core.raw.RawKernel._kernel
File "cupy/_util.pyx", line 59, in cupy._util.memoize.decorator.ret
File "cupy/_core/raw.pyx", line 547, in cupy._core.raw._get_raw_module
File "cupy/_core/core.pyx", line 1974, in cupy._core.core.compile_with_cache
File "cupy/_core/core.pyx", line 2040, in cupy._core.core.compile_with_cache
File "/data/oyaooya/anaconda3/envs/nmy/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 461, in compile_with_cache
return _compile_with_cache_cuda(
File "/data/oyaooya/anaconda3/envs/nmy/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 539, in _compile_with_cache_cuda
ptx, mapping = compile_using_nvrtc(
File "/data/oyaooya/anaconda3/envs/nmy/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 299, in compile_using_nvrtc
return _compile(source, options, cu_path,
File "/data/oyaooya/anaconda3/envs/nmy/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 283, in _compile
compiled_obj, mapping = prog.compile(options, log_stream)
File "/data/oyaooya/anaconda3/envs/nmy/lib/python3.8/site-packages/cupy/cuda/compiler.py", line 671, in compile
raise CompileException(log, self.src, self.name, options,
cupy.cuda.compiler.CompileException: /tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(245): error: identifier "__stcs" is undefined
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(359): warning: variable "wx" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(360): warning: variable "wy" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(409): warning: variable "dx" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(410): warning: variable "dy" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(505): warning: variable "wx" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(506): warning: variable "wy" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(532): warning: variable "dx" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(533): warning: variable "dy" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(599): warning: variable "wx" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(600): warning: variable "wy" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(624): warning: variable "dx" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(625): warning: variable "dy" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(963): warning: variable "bDimY" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(975): warning: variable "wx" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(976): warning: variable "wy" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(977): warning: variable "dx" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(978): warning: variable "dy" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(1060): warning: variable "bDimY" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(1072): warning: variable "wx" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(1073): warning: variable "wy" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(1074): warning: variable "dx" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(1075): warning: variable "dy" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(1157): warning: variable "bDimY" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(1169): warning: variable "wx" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(1170): warning: variable "wy" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(1171): warning: variable "dx" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(1172): warning: variable "dy" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(1254): warning: variable "bDimY" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(1266): warning: variable "wx" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(1267): warning: variable "wy" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(1268): warning: variable "dx" was declared but never referenced
/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu(1269): warning: variable "dy" was declared but never referenced
1 error detected in the compilation of "/tmp/tmpw4jx4uh7/6b7efdd9db889434f1a0d290c0aa7f11_2.cubin.cu".
Do you have any idea how to solve it? Thanks a lot!
RE first issue:
Yes, torchpq.clustering is the right module. I forgot to update the README; thanks for pointing it out!
RE second issue: Can you run the following and post the results?
wget https://raw.githubusercontent.com/pytorch/pytorch/master/torch/utils/collect_env.py
python collect_env.py
It seems that either your CUDA version is lower than 11.0, or the compute capability of your GPU is 3.5 or lower (Fermi, some of Kepler). If the latter is the case, I can modify the code to support older architectures. Otherwise, you would need to uninstall CuPy, upgrade your CUDA Toolkit to 11.0 or higher, then reinstall the CuPy build that matches your CUDA version (e.g. pip install cupy-cuda110 --no-cache-dir). I should've put the CUDA version requirement in the README.
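The two possible causes named above can also be checked directly from Python. The following is a minimal sketch, not part of TorchPQ: the helper names are made up for illustration, the thresholds (CUDA 11.0, compute capability 3.5) are taken from the comment above, and query_cupy assumes CuPy is installed and a GPU is visible.

```python
def decode_cuda_version(raw):
    """Split CUDA's integer version (1000*major + 10*minor) into (major, minor)."""
    return raw // 1000, (raw % 1000) // 10


def diagnose(cuda_version, compute_capability):
    """Map the two suspected causes to a hint (hypothetical helper).

    cuda_version: (major, minor) of the CUDA runtime CuPy uses.
    compute_capability: (major, minor) of the GPU.
    """
    if compute_capability <= (3, 5):
        return "compute capability 3.5 or lower: kernel needs a newer GPU"
    if cuda_version < (11, 0):
        return "CUDA older than 11.0: upgrade the toolkit and reinstall CuPy"
    return "CUDA version and compute capability both look sufficient"


def query_cupy(device_id=0):
    """Read both values from CuPy; requires CuPy and a visible GPU."""
    import cupy

    raw = cupy.cuda.runtime.runtimeGetVersion()  # e.g. 10010 for CUDA 10.1
    cc = cupy.cuda.Device(device_id).compute_capability  # string such as "61"
    return decode_cuda_version(raw), (int(cc[:-1]), int(cc[-1]))
```

On the affected machine, print(diagnose(*query_cupy())) would report which of the two conditions applies.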
Thank you very much for your response! I ran the script and got this result:
Collecting environment information...
PyTorch version: 1.6.0
Is debug build: False
CUDA used to build PyTorch: 10.1
ROCM used to build PyTorch: N/A
OS: Ubuntu 16.04 LTS (x86_64)
GCC version: (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609
Clang version: Could not collect
CMake version: version 3.16.0-rc2
Libc version: glibc-2.23
Python version: 3.8.5 (default, Sep 4 2020, 07:30:14) [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.4.0-177-generic-x86_64-with-glibc2.10
Is CUDA available: True
CUDA runtime version: 7.5.17
GPU models and configuration:
GPU 0: GeForce GTX 1080 Ti
GPU 1: GeForce GTX 1080 Ti
GPU 2: GeForce GTX 1080 Ti
GPU 3: GeForce GTX 1080 Ti
GPU 4: GeForce GTX 1080 Ti
GPU 5: GeForce GTX 1080 Ti
GPU 6: GeForce GTX 1080 Ti
Nvidia driver version: 430.64
cuDNN version: /usr/local/cuda-9.0/lib64/libcudnn.so.7.0.4
HIP runtime version: N/A
MIOpen runtime version: N/A
Versions of relevant libraries:
[pip3] fast-pytorch-kmeans==0.1.6
[pip3] numpy==1.19.1
[pip3] numpy-quaternion==2020.9.5.14.42.2
[pip3] pytorch-fid==0.2.0
[pip3] stylegan2-pytorch==1.8.6
[pip3] torch==1.6.0
[pip3] torch-cluster==1.5.9
[pip3] torch-geometric==1.7.0
[pip3] torch-scatter==2.0.6
[pip3] torch-sparse==0.6.9
[pip3] torch-spline-conv==1.2.1
[pip3] torch-tb-profiler==0.3.1
[pip3] torchfile==0.1.0
[pip3] torchpq==0.3.0.2
[pip3] torchvision==0.7.0
[pip3] vector-quantize-pytorch==0.1.0
[conda] blas 1.0 mkl defaults
[conda] cudatoolkit 10.1.243 h6bb024c_0 defaults
[conda] fast-pytorch-kmeans 0.1.6 pypi_0 pypi
[conda] mkl 2020.2 256 defaults
[conda] mkl-service 2.3.0 py38he904b0f_0 defaults
[conda] mkl_fft 1.2.0 py38h23d657b_0 defaults
[conda] mkl_random 1.1.1 py38h0573a6f_0 defaults
[conda] numpy 1.19.0 pypi_0 pypi
[conda] numpy-base 1.19.1 py38hfa32c7d_0 defaults
[conda] pytorch 1.6.0 py3.8_cuda10.1.243_cudnn7.6.3_0 pytorch
[conda] pytorch-fid 0.2.0 pypi_0 pypi
[conda] stylegan2-pytorch 1.8.6 pypi_0 pypi
[conda] torch-cluster 1.5.9 pypi_0 pypi
[conda] torch-geometric 1.7.0 pypi_0 pypi
[conda] torch-scatter 2.0.6 pypi_0 pypi
[conda] torch-sparse 0.6.9 pypi_0 pypi
[conda] torch-spline-conv 1.2.1 pypi_0 pypi
[conda] torch-tb-profiler 0.3.1 pypi_0 pypi
[conda] torchfile 0.1.0 pypi_0 pypi
[conda] torchpq 0.3.0.2 pypi_0 pypi
[conda] torchvision 0.7.0 py38_cu101 pytorch
[conda] vector-quantize-pytorch 0.1.0 pypi_0 pypi
The 1080 Ti has compute capability 6.1, so an unsupported GPU architecture can be ruled out as the cause. Your environment is built against CUDA 10.1, so the solution to your issue is to upgrade your CUDA Toolkit to 11.0 or higher and reinstall CuPy, as I explained in the previous comment.
One thing to note is that TorchPQ currently has no multi-GPU support; everything will run on cuda:0 by default.
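Since everything runs on cuda:0, one common workaround for choosing a different card is to restrict which physical GPU the process can see, so that the chosen card becomes cuda:0. This is a sketch using the standard CUDA_VISIBLE_DEVICES environment variable, not a TorchPQ feature; the helper name and the choice of GPU 3 are purely illustrative.

```python
import os


def pin_process_to_gpu(physical_index):
    """Expose a single physical GPU so that it appears as cuda:0.

    Must be called before torch, cupy, or any other CUDA library
    initializes CUDA in this process; later changes have no effect.
    """
    os.environ["CUDA_VISIBLE_DEVICES"] = str(physical_index)
    return os.environ["CUDA_VISIBLE_DEVICES"]


# Illustrative choice: use the card listed as "GPU 3" on a multi-GPU machine.
pin_process_to_gpu(3)

# Import CUDA libraries only after the variable is set, for example:
# import torch
# from torchpq.clustering import MultiKMeans
```

The same effect can be had without code changes by setting the variable in the shell when launching the script.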
Got it, thank you so much for your helpful advice!
Thanks for the nice work! But when I tried to import MultiKMeans using the command shown in README.md:
from torchpq.kmeans import MultiKMeans
it fails with: ModuleNotFoundError: No module named 'torchpq.kmeans'
When I instead use: from torchpq.clustering import MultiKMeans
the import succeeds. I wonder whether this is correct, since it differs from what README.md says.
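For scripts that have to cope with both module paths mentioned here, a generic fallback import is one option. This is only a sketch: import_first_available is a hypothetical helper, and whether any installed TorchPQ release actually exposes torchpq.kmeans is an assumption based solely on the README text quoted above.

```python
import importlib


def import_first_available(module_names):
    """Return the first module in module_names that can be imported."""
    for name in module_names:
        try:
            return importlib.import_module(name)
        except ModuleNotFoundError:
            continue  # try the next candidate path
    raise ModuleNotFoundError(f"none of {list(module_names)} could be imported")


# Intended use (requires torchpq to be installed):
# clustering = import_first_available(["torchpq.clustering", "torchpq.kmeans"])
# MultiKMeans = clustering.MultiKMeans
```

Listing torchpq.clustering first means the path confirmed in this thread is preferred, and the README path is tried only if the first is missing.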