⇒ find / -name cuda.h
/opt/conda/envs/dreambooth/lib/python3.10/site-packages/nvidia/cuda_runtime/include/cuda.h
/opt/conda/envs/dreambooth/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/cuda.h
/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/cuda.h
/opt/conda/lib/python3.7/site-packages/nvidia/cuda_runtime/include/cuda.h
/opt/conda/pkgs/pytorch-1.12.0-py3.7_cuda11.3_cudnn8.3.2_0/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/cuda.h
find: '/proc/tty/driver': Permission denied
/usr/include/linux/cuda.h
One step closer it seems!
⇒ conda install -c nvidia cuda-libraries-dev
⇒ find / -name cuda.h
/opt/conda/envs/dreambooth/lib/python3.10/site-packages/nvidia/cuda_runtime/include/cuda.h
/opt/conda/envs/dreambooth/lib/python3.10/site-packages/torch/include/torch/csrc/api/include/torch/cuda.h
+ /opt/conda/envs/dreambooth/include/cuda.h
/opt/conda/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/cuda.h
/opt/conda/lib/python3.7/site-packages/nvidia/cuda_runtime/include/cuda.h
/opt/conda/pkgs/pytorch-1.12.0-py3.7_cuda11.3_cudnn8.3.2_0/lib/python3.7/site-packages/torch/include/torch/csrc/api/include/torch/cuda.h
/opt/conda/pkgs/cuda-cudart-dev-11.8.89-0/include/cuda.h
find: '/proc/tty/driver': Permission denied
/usr/include/linux/cuda.h
Yet still getting the error: fatal error: cuda.h: No such file or directory
I noticed that the call to gcc doesn't seem to pass this include path in its -I flags. I haven't dug deeper into the relevant code to figure out why/how to potentially change that:
subprocess.CalledProcessError: Command '['/usr/bin/gcc', '/tmp/tmpzfde7mdr/main.c', '-O3', '-I/usr/local/cuda/include', '-I/opt/conda/envs/dreambooth/include/python3.10', '-I/tmp/tmpzfde7mdr', '-shared', '-fPIC', '-lcuda', '-o', '/tmp/tmpzfde7mdr/layer_norm_fw.cpython-310-x86_64-linux-gnu.so', '-L/usr/lib/x86_64-linux-gnu']' returned non-zero exit status 1.
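A quick way to sanity-check that would be to feed gcc a stub that only needs cuda.h, once with the include dir the failing command was given and once with the conda env's include dir; just a rough sketch (paths are from this environment, and gcc -fsyntax-only only runs the front end):

```python
import subprocess, tempfile

# Minimal stub whose only requirement is that cuda.h is resolvable.
probe = '#include "cuda.h"\nint main(void) { return 0; }\n'
with tempfile.NamedTemporaryFile("w", suffix=".c", delete=False) as f:
    f.write(probe)
    src = f.name

for include_dir in (
    "/usr/local/cuda/include",             # what the failing gcc command was passed
    "/opt/conda/envs/dreambooth/include",  # where cuda.h landed after the conda install
):
    cmd = ["gcc", "-fsyntax-only", src, f"-I{include_dir}"]
    res = subprocess.run(cmd, capture_output=True, text=True)
    print(include_dir, "->", "ok" if res.returncode == 0 else res.stderr.strip().splitlines()[0])
```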
generate_launcher seems to be generating the code that has the #include "cuda.h" line that's erroring, then calls the _build function:

cuda_lib_dirs = libcuda_dirs()
cu_include_dir = os.path.join(cuda_home_dirs(), "include")
py_include_dir = get_paths()["include"]
cc_cmd = [cc, src, "-O3", f"-I{cu_include_dir}", f"-I{py_include_dir}", f"-I{srcdir}", "-shared", "-fPIC", "-lcuda", "-o", so]
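To line that up with the failing command above, the two -I paths that _build computes can be re-derived outside of triton (cuda_home_dirs() is effectively os.getenv("CUDA_HOME", "/usr/local/cuda"), per the snippet a bit further down); a small sketch, run inside the same dreambooth env:

```python
import os
from sysconfig import get_paths

# Re-derive just the two include dirs from the cc_cmd above (src/srcdir omitted,
# since they live in a throwaway tmp dir).
cu_include_dir = os.path.join(os.getenv("CUDA_HOME", "/usr/local/cuda"), "include")
py_include_dir = get_paths()["include"]

print(f"-I{cu_include_dir}")  # -I/usr/local/cuda/include here, since CUDA_HOME is unset
print(f"-I{py_include_dir}")  # -I/opt/conda/envs/dreambooth/include/python3.10
print("cuda.h present:", os.path.isfile(os.path.join(cu_include_dir, "cuda.h")))  # False on this box
```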
def libcuda_dirs():
    locs = subprocess.check_output(["whereis", "libcuda.so"])
⇒ whereis libcuda.so
libcuda: /usr/lib/x86_64-linux-gnu/libcuda.so
⇒ find / -name libcuda.so
/opt/conda/envs/dreambooth/lib/stubs/libcuda.so
/opt/conda/pkgs/cuda-driver-dev-11.8.89-0/lib/stubs/libcuda.so
find: '/proc/tty/driver': Permission denied
/usr/lib/x86_64-linux-gnu/libcuda.so
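For completeness, my guess at what the rest of libcuda_dirs() does with that whereis output (a reconstruction, not the actual triton source): drop the leading libcuda: label and keep the parent directory of each hit, which would explain the -L/usr/lib/x86_64-linux-gnu in the failing gcc command:

```python
import os
import subprocess

# Reconstruction (assumption, not the real triton code): turn the whereis
# output into the directories that presumably end up as -L flags.
out = subprocess.check_output(["whereis", "libcuda.so"]).decode()
locs = out.strip().split()[1:]  # drop the leading "libcuda:" label
print(sorted({os.path.dirname(loc) for loc in locs}))
# -> ['/usr/lib/x86_64-linux-gnu'] here, matching the -L flag in the gcc command above
```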
def cuda_home_dirs():
    default_dir = "/usr/local/cuda"
    return os.getenv("CUDA_HOME", default=default_dir)
⇒ ls -la /usr/local/cuda
lrwxrwxrwx 1 root root 17 Nov 10 06:36 /usr/local/cuda -> /tmp/tmpgyc5dwz3/
⇒ echo $CUDA_HOME
⇒ python -c "from sysconfig import get_paths; print(get_paths()['include'])"
/opt/conda/envs/dreambooth/include/python3.10
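Since /usr/local/cuda is a symlink into a (presumably stale) tmp dir and CUDA_HOME is unset, pointing CUDA_HOME at the conda env where cuda-libraries-dev put cuda.h seems like the least invasive thing to try; a sketch of the idea (the env path is specific to this setup):

```python
import os

# With CUDA_HOME pointed at the conda env, the cu_include_dir that triton's
# _build computes should resolve to a directory that actually contains cuda.h.
os.environ["CUDA_HOME"] = "/opt/conda/envs/dreambooth"
cu_include_dir = os.path.join(os.getenv("CUDA_HOME", "/usr/local/cuda"), "include")
print(cu_include_dir)                                          # /opt/conda/envs/dreambooth/include
print(os.path.isfile(os.path.join(cu_include_dir, "cuda.h")))  # True after the conda install above
```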
Setting CUDA_HOME seemed to allow it to progress a little further before running into a new/different error:
!CUDA_HOME=/opt/conda/envs/dreambooth conda run -n dreambooth --live-stream python3 xformers/benchmarks/benchmark_encoder.py --activations relu --plot -emb 256 -bs 32 -heads 16
Traceback (most recent call last):
File "<string>", line 21, in layer_norm_fw
KeyError: ('2-.-0-.-0--7929002797455b30efce6e41eddc6b57-3aa563e00c5c695dd945e23b09a86848-d962222789c30252d492a16cca3bf467-ff946bd4b3b4a4cbdf8cedc6e1c658e0-5c5e32ff210f3b7f56c98ca29917c25e-06f0df2d61979d629033f4a22eff5198-0dd03b0bd512a184b3512b278d9dfa59-d35ab04ae841e2714a253c523530b071', (torch.float32, torch.float32, torch.float32, torch.float32, torch.float32, torch.float32, 'i32', 'i32', 'fp32'), (True, 256), (True, True, True, True, True, True, (True, False), (True, False), (False,)))
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/workspace/thelastben-diffusers/examples/dreambooth/xformers/xformers/triton/layer_norm.py", line 223, in layer_norm
return _LayerNorm.apply(x, weight, bias, eps)
File "/opt/conda/envs/dreambooth/lib/python3.10/site-packages/torch/cuda/amp/autocast_mode.py", line 97, in decorate_fwd
return fwd(*args, **kwargs)
File "/workspace/thelastben-diffusers/examples/dreambooth/xformers/xformers/triton/layer_norm.py", line 73, in forward
layer_norm_fw[(M,)](
File "/opt/conda/envs/dreambooth/lib/python3.10/site-packages/triton/runtime/jit.py", line 106, in launcher
return self.run(*args, grid=grid, **kwargs)
File "<string>", line 41, in layer_norm_fw
File "/opt/conda/envs/dreambooth/lib/python3.10/site-packages/triton/compiler.py", line 1256, in compile
asm, shared, kernel_name = _compile(fn, signature, device, constants, configs[0], num_warps, num_stages,
File "/opt/conda/envs/dreambooth/lib/python3.10/site-packages/triton/compiler.py", line 901, in _compile
name, asm, shared_mem = _triton.code_gen.compile_ttir(backend, module, device, num_warps, num_stages, extern_libs, cc)
RuntimeError: `ptxas` was searched in TRITON_PTXAS_PATH, /usr/local/cuda/bin/ or PATH but a working version could not be found.
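The error message itself lists the three places it searches (TRITON_PTXAS_PATH, /usr/local/cuda/bin/, PATH), so a quick check along these lines should show whether any of them has a working ptxas; just a sketch:

```python
import os
import shutil
import subprocess

# Check the three locations the RuntimeError mentions for a ptxas binary
# and see whether it actually runs.
candidates = [
    os.getenv("TRITON_PTXAS_PATH"),
    "/usr/local/cuda/bin/ptxas",
    shutil.which("ptxas"),
]
for cand in candidates:
    if cand and os.path.isfile(cand) and os.access(cand, os.X_OK):
        res = subprocess.run([cand, "--version"], capture_output=True, text=True)
        lines = (res.stdout or res.stderr).strip().splitlines()
        print(cand, "->", lines[0] if lines else f"exit code {res.returncode}")
    else:
        print(cand, "-> not found / not executable")
```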
That ptxas error may be related to this:
I wonder if the stuff I figured out in the following will help here? (to explore when I get a chance):
the "triton has no code_gen attritbute" is unrelated, tied to a recent triton update, sorry about that. Fixed in #528
🐛 Bug
Trying to follow along with:
Command
To Reproduce
Steps to reproduce the behavior:
!conda run -n dreambooth --live-stream python3 xformers/benchmarks/benchmark_encoder.py --activations relu --plot -emb 256 -bs 32 -heads 16
Expected behavior
The benchmark would run successfully.
Environment
Additional context