Jittor / jittor

Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
https://cg.cs.tsinghua.edu.cn/jittor/
Apache License 2.0

3D operators raise errors in Jittor 1.3.9 #560

Open yykmeng opened 2 months ago

yykmeng commented 2 months ago

Describe the bug

Running python -m jittor.test.test_cudnn_op fails with the following errors:

/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3dTx_float32__Ty_float32Tw_float32JIT_1JIT_cuda_1index_t_int32_hash_f7dc3a0a93f44f4e_op.cc(38): error: function "jittor::getDataType() [with T_ELEM=float]" has already been defined template <> inline__ cudnnDataType_t getDataType() { return CUDNN_DATA_FLOAT; } ^

2 errors detected in the compilation of "/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3dTx_float32__Ty_float32Tw_float32JIT_1JIT_cuda_1index_t_int32_hash_f7dc3a0a93f44f4e_op.cc". /home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3d__Tx_float32Ty_float32Tw_float32__JIT_1JIT_cuda_1__index_t_int32_hash_f7dc3a0a93f44f4e_op.cc(37): error: function "jittor::getDataType() [with T_ELEM=half1]" has already been defined template <> inline cudnnDataType_t getDataType() { return CUDNN_DATA_HALF; } ^

/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3dTx_float32__Ty_float32Tw_float32JIT_1JIT_cuda_1index_t_int32_hash_f7dc3a0a93f44f4e_op.cc(38): error: function "jittor::getDataType() [with T_ELEM=float]" has already been defined template <> inline__ cudnnDataType_t getDataType() { return CUDNN_DATA_FLOAT; } ^

2 errors detected in the compilation of "/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3dTx_float32__Ty_float32Tw_float32__JIT_1JIT_cuda_1index_t_int32_hash_f7dc3a0a93f44f4e_op.cc". EE

ERROR: test_conv3d (main.TestCudnnConvOp)

Traceback (most recent call last): File "/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/test/test_cudnn_op.py", line 142, in check jt.sync_all() RuntimeError: [f 0616 09:51:29.555904 84 executor.cc:686] Execute fused operator(2/7) failed.

[Input]: float32[2,4,10,10,10,], float32[5,4,3,3,3,],

[Async Backtrace]: not found, please set env JT_SYNC=1, trace_py_var=3 [Reason]: [f 0616 09:51:29.555525 84 log.cc:605] Check failed: ret>=0 && ret<=256 Run cmd failed: "/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/bin/nvcc" "/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3dTx_float32__Ty_float32Tw_float32JIT_1JIT_cuda_1index_t_int32_hash_f7dc3a0a93f44f4e_op.cc" -std=c++14 -Xcompiler -fPIC -Xcompiler -march=native -Xcompiler -fdiagnostics-color=always -lstdc++ -ldl -shared -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/src" -I/home2/ykm2023/miniconda3/envs/dl/include/python3.9 -I/home2/ykm2023/miniconda3/envs/dl/include/python3.9 -DHAS_CUDA -DIS_CUDA -I"/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/include" -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/inc" -lcudart -L"/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/lib64" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/lib64" -I"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86" -L"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86" -L"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default" -l:"jit_utils_core.cpython-39-x86_64-linux-gnu".so -l:"jittor_core.cpython-39-x86_64-linux-gnu".so -x cu --cudart=shared -ccbin="/usr/bin/g++" --use_fast_math -w -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/inc" -arch=compute_86 
-code=sm_86 -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/cudnn/inc" -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/cudnn/ops" -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/inc" -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/cudnn/inc" -lcudnn -L"/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/lib64" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/lib64" -L"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/cuda" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/cuda" -l:libcuda_extern.so -L"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/custom_ops" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/custom_ops" -l:"gen_ops_cudnn_rnn_backward_x_cudnn_conv_cudnntesthashddba11.cpython-39-x86_64-linux-gnu".so -o "/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3dTx_float32__Ty_float32Tw_float32__JIT_1JIT_cuda_1index_t_int32_hash_f7dc3a0a93f44f4e_op.so"

return 512. This might be an overcommit issue or out of memory. Try : sudo sysctl vm.overcommit_memory=1, or set enviroment variable export DISABLE_MULTIPROCESSING=1


Async error was detected. To locate the async backtrace and get better error report, please rerun your code with two enviroment variables set:

export JT_SYNC=1 export trace_py_var=3

During handling of the above exception, another exception occurred:

Traceback (most recent call last): File "/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/test/test_cudnn_op.py", line 150, in test_conv3d check((2,4,10,10,10), (5,4,3,3,3), (1,1,1), (1,1,1)) File "/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/test/test_cudnn_op.py", line 142, in check jt.sync_all() File "/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/init.py", line 160, in exit setattr(flags, k, v) RuntimeError: [f 0616 09:51:30.378681 84 executor.cc:686] Execute fused operator(0/5) failed.

[Input]: float32[2,4,10,10,10,], float32[5,4,3,3,3,],

[Async Backtrace]: not found, please set env JT_SYNC=1, trace_py_var=3 [Reason]: [f 0616 09:51:30.378391 84 log.cc:605] Check failed: ret>=0 && ret<=256 Run cmd failed: "/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/bin/nvcc" "/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3dTx_float32__Ty_float32Tw_float32JIT_1JIT_cuda_1index_t_int32_hash_f7dc3a0a93f44f4e_op.cc" -std=c++14 -Xcompiler -fPIC -Xcompiler -march=native -Xcompiler -fdiagnostics-color=always -lstdc++ -ldl -shared -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/src" -I/home2/ykm2023/miniconda3/envs/dl/include/python3.9 -I/home2/ykm2023/miniconda3/envs/dl/include/python3.9 -DHAS_CUDA -DIS_CUDA -I"/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/include" -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/inc" -lcudart -L"/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/lib64" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/lib64" -I"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86" -L"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86" -L"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default" -l:"jit_utils_core.cpython-39-x86_64-linux-gnu".so -l:"jittor_core.cpython-39-x86_64-linux-gnu".so -x cu --cudart=shared -ccbin="/usr/bin/g++" --use_fast_math -w -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/inc" -arch=compute_86 
-code=sm_86 -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/cudnn/inc" -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/cudnn/ops" -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/inc" -I"/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/extern/cuda/cudnn/inc" -lcudnn -L"/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/lib64" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jtcuda/cuda12.2_cudnn8_linux/lib64" -L"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/cuda" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/cuda" -l:libcuda_extern.so -L"/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/custom_ops" -Xlinker -rpath="/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/custom_ops" -l:"gen_ops_cudnn_rnn_backward_x_cudnn_conv_cudnntesthashddba11.cpython-39-x86_64-linux-gnu".so -o "/home2/ykm2023/.cache/jittor/jt1.3.9/g++11.4.0/py3.9.19/Linux-5.15.0-1x40/IntelRCoreTMi9xc2/f681/default/cu12.2.140_sm_86/jit/cudnn_conv3dTx_float32__Ty_float32Tw_float32__JIT_1JIT_cuda_1index_t_int32_hash_f7dc3a0a93f44f4e_op.so"

return 512. This might be an overcommit issue or out of memory. Try : sudo sysctl vm.overcommit_memory=1, or set enviroment variable export DISABLE_MULTIPROCESSING=1


Async error was detected. To locate the async backtrace and get better error report, please rerun your code with two enviroment variables set:

export JT_SYNC=1 export trace_py_var=3

====================================================================== ERROR: test_conv_transpose3d (main.TestCudnnConvOp)

Traceback (most recent call last): File "/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/test/test_cudnn_op.py", line 184, in test_conv_transpose3d check((2,5,10,10,10), (5,4,3,3,3), (1,1,1), (1,1,1)) File "/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/test/test_cudnn_op.py", line 168, in check y2 = jt.nn.conv_transpose3d(x, w, None, stride, padding, 0, group, dilation) File "/home2/ykm2023/miniconda3/envs/dl/lib/python3.9/site-packages/jittor/nn.py", line 1611, in conv_transpose3d if stride <= 0: TypeError: '<=' not supported between instances of 'tuple' and 'int'


Ran 5 tests in 2.355s

FAILED (errors=2)
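
The second failure (test_conv_transpose3d) appears to be unrelated to the nvcc compilation error: the traceback shows the test passing stride as a tuple, (1,1,1), into a check written as `if stride <= 0:`, and Python does not define `<=` between a tuple and an int. The sketch below reproduces that comparison and shows one way such a check could accept both forms. The helper name `normalize_stride` and its `ndim` parameter are hypothetical and are not part of Jittor; only the failing comparison is taken from the traceback.

```python
def normalize_stride(stride, ndim=3):
    """Hypothetical helper (not Jittor API): return stride as a tuple of
    `ndim` positive ints, accepting either a single int or a sequence."""
    if isinstance(stride, int):
        stride = (stride,) * ndim
    stride = tuple(stride)
    if len(stride) != ndim:
        raise ValueError(f"expected {ndim} stride values, got {len(stride)}")
    # Validate element by element instead of comparing the whole tuple to 0.
    if any(s <= 0 for s in stride):
        raise ValueError(f"stride must be positive, got {stride}")
    return stride


# Reproduce the comparison from the traceback: tuple <= int is not defined.
try:
    (1, 1, 1) <= 0
except TypeError as exc:
    message = str(exc)

print(message)
print(normalize_stride(1))
print(normalize_stride((1, 2, 3)))
```

Checking each element keeps the validation meaningful for tuple arguments, whereas a direct comparison against 0 only works when stride is a single number.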

Exusial commented 2 months ago

You could try downgrading gcc to version 9.