Jittor / jittor

Jittor is a high-performance deep learning framework based on JIT compiling and meta-operators.
https://cg.cs.tsinghua.edu.cn/jittor/
Apache License 2.0

Looks like a bug #114

Open HuaMuLanChina opened 4 years ago

HuaMuLanChina commented 4 years ago

Four-GPU machine, Linux LXC container:

Python 3.7.7 (default, May  7 2020, 21:25:33) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jittor as jt
[i 0720 09:23:19.773135 24 __init__.py:211] Found g++(7.5.0) at /usr/bin/g++.
[i 0720 09:23:19.803045 32 __init__.py:211] Found /usr/local/cuda/bin/nvcc(10.1.243) at /usr/local/cuda/bin/nvcc.
[i 0720 09:23:19.818007 32 __init__.py:211] Found addr2line(2.30) at /usr/bin/addr2line.
[i 0720 09:23:19.850424 32 compiler.py:862] pybind_include: -I/root/miniconda3/envs/jittor/include/python3.7m -I/root/miniconda3/envs/jittor/lib/python3.7/site-packages/pybind11/include
[i 0720 09:23:19.869493 32 compiler.py:864] extension_suffix: .cpython-37m-x86_64-linux-gnu.so
[i 0720 09:23:20.113680 32 __init__.py:140] Total mem: 125.78GB, using 8 procs for compiling.
[i 0720 09:23:20.472107 32 jit_compiler.cc:20] Load cc_path: /usr/bin/g++
[i 0720 09:23:20.472658 32 cuda_flags.cc:25] CUDA disabled.
[i 0720 09:23:20.474486 32 init.cc:50] Found cuda archs: [61,]
[i 0720 09:23:20.626632 32 compile_extern.py:350] mpicc not found, distribution disabled.
[i 0720 09:23:22.562227 32 compile_extern.py:15] found /usr/include/cublas.h
[i 0720 09:23:22.562375 32 compile_extern.py:15] found /usr/lib/x86_64-linux-gnu/libcublas.so
[i 0720 09:23:24.110812 32 compile_extern.py:15] found /usr/include/cudnn.h
[i 0720 09:23:24.110891 32 compile_extern.py:15] found /usr/lib/x86_64-linux-gnu/libcudnn.so
[i 0720 09:23:24.120942 32 compiler.py:628] handle pyjt_include /root/miniconda3/envs/jittor/lib/python3.7/site-packages/jittor/extern/cuda/cudnn/inc/cudnn_warper.h
[i 0720 09:23:26.472446 32 compile_extern.py:15] found /usr/local/cuda/include/curand.h
[i 0720 09:23:26.472519 32 compile_extern.py:15] found /usr/local/cuda/lib64/libcurand.so
>>> from jittor import nn
>>> fc = nn.Linear(5,8)
>>> i = jt.random([3,5])
>>> p = fc(i)
>>> p.shape
[3,8,]
>>> jt.flags.use_cuda=1
[i 0720 09:25:13.794089 32 cuda_flags.cc:23] CUDA enabled.
>>> p = fc(i)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/root/miniconda3/envs/jittor/lib/python3.7/site-packages/jittor/__init__.py", line 409, in __call__
    return self.execute(*args, **kw)
  File "/root/miniconda3/envs/jittor/lib/python3.7/site-packages/jittor/nn.py", line 192, in execute
    x = matmul_transpose(x, self.weight)
  File "/root/miniconda3/envs/jittor/lib/python3.7/site-packages/jittor/nn.py", line 27, in matmul_transpose
    jt.compile_extern.cublas_ops.cublas_batched_matmul(a, b, 0, 0)
RuntimeError: Wrong inputs arguments, Please refer to examples(help(jt.cublas_batched_matmul)).

Types of your inputs are:
 self   = module,
 args   = (Var, Var, int, int, ),

The function declarations are:
 VarHolder* cublas_batched_matmul(VarHolder* a, VarHolder* b,  bool trans_a,  bool trans_b)

Failed reason:[f 0720 09:25:17.416915 32 cublas_batched_matmul_op.cc:51] Check failed a->shape.size()(2) == 3(3) Something wrong ... Could you please report this issue?

>>> 
(jittor) root@com:~/sunxu/jtest# uname -a
Linux com 4.15.0-111-generic #112-Ubuntu SMP Thu Jul 9 20:32:34 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
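
The failed check in the traceback above, `a->shape.size()(2) == 3(3)`, reads as: the op received a 2-D input (the `[3,5]` tensor) where it requires a 3-D one. Below is a minimal pure-Python sketch of that kind of rank check. It is an illustration only, not Jittor's actual code; `check_batched_matmul_rank` is a hypothetical helper name, and the message format only loosely imitates the log.

```python
def check_batched_matmul_rank(shape, required_rank=3):
    """Hypothetical stand-in for the rank check reported in the log.

    Raises ValueError when the shape does not have the required number
    of dimensions; returns True otherwise.
    """
    if len(shape) != required_rank:
        raise ValueError(
            f"Check failed: shape.size()({len(shape)}) == {required_rank}"
        )
    return True


# Shape of `i = jt.random([3,5])` from the session above: two dimensions.
input_shape = [3, 5]

try:
    check_batched_matmul_rank(input_shape)
    result = "ok"
except ValueError as err:
    result = str(err)

print(result)

# A shape with a leading batch dimension has three dimensions and passes.
batched_shape = [1] + input_shape
print(check_batched_matmul_rank(batched_shape))
```

This only shows why the check rejects the input; it makes no claim about how the dispatch between the batched and non-batched matmul ops should be corrected inside Jittor.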

Single-GPU machine:

Python 3.7.7 (default, May  7 2020, 21:25:33) 
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jittor as jt
[i 0720 17:32:21.728296 80 __init__.py:211] Found g++(7.5.0) at /usr/bin/g++.
[i 0720 17:32:21.753104 12 compiler.py:773] Found /usr/local/cuda/bin/nvcc(10.0.130) at /usr/local/cuda-10.0/bin/nvcc
[i 0720 17:32:21.812363 12 __init__.py:211] Found gdb(8.1.0) at /usr/bin/gdb.
[i 0720 17:32:21.825690 12 __init__.py:211] Found addr2line(2.30) at /usr/bin/addr2line.
[i 0720 17:32:22.137452 12 compiler.py:862] pybind_include: -I/home/pe73hua/anaconda3/envs/jittor/include/python3.7m -I/home/pe73hua/anaconda3/envs/jittor/lib/python3.7/site-packages/pybind11/include
[i 0720 17:32:22.173526 12 compiler.py:864] extension_suffix: .cpython-37m-x86_64-linux-gnu.so
[i 0720 17:32:22.357286 12 __init__.py:140] Total mem: 31.26GB, using 8 procs for compiling.
[i 0720 17:32:22.618868 12 cuda_flags.cc:25] CUDA disabled.
[i 0720 17:32:22.619200 12 jit_compiler.cc:20] Load cc_path: /usr/bin/g++
[i 0720 17:32:22.619215 12 jit_compiler.cc:23] Load nvcc_path: /usr/local/cuda-10.0/bin/nvcc
[i 0720 17:32:22.619404 12 init.cc:50] Found cuda archs: [61,]
[i 0720 17:32:22.711473 12 __init__.py:211] Found mpicc(2.1.1) at /usr/bin/mpicc.
[i 0720 17:32:22.761829 12 compiler.py:628] handle pyjt_include /home/pe73hua/anaconda3/envs/jittor/lib/python3.7/site-packages/jittor/extern/mpi/inc/mpi_warper.h
[i 0720 17:32:22.923333 12 compile_extern.py:256] Downloading nccl...
Data file has been downloaded and verified
[i 0720 17:32:23.554343 12 compile_extern.py:15] found /usr/local/cuda-10.0/include/cublas.h
[i 0720 17:32:23.554461 12 compile_extern.py:15] found /usr/local/cuda-10.0/lib64/libcublas.so
[i 0720 17:32:25.193808 12 compile_extern.py:15] found /usr/local/cuda-10.0/include/cudnn.h
[i 0720 17:32:25.193876 12 compile_extern.py:15] found /usr/local/cuda-10.0/lib64/libcudnn.so
[i 0720 17:32:25.202628 12 compiler.py:628] handle pyjt_include /home/pe73hua/anaconda3/envs/jittor/lib/python3.7/site-packages/jittor/extern/cuda/cudnn/inc/cudnn_warper.h
[i 0720 17:32:25.812579 12 compile_extern.py:15] found /usr/local/cuda-10.0/include/curand.h
[i 0720 17:32:25.812647 12 compile_extern.py:15] found /usr/local/cuda-10.0/lib64/libcurand.so
>>> from jittor import nn
>>> fc = nn.Linear(5,8)
>>> i = jt.random([3,5])
>>> p = fc(i)
>>> p.shape
[3,8,]
>>> jt.flags.use_cuda=1
>>> [i 0720 17:33:20.760420 12 cuda_flags.cc:23] CUDA enabled.
;
  File "<stdin>", line 1
    ;
    ^
SyntaxError: invalid syntax
>>> p = fc(i)
>>> p.shape
[3,8,]
>>> 
(jittor) pe73hua@pe73hua-Find:~/d/jtest$ uname
Linux
(jittor) pe73hua@pe73hua-Find:~/d/jtest$ uname -a
Linux pe73hua-Find 5.3.0-62-generic #56~18.04.1-Ubuntu SMP Wed Jun 24 16:17:03 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
HuaMuLanChina commented 4 years ago

After running python -m jittor.test.test_example, an exception is reported at the very end.

step 993, loss = 0.0010326019255444407 {'hold_vars': 14, 'lived_vars': 67, 'lived_ops': 59}
step 994, loss = 0.0009665400721132755 {'hold_vars': 14, 'lived_vars': 67, 'lived_ops': 59}
step 995, loss = 0.0014802009100094438 {'hold_vars': 14, 'lived_vars': 67, 'lived_ops': 59}
step 996, loss = 0.0016367514617741108 {'hold_vars': 14, 'lived_vars': 67, 'lived_ops': 59}
step 997, loss = 0.0011712713167071342 {'hold_vars': 14, 'lived_vars': 67, 'lived_ops': 59}
step 998, loss = 0.0010918093612417579 {'hold_vars': 14, 'lived_vars': 67, 'lived_ops': 59}
step 999, loss = 0.0009948197985067964 {'hold_vars': 14, 'lived_vars': 67, 'lived_ops': 59}
.
----------------------------------------------------------------------
Ran 1 test in 1.962s

OK
Caught segfault at address 0x65, flush log...
[bt] Execution path:
[bt] #1 /root/.cache/jittor/default/g++/jit_utils_core.cpython-37m-x86_64-linux-gnu.so(_ZN6jittor18segfault_sigactionEiP9siginfo_tPv+0x465) [0x7faa730e54f5]
?? ??:0
[bt] #2 /lib/x86_64-linux-gnu/libc.so.6(+0x3efd0) [0x7faa747d7fd0]
?? ??:0
[bt] #3 /usr/lib/x86_64-linux-gnu/libcublas.so(+0x2a319) [0x7faa1bd74319]
?? ??:0
[bt] #4 /usr/lib/x86_64-linux-gnu/libcublas.so(+0x2ae36) [0x7faa1bd74e36]
?? ??:0
[bt] #5 /usr/lib/x86_64-linux-gnu/libcublas.so(cublasDestroy_v2+0xf7) [0x7faa1be02f87]
?? ??:0
[bt] #6 /root/.cache/jittor/default/g++/custom_ops/gen_ops_cublas_batched_matmul_cublas_test_cublas_matmul.cpython-37m-x86_64-linux-gnu.so(+0xe073) [0x7faa27df6073]
?? ??:0
[bt] #7 /root/.cache/jittor/default/g++/custom_ops/gen_ops_cublas_batched_matmul_cublas_test_cublas_matmul.cpython-37m-x86_64-linux-gnu.so(_ZN6jittor13cublas_initerD2Ev+0x33) [0x7faa27df64d3]
?? ??:0
[bt] #8 /lib/x86_64-linux-gnu/libc.so.6(+0x430f1) [0x7faa747dc0f1]
?? ??:0
[bt] #9 /lib/x86_64-linux-gnu/libc.so.6(+0x431ea) [0x7faa747dc1ea]
?? ??:0
[bt] #10 python(+0x224989) [0x5607f1179989]
/usr/bin/addr2line: 'python': No such file
[bt] #11 python(+0x224a37) [0x5607f1179a37]
/usr/bin/addr2line: 'python': No such file
[bt] #12 python(PyErr_PrintEx+0x32) [0x5607f1179ad2]
/usr/bin/addr2line: 'python': No such file
[bt] #13 python(+0x224cf5) [0x5607f1179cf5]
/usr/bin/addr2line: 'python': No such file
[bt] #14 python(+0x23760b) [0x5607f118c60b]
/usr/bin/addr2line: 'python': No such file
[bt] #15 python(_Py_UnixMain+0x3c) [0x5607f118c6fc]
/usr/bin/addr2line: 'python': No such file
Segfault, exit
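
For reading the backtrace above: the symbol in frame #7, `_ZN6jittor13cublas_initerD2Ev`, is the Itanium-ABI mangled name of the destructor `jittor::cublas_initer::~cublas_initer()`, and frame #5 is `cublasDestroy_v2`. Since the test itself reported OK first, the crash presumably happens while the process is shutting down. The following is a small sketch that pulls the frame number, module, and symbol out of such `[bt]` lines. The regular expression and the `parse_frames` name are assumptions made for illustration, based only on the line format visible in this log.

```python
import re

# Matches lines such as:
# [bt] #5 /usr/lib/x86_64-linux-gnu/libcublas.so(cublasDestroy_v2+0xf7) [0x7faa1be02f87]
FRAME_RE = re.compile(
    r"^\[bt\] #(?P<index>\d+) (?P<module>[^()\s]+)"
    r"\((?P<symbol>[^+)]*)\+(?P<offset>0x[0-9a-f]+)\) \[(?P<addr>0x[0-9a-f]+)\]$"
)


def parse_frames(log_text):
    """Return (frame index, module file name, symbol or None) per [bt] line."""
    frames = []
    for line in log_text.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            module_name = match["module"].rsplit("/", 1)[-1]
            # Frames without a symbol look like "(+0x430f1)"; report None.
            frames.append((int(match["index"]), module_name, match["symbol"] or None))
    return frames


# Three frames copied from the log above; "?? ??:0" lines are skipped.
SAMPLE = """\
[bt] #5 /usr/lib/x86_64-linux-gnu/libcublas.so(cublasDestroy_v2+0xf7) [0x7faa1be02f87]
?? ??:0
[bt] #7 /root/.cache/jittor/default/g++/custom_ops/gen_ops_cublas_batched_matmul_cublas_test_cublas_matmul.cpython-37m-x86_64-linux-gnu.so(_ZN6jittor13cublas_initerD2Ev+0x33) [0x7faa27df64d3]
?? ??:0
[bt] #8 /lib/x86_64-linux-gnu/libc.so.6(+0x430f1) [0x7faa747dc0f1]
"""

frames = parse_frames(SAMPLE)
for index, module, symbol in frames:
    print(index, module, symbol)
```

Listing the frames this way makes the call order easier to see: a libc frame calls into the `cublas_initer` destructor, which in turn reaches `cublasDestroy_v2` inside libcublas, where the segfault is caught.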
cjld commented 4 years ago

Hello, thank you for the feedback, and sorry for the late reply. This issue should already be fixed in the latest master.

cjld commented 4 years ago

If you still run into problems, feel free to keep replying in this issue.

HuaMuLanChina commented 4 years ago
(jittor) root@com:~/sunxu# python
Python 3.7.7 (default, May  7 2020, 21:25:33)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jittor as jt
[i 0729 09:15:45.657191 20 __init__.py:211] Found g++(7.5.0) at /usr/bin/g++.
[i 0729 09:15:45.687096 28 __init__.py:211] Found /usr/local/cuda/bin/nvcc(10.1.243) at /usr/local/cuda/bin/nvcc.
[i 0729 09:15:45.702007 28 __init__.py:211] Found addr2line(2.30) at /usr/bin/addr2line.
[i 0729 09:15:45.735945 28 compiler.py:862] pybind_include: -I/root/miniconda3/envs/jittor/include/python3.7m -I/root/miniconda3/envs/jittor/lib/python3.7/site-packages/pybind11/include
[i 0729 09:15:45.756010 28 compiler.py:864] extension_suffix: .cpython-37m-x86_64-linux-gnu.so
[i 0729 09:15:45.999778 28 __init__.py:140] Total mem: 125.78GB, using 8 procs for compiling.
[i 0729 09:15:46.190028 28 jit_compiler.cc:20] Load cc_path: /usr/bin/g++
[i 0729 09:15:46.393583 28 cuda_flags.cc:25] CUDA disabled.
[i 0729 09:15:46.395806 28 init.cc:50] Found cuda archs: [61,]
[i 0729 09:15:46.502072 28 compile_extern.py:350] mpicc not found, distribution disabled.
[i 0729 09:15:48.465357 28 compile_extern.py:15] found /usr/include/cublas.h
[i 0729 09:15:48.465493 28 compile_extern.py:15] found /usr/lib/x86_64-linux-gnu/libcublas.so
[i 0729 09:15:50.037613 28 compile_extern.py:15] found /usr/include/cudnn.h
[i 0729 09:15:50.037697 28 compile_extern.py:15] found /usr/lib/x86_64-linux-gnu/libcudnn.so
[i 0729 09:15:50.048070 28 compiler.py:628] handle pyjt_include /root/miniconda3/envs/jittor/lib/python3.7/site-packages/jittor/extern/cuda/cudnn/inc/cudnn_warper.h
[i 0729 09:15:50.718433 28 compile_extern.py:15] found /usr/local/cuda/include/curand.h
[i 0729 09:15:50.718508 28 compile_extern.py:15] found /usr/local/cuda/lib64/libcurand.so
>>> from jittor import nn
>>> fc = nn.Linear(5,8)
>>> i = jt.random([3,5])
>>> p = fc(i)
>>> p.shape
[3,8,]
>>> jt.flags.use_cuda = 1
[i 0729 09:17:05.830522 28 cuda_flags.cc:23] CUDA enabled.
>>> p.shape
[3,8,]
>>> p = fc(i)
>>> p.shape
[3,8,]
>>> exit()
Caught segfault at address 0x65, flush log...
[bt] Execution path:
[bt] #1 /root/.cache/jittor/default/g++/jit_utils_core.cpython-37m-x86_64-linux-gnu.so(_ZN6jittor18segfault_sigactionEiP9siginfo_tPv+0x465) [0x7ff00b90b4f5]
?? ??:0
[bt] #2 /lib/x86_64-linux-gnu/libc.so.6(+0x3efd0) [0x7ff00d059fd0]
?? ??:0
[bt] #3 /usr/lib/x86_64-linux-gnu/libcublas.so(+0x2a319) [0x7fefb7d74319]
?? ??:0
[bt] #4 /usr/lib/x86_64-linux-gnu/libcublas.so(+0x2ae36) [0x7fefb7d74e36]
?? ??:0
[bt] #5 /usr/lib/x86_64-linux-gnu/libcublas.so(cublasDestroy_v2+0xf7) [0x7fefb7e02f87]
?? ??:0
[bt] #6 /root/.cache/jittor/default/g++/custom_ops/gen_ops_cublas_matmul_cublas_batched_matmul_cublas_test.cpython-37m-x86_64-linux-gnu.so(+0xed23) [0x7fefc8034d23]
?? ??:0
[bt] #7 /root/.cache/jittor/default/g++/custom_ops/gen_ops_cublas_matmul_cublas_batched_matmul_cublas_test.cpython-37m-x86_64-linux-gnu.so(_ZN6jittor13cublas_initerD2Ev+0x33) [0x7fefc8035183]
?? ??:0
[bt] #8 /lib/x86_64-linux-gnu/libc.so.6(+0x430f1) [0x7ff00d05e0f1]
?? ??:0
[bt] #9 /lib/x86_64-linux-gnu/libc.so.6(+0x431ea) [0x7ff00d05e1ea]
?? ??:0
[bt] #10 python(+0x224989) [0x562a4d718989]
/usr/bin/addr2line: 'python': No such file
[bt] #11 python(+0x224a37) [0x562a4d718a37]
/usr/bin/addr2line: 'python': No such file
[bt] #12 python(PyErr_PrintEx+0x32) [0x562a4d718ad2]
/usr/bin/addr2line: 'python': No such file
[bt] #13 python(PyRun_InteractiveLoopFlags+0x131) [0x562a4d5e9484]
/usr/bin/addr2line: 'python': No such file
[bt] #14 python(+0xf54e6) [0x562a4d5e94e6]
/usr/bin/addr2line: 'python': No such file
[bt] #15 python(+0xf5f83) [0x562a4d5e9f83]
/usr/bin/addr2line: 'python': No such file
Segfault, exit
(jittor) root@com:~/sunxu#

fc runs now, but these errors still appear at exit. The same happens with jittor.test.test_cuda and jittor.test.test_example.