Open HuaMuLanChina opened 4 years ago
After running python -m jittor.test.test_example, an exception is reported at the very end.
step 993, loss = 0.0010326019255444407 {'hold_vars': 14, 'lived_vars': 67, 'lived_ops': 59}
step 994, loss = 0.0009665400721132755 {'hold_vars': 14, 'lived_vars': 67, 'lived_ops': 59}
step 995, loss = 0.0014802009100094438 {'hold_vars': 14, 'lived_vars': 67, 'lived_ops': 59}
step 996, loss = 0.0016367514617741108 {'hold_vars': 14, 'lived_vars': 67, 'lived_ops': 59}
step 997, loss = 0.0011712713167071342 {'hold_vars': 14, 'lived_vars': 67, 'lived_ops': 59}
step 998, loss = 0.0010918093612417579 {'hold_vars': 14, 'lived_vars': 67, 'lived_ops': 59}
step 999, loss = 0.0009948197985067964 {'hold_vars': 14, 'lived_vars': 67, 'lived_ops': 59}
.
----------------------------------------------------------------------
Ran 1 test in 1.962s
OK
Caught segfault at address 0x65, flush log...
[bt] Execution path:
[bt] #1 /root/.cache/jittor/default/g++/jit_utils_core.cpython-37m-x86_64-linux-gnu.so(_ZN6jittor18segfault_sigactionEiP9siginfo_tPv+0x465) [0x7faa730e54f5]
?? ??:0
[bt] #2 /lib/x86_64-linux-gnu/libc.so.6(+0x3efd0) [0x7faa747d7fd0]
?? ??:0
[bt] #3 /usr/lib/x86_64-linux-gnu/libcublas.so(+0x2a319) [0x7faa1bd74319]
?? ??:0
[bt] #4 /usr/lib/x86_64-linux-gnu/libcublas.so(+0x2ae36) [0x7faa1bd74e36]
?? ??:0
[bt] #5 /usr/lib/x86_64-linux-gnu/libcublas.so(cublasDestroy_v2+0xf7) [0x7faa1be02f87]
?? ??:0
[bt] #6 /root/.cache/jittor/default/g++/custom_ops/gen_ops_cublas_batched_matmul_cublas_test_cublas_matmul.cpython-37m-x86_64-linux-gnu.so(+0xe073) [0x7faa27df6073]
?? ??:0
[bt] #7 /root/.cache/jittor/default/g++/custom_ops/gen_ops_cublas_batched_matmul_cublas_test_cublas_matmul.cpython-37m-x86_64-linux-gnu.so(_ZN6jittor13cublas_initerD2Ev+0x33) [0x7faa27df64d3]
?? ??:0
[bt] #8 /lib/x86_64-linux-gnu/libc.so.6(+0x430f1) [0x7faa747dc0f1]
?? ??:0
[bt] #9 /lib/x86_64-linux-gnu/libc.so.6(+0x431ea) [0x7faa747dc1ea]
?? ??:0
[bt] #10 python(+0x224989) [0x5607f1179989]
/usr/bin/addr2line: 'python': No such file
[bt] #11 python(+0x224a37) [0x5607f1179a37]
/usr/bin/addr2line: 'python': No such file
[bt] #12 python(PyErr_PrintEx+0x32) [0x5607f1179ad2]
/usr/bin/addr2line: 'python': No such file
[bt] #13 python(+0x224cf5) [0x5607f1179cf5]
/usr/bin/addr2line: 'python': No such file
[bt] #14 python(+0x23760b) [0x5607f118c60b]
/usr/bin/addr2line: 'python': No such file
[bt] #15 python(_Py_UnixMain+0x3c) [0x5607f118c6fc]
/usr/bin/addr2line: 'python': No such file
Segfault, exit
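For anyone reading the backtrace above, here is a minimal sketch that pulls out the frames carrying a symbol name. It uses only the Python standard library; the names Frame and parse_backtrace are hypothetical helpers invented for this illustration and are not part of Jittor. The sample lines are copied from the log above.

```python
import re
from typing import List, NamedTuple


class Frame(NamedTuple):
    index: int    # frame number after "#"
    module: str   # file name of the shared object or executable
    symbol: str   # empty when the log only gives an offset
    offset: int   # offset inside the symbol (or module), parsed from hex


# Matches lines such as:
# [bt] #5 /usr/lib/.../libcublas.so(cublasDestroy_v2+0xf7) [0x7faa1be02f87]
_FRAME_RE = re.compile(
    r"\[bt\] #(\d+) (\S+?)\((\w*)\+0x([0-9a-f]+)\) \[0x[0-9a-f]+\]"
)


def parse_backtrace(log: str) -> List[Frame]:
    """Return one Frame per '[bt] #N ...' line; other lines are skipped."""
    frames = []
    for line in log.splitlines():
        m = _FRAME_RE.search(line)
        if m:
            idx, path, symbol, offset = m.groups()
            module = path.rsplit("/", 1)[-1]  # keep only the file name
            frames.append(Frame(int(idx), module, symbol, int(offset, 16)))
    return frames


SAMPLE = """\
[bt] #5 /usr/lib/x86_64-linux-gnu/libcublas.so(cublasDestroy_v2+0xf7) [0x7faa1be02f87]
?? ??:0
[bt] #6 /root/.cache/jittor/default/g++/custom_ops/gen_ops_cublas_batched_matmul_cublas_test_cublas_matmul.cpython-37m-x86_64-linux-gnu.so(+0xe073) [0x7faa27df6073]
?? ??:0
[bt] #7 /root/.cache/jittor/default/g++/custom_ops/gen_ops_cublas_batched_matmul_cublas_test_cublas_matmul.cpython-37m-x86_64-linux-gnu.so(_ZN6jittor13cublas_initerD2Ev+0x33) [0x7faa27df64d3]
?? ??:0
[bt] #8 /lib/x86_64-linux-gnu/libc.so.6(+0x430f1) [0x7faa747dc0f1]
"""

# Print only the frames that carry a symbol name.
for frame in parse_backtrace(SAMPLE):
    if frame.symbol:
        print(f"#{frame.index} {frame.module} {frame.symbol}+{hex(frame.offset)}")
```

The two named frames are cublasDestroy_v2 and _ZN6jittor13cublas_initerD2Ev. The latter demangles to jittor::cublas_initer::~cublas_initer(), so the segfault appears to happen while the cuBLAS handle is being destroyed, presumably during process exit after the test has already reported OK.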
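Hello, thank you for the feedback, and sorry for the late reply. This problem should already be fixed in the latest master.
If you still run into the problem, feel free to keep replying in this issue.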
(jittor) root@com:~/sunxu# python
Python 3.7.7 (default, May 7 2020, 21:25:33)
[GCC 7.3.0] :: Anaconda, Inc. on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import jittor as jt
[i 0729 09:15:45.657191 20 __init__.py:211] Found g++(7.5.0) at /usr/bin/g++.
[i 0729 09:15:45.687096 28 __init__.py:211] Found /usr/local/cuda/bin/nvcc(10.1.243) at /usr/local/cuda/bin/nvcc.
[i 0729 09:15:45.702007 28 __init__.py:211] Found addr2line(2.30) at /usr/bin/addr2line.
[i 0729 09:15:45.735945 28 compiler.py:862] pybind_include: -I/root/miniconda3/envs/jittor/include/python3.7m -I/root/miniconda3/envs/jittor/lib/python3.7/site-packages/pybind11/include
[i 0729 09:15:45.756010 28 compiler.py:864] extension_suffix: .cpython-37m-x86_64-linux-gnu.so
[i 0729 09:15:45.999778 28 __init__.py:140] Total mem: 125.78GB, using 8 procs for compiling.
[i 0729 09:15:46.190028 28 jit_compiler.cc:20] Load cc_path: /usr/bin/g++
[i 0729 09:15:46.393583 28 cuda_flags.cc:25] CUDA disabled.
[i 0729 09:15:46.395806 28 init.cc:50] Found cuda archs: [61,]
[i 0729 09:15:46.502072 28 compile_extern.py:350] mpicc not found, distribution disabled.
[i 0729 09:15:48.465357 28 compile_extern.py:15] found /usr/include/cublas.h
[i 0729 09:15:48.465493 28 compile_extern.py:15] found /usr/lib/x86_64-linux-gnu/libcublas.so
[i 0729 09:15:50.037613 28 compile_extern.py:15] found /usr/include/cudnn.h
[i 0729 09:15:50.037697 28 compile_extern.py:15] found /usr/lib/x86_64-linux-gnu/libcudnn.so
[i 0729 09:15:50.048070 28 compiler.py:628] handle pyjt_include /root/miniconda3/envs/jittor/lib/python3.7/site-packages/jittor/extern/cuda/cudnn/inc/cudnn_warper.h
[i 0729 09:15:50.718433 28 compile_extern.py:15] found /usr/local/cuda/include/curand.h
[i 0729 09:15:50.718508 28 compile_extern.py:15] found /usr/local/cuda/lib64/libcurand.so
>>> from jittor import nn
>>> fc = nn.Linear(5,8)
>>> i = jt.random([3,5])
>>> p = fc(i)
>>> p.shape
[3,8,]
>>> jt.flags.use_cuda = 1
[i 0729 09:17:05.830522 28 cuda_flags.cc:23] CUDA enabled.
>>> p.shape
[3,8,]
>>> p = fc(i)
>>> p.shape
[3,8,]
>>> exit()
Caught segfault at address 0x65, flush log...
[bt] Execution path:
[bt] #1 /root/.cache/jittor/default/g++/jit_utils_core.cpython-37m-x86_64-linux-gnu.so(_ZN6jittor18segfault_sigactionEiP9siginfo_tPv+0x465) [0x7ff00b90b4f5]
?? ??:0
[bt] #2 /lib/x86_64-linux-gnu/libc.so.6(+0x3efd0) [0x7ff00d059fd0]
?? ??:0
[bt] #3 /usr/lib/x86_64-linux-gnu/libcublas.so(+0x2a319) [0x7fefb7d74319]
?? ??:0
[bt] #4 /usr/lib/x86_64-linux-gnu/libcublas.so(+0x2ae36) [0x7fefb7d74e36]
?? ??:0
[bt] #5 /usr/lib/x86_64-linux-gnu/libcublas.so(cublasDestroy_v2+0xf7) [0x7fefb7e02f87]
?? ??:0
[bt] #6 /root/.cache/jittor/default/g++/custom_ops/gen_ops_cublas_matmul_cublas_batched_matmul_cublas_test.cpython-37m-x86_64-linux-gnu.so(+0xed23) [0x7fefc8034d23]
?? ??:0
[bt] #7 /root/.cache/jittor/default/g++/custom_ops/gen_ops_cublas_matmul_cublas_batched_matmul_cublas_test.cpython-37m-x86_64-linux-gnu.so(_ZN6jittor13cublas_initerD2Ev+0x33) [0x7fefc8035183]
?? ??:0
[bt] #8 /lib/x86_64-linux-gnu/libc.so.6(+0x430f1) [0x7ff00d05e0f1]
?? ??:0
[bt] #9 /lib/x86_64-linux-gnu/libc.so.6(+0x431ea) [0x7ff00d05e1ea]
?? ??:0
[bt] #10 python(+0x224989) [0x562a4d718989]
/usr/bin/addr2line: 'python': No such file
[bt] #11 python(+0x224a37) [0x562a4d718a37]
/usr/bin/addr2line: 'python': No such file
[bt] #12 python(PyErr_PrintEx+0x32) [0x562a4d718ad2]
/usr/bin/addr2line: 'python': No such file
[bt] #13 python(PyRun_InteractiveLoopFlags+0x131) [0x562a4d5e9484]
/usr/bin/addr2line: 'python': No such file
[bt] #14 python(+0xf54e6) [0x562a4d5e94e6]
/usr/bin/addr2line: 'python': No such file
[bt] #15 python(+0xf5f83) [0x562a4d5e9f83]
/usr/bin/addr2line: 'python': No such file
Segfault, exit
(jittor) root@com:~/sunxu#
fc runs now, but these errors still show up at the very end. The same happens with jittor.test.test_cuda and jittor.test.test_example.
A Linux LXC container with four GPUs.
Single GPU.