flaport / torch_sparse_solve

A sparse KLU solver for PyTorch.
https://pypi.org/project/torch-sparse-solve
GNU Lesser General Public License v2.1

Weird bug: Process finished with exit code 139 (interrupted by signal 11: SIGSEGV) #10

Status: Closed (HONGJINLYU closed this issue 3 years ago)

HONGJINLYU commented 3 years ago

Hi Flaport:

Thanks for your amazing code!

However, when I ran your code, I got a weird error: "Process finished with exit code 139 (interrupted by signal 11: SIGSEGV)". The program stops immediately with this error, and I have no idea what to do.
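For readers unfamiliar with the number in that message: exit code 139 follows the common shell convention of reporting 128 plus the signal number, so 139 - 128 = 11, which is SIGSEGV (a segmentation fault in native code, not a Python exception). A minimal sketch of that decoding, using only the standard library; the helper name `describe_exit_code` is made up for illustration:

```python
import signal


def describe_exit_code(code):
    """Return the signal name encoded in a shell-style exit code, or None."""
    # Shells report "killed by signal N" as exit code 128 + N.
    if code <= 128:
        return None
    try:
        return signal.Signals(code - 128).name
    except ValueError:
        # 128 + N where N is not a signal known on this platform.
        return None


print(describe_exit_code(139))  # SIGSEGV
```

Because the crash happens inside compiled code, no Python traceback is printed by default; running the script with `python -X faulthandler` can help show which Python line triggered it.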

To Reproduce

The following code can reproduce the behavior:

import torch
from torch_sparse_solve import solve
import random

def random_int_list(start, stop, length):
    start, stop = (int(start), int(stop)) if start <= stop else (int(stop), int(start))
    length = int(abs(length)) if length else 0
    random_list = []
    for i in range(length):
        random_list.append(random.randint(start, stop))
    return random_list

'''
create b
'''
b = torch.randn(1024,1,requires_grad=True,dtype=torch.float64).cuda()

'''
create A
'''
L_width = 1024
nnz = random.randint(300, 800) # number of nonzero elements
val = [random.uniform(-2, 2) for i in range(nnz)]
row = sorted(random_int_list(0,L_width-1,nnz))
col = sorted(random_int_list(0,L_width-1,nnz))

A = torch.sparse_coo_tensor([row, col], val, (L_width, L_width), dtype=torch.float64, requires_grad=True).cuda()

result_oc = solve(A.unsqueeze(0), b.unsqueeze(0))

Expected behavior

The call solve(A, b) should run without crashing.

Environment

Collecting environment information...
PyTorch version: 1.7.1
Is debug build: False
CUDA used to build PyTorch: 9.2
ROCM used to build PyTorch: N/A

OS: Ubuntu 18.04.2 LTS (x86_64)
GCC version: (Ubuntu 7.5.0-3ubuntu1~18.04) 7.5.0
Clang version: Could not collect
CMake version: version 3.10.2
Libc version: glibc-2.9

Python version: 3.6.12 |Anaconda, Inc.| (default, Sep  8 2020, 23:10:56)  [GCC 7.3.0] (64-bit runtime)
Python platform: Linux-4.15.0-151-generic-x86_64-with-debian-buster-sid
Is CUDA available: True
CUDA runtime version: 9.2.148
GPU models and configuration: GPU 0: GeForce GTX 1080 Ti
Nvidia driver version: 396.54
cuDNN version: Probably one of the following:
/usr/lib/x86_64-linux-gnu/libcudnn.so.7.4.1
/usr/local/cuda-9.2/lib64/libcudnn.so.7
HIP runtime version: N/A
MIOpen runtime version: N/A

Versions of relevant libraries:
[pip3] numpy==1.19.2
[pip3] torch==1.7.1
[pip3] torch-sparse-solve==0.0.4
[pip3] torchaudio==0.7.0a0+a853dff
[pip3] torchvision==0.8.2
[conda] blas                      1.0                         mkl
[conda] cudatoolkit               9.2                           0
[conda] mkl                       2020.2                      256
[conda] mkl-service               2.3.0            py36he8ac12f_0
[conda] mkl_fft                   1.2.1            py36h54f3939_0
[conda] mkl_random                1.1.1            py36h0573a6f_0
[conda] numpy                     1.19.2           py36h54aff64_0
[conda] numpy-base                1.19.2           py36hfa32c7d_0
[conda] pytorch                   1.7.1           py3.6_cuda9.2.148_cudnn7.6.3_0    pytorch
[conda] torch-sparse-solve        0.0.4                     <pip>
[conda] torchaudio                0.7.2                      py36    pytorch
[conda] torchvision               0.8.2                 py36_cu92    pytorch

Additional context

useful link: https://stackoverflow.com/questions/49414841/process-finished-with-exit-code-139-interrupted-by-signal-11-sigsegv

Really looking forward to your reply.

Best Regards

flaport commented 3 years ago

Hey @HONGJINLYU ,

I think these segfaults are related to the fact that you're using GPU tensors (cuda). Unfortunately torch_sparse_solve only supports CPU tensors.
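A minimal sketch of one possible workaround, assuming you still want the rest of your pipeline on the GPU: move both tensors to the CPU just for the solve, then move the result back. The helper name `solve_on_cpu` is hypothetical and not part of torch_sparse_solve; the solver is passed in as `solve_fn` so the helper itself only relies on the standard tensor methods `.cpu()`, `.to()` and the `.device` attribute.

```python
def solve_on_cpu(solve_fn, A, b):
    """Run a CPU-only solver on tensors that may live on the GPU.

    solve_fn: the solver to call, e.g. torch_sparse_solve.solve (assumption:
              it takes (A, b) and returns the solution tensor).
    A, b:     torch tensors on any device.
    """
    # Remember where the right-hand side lived so the result can go back there.
    original_device = b.device
    # .cpu() returns the tensor itself if it is already on the CPU.
    x = solve_fn(A.cpu(), b.cpu())
    # Move the solution back to the original device.
    return x.to(original_device)
```

With the reproduction script from this issue, the last line would become `result_oc = solve_on_cpu(solve, A.unsqueeze(0), b.unsqueeze(0))`. The simpler alternative is to drop the `.cuda()` calls when creating `A` and `b`, which avoids the device transfers entirely.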

I hope this helps.

HONGJINLYU commented 3 years ago

Hi flaport: thanks for your kind reply. It works!

Best Regards,
HONGJIN