Oloren-AI / olorenchemengine

OCE is the first infinitely composable library for reproducibly implementing SOTA molecular property prediction/QSAR techniques.
MIT License
98 stars, 14 forks

OSError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory #48

Closed PatWalters closed 2 years ago

PatWalters commented 2 years ago

I installed on Ubuntu 18.04.6 LTS

conda create -n oce python=3.8
conda activate oce
bash <(curl -s https://raw.githubusercontent.com/Oloren-AI/olorenchemengine/master/install.sh)

When running 0_Minimal_Example.ipynb, I get this error: OSError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory


OSError                                   Traceback (most recent call last)
Cell In [1], line 1
----> 1 import olorenchemengine as oce
      2 import pandas as pd
      4 df = pd.read_csv("https://storage.googleapis.com/oloren-public-data/CHEMBL%20Datasets/997_2298%20-%20VEGFR1%20(CHEMBL1868).csv")

File ~/anaconda3/envs/oce/lib/python3.8/site-packages/olorenchemengine/__init__.py:70
     68 for imp in _OPTIONAL_IMPORTS_FOR_OCE_ONLINE:
     69     try:
---> 70         __import__(imp)
     71     except ImportError:
     72         import sys

File ~/anaconda3/envs/oce/lib/python3.8/site-packages/torch_geometric/__init__.py:4
      1 from types import ModuleType
      2 from importlib import import_module
----> 4 import torch_geometric.data
      5 import torch_geometric.loader
      6 import torch_geometric.transforms

File ~/anaconda3/envs/oce/lib/python3.8/site-packages/torch_geometric/data/__init__.py:1
----> 1 from .data import Data
      2 from .hetero_data import HeteroData
      3 from .temporal import TemporalData

File ~/anaconda3/envs/oce/lib/python3.8/site-packages/torch_geometric/data/data.py:20
     18 import torch
     19 from torch import Tensor
---> 20 from torch_sparse import SparseTensor
     22 from torch_geometric.data.feature_store import (
     23     FeatureStore,
     24     FeatureTensorType,
     25     TensorAttr,
     26     _field_status,
     27 )
     28 from torch_geometric.data.graph_store import (
     29     EDGE_LAYOUT_TO_ATTR_NAME,
     30     EdgeAttr,
   (...)
     34     edge_tensor_type_to_adj_type,
     35 )

File ~/anaconda3/envs/oce/lib/python3.8/site-packages/torch_sparse/__init__.py:19
     17 spec = cuda_spec or cpu_spec
     18 if spec is not None:
---> 19     torch.ops.load_library(spec.origin)
     20 else:  # pragma: no cover
     21     raise ImportError(f"Could not find module '{library}_cpu' in "
     22                       f"{osp.dirname(__file__)}")

File ~/anaconda3/envs/oce/lib/python3.8/site-packages/torch/_ops.py:220, in _Ops.load_library(self, path)
    215 path = torch._utils_internal.resolve_library_path(path)
    216 with dl_open_guard():
    217     # Import the shared library into the process, thus running its
    218     # static (global) initialization code in order to register custom
    219     # operators with the JIT.
--> 220     ctypes.CDLL(path)
    221 self.loaded_libraries.add(path)

File ~/anaconda3/envs/oce/lib/python3.8/ctypes/__init__.py:373, in CDLL.__init__(self, name, mode, handle, use_errno, use_last_error, winmode)
    370 self._FuncPtr = _FuncPtr
    372 if handle is None:
--> 373     self._handle = _dlopen(self._name, mode)
    374 else:
    375     self._handle = handle

OSError: libtorch_cuda_cu.so: cannot open shared object file: No such file or directory
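
For anyone hitting the same traceback, one way to narrow it down is to check whether the installed torch build actually ships the library that torch_sparse is trying to load. The following is only a rough sketch: it assumes torch keeps its shared libraries in a `lib` directory next to its `__init__.py`, and the helper names `torch_lib_dir` and `missing_libraries` are made up for illustration.

```python
import importlib.util
from pathlib import Path


def torch_lib_dir():
    """Return the assumed 'lib' directory of the installed torch package, or None.

    find_spec locates the package on disk without importing it.
    """
    spec = importlib.util.find_spec("torch")
    if spec is None or spec.origin is None:
        return None
    # Assumption: shared libraries sit in <site-packages>/torch/lib.
    return Path(spec.origin).parent / "lib"


def missing_libraries(lib_dir, names):
    """Return the entries of `names` that are not regular files inside `lib_dir`."""
    lib_dir = Path(lib_dir)
    return [name for name in names if not (lib_dir / name).is_file()]


if __name__ == "__main__":
    lib_dir = torch_lib_dir()
    if lib_dir is None:
        print("torch is not installed in this environment")
    else:
        wanted = ["libtorch_cuda_cu.so"]  # the file named in the OSError
        print(lib_dir, "missing:", missing_libraries(lib_dir, wanted))
```

If the file is reported missing, the installed torch is probably a different build (for example CPU-only, or a different release) from the one the torch_sparse binary was compiled against.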

davidzqhuang commented 2 years ago

Thanks for opening the issue, and we appreciate your patience with the install process. Unfortunately, some of this is out of our hands because it depends on PyTorch Geometric.

We've had some success resolving this kind of issue by updating system packages:

apt-get clean && apt-get update -y -qq
apt-get install -y curl git build-essential
echo "deb http://us.archive.ubuntu.com/ubuntu/ xenial main universe" >> /etc/apt/sources.list
echo "deb-src http://us.archive.ubuntu.com/ubuntu/ xenial main universe" >> /etc/apt/sources.list
apt-get update
apt-get install -y libsm6 libxext6 libxrender-dev

This also seems to be a known issue on the PyTorch Geometric side, covered in their FAQ: https://github.com/pyg-team/pytorch_geometric/issues/43

Please keep us updated. We will also spin up a server to try to replicate this.

The last option would be to use Docker:

FROM nvidia/cuda:11.3.1-devel

# NVIDIA Key Swaps
RUN rm /etc/apt/sources.list.d/cuda.list
RUN rm /etc/apt/sources.list.d/nvidia-ml.list
RUN apt-key del 7fa2af80
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/cuda/repos/ubuntu1804/x86_64/3bf863cc.pub
RUN apt-key adv --fetch-keys https://developer.download.nvidia.com/compute/machine-learning/repos/ubuntu1804/x86_64/7fa2af80.pub

## Basic dependencies
RUN apt-get clean && apt-get update -y -qq
RUN apt-get install -y curl git build-essential
RUN echo "deb http://us.archive.ubuntu.com/ubuntu/ xenial main universe" >> /etc/apt/sources.list
RUN echo "deb-src http://us.archive.ubuntu.com/ubuntu/ xenial main universe" >> /etc/apt/sources.list
RUN apt-get update
RUN ["apt-get", "install", "-y", "libsm6", "libxext6", "libxrender-dev"]

# Install Miniconda3
ENV PATH="/root/miniconda3/bin:${PATH}"
ARG PATH="/root/miniconda3/bin:${PATH}"
RUN apt-get install -y wget && rm -rf /var/lib/apt/lists/*
RUN wget \
    https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh \
    && mkdir /root/.conda \
    && bash Miniconda3-latest-Linux-x86_64.sh -b \
    && rm -f Miniconda3-latest-Linux-x86_64.sh

# Print the conda version and pin Python to 3.8
RUN conda --version
RUN conda install python=3.8

RUN curl https://raw.githubusercontent.com/Oloren-AI/olorenchemengine/master/install.sh > install.sh
RUN bash -e install.sh
RUN python -c "import olorenchemengine as oce; oce.test_oce()"

PatWalters commented 2 years ago

I was able to make this work. I did the regular OCE install, then ran:

conda install pytorch==1.11.0 torchvision==0.12.0 torchaudio==0.11.0 cudatoolkit=11.3 -c pytorch
pip install torch-scatter torch-sparse torch-cluster torch-spline-conv torch-geometric -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
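
The wheel index URL in the pip command above appears to encode both the torch release (1.11.0) and the CUDA toolkit (cu113 for 11.3), so the two installs have to agree. As a rough sketch for adapting it to another combination, the snippet below rebuilds that URL and the pip command from the two versions. The function names are hypothetical, and the URL pattern is inferred from the one used here, so it may not hold for every release.

```python
def pyg_wheel_index(torch_version, cuda_version=None):
    """Build a PyG wheel index URL for a torch release and CUDA toolkit.

    A cuda_version such as "11.3" becomes the tag "cu113"; None selects
    the "cpu" tag. The URL pattern is an assumption based on this thread.
    """
    base = torch_version.split("+")[0]  # drop a local tag such as "+cu113"
    tag = "cpu" if cuda_version is None else "cu" + cuda_version.replace(".", "")
    return f"https://data.pyg.org/whl/torch-{base}+{tag}.html"


def pip_command(torch_version, cuda_version=None):
    """Return the pip invocation as an argument list (nothing is executed)."""
    packages = [
        "torch-scatter",
        "torch-sparse",
        "torch-cluster",
        "torch-spline-conv",
        "torch-geometric",
    ]
    return ["pip", "install", *packages, "-f", pyg_wheel_index(torch_version, cuda_version)]


if __name__ == "__main__":
    # Reproduces the command from this comment for torch 1.11.0 with CUDA 11.3.
    print(" ".join(pip_command("1.11.0", "11.3")))
```

The command is only printed, not run, so it can be inspected before anything in the environment is changed.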

davidzqhuang commented 2 years ago

That's amazing to hear! We will add these to our installation FAQ as well.

We will incorporate these commands into the install.sh file once we can reliably determine when they are needed.