BorgwardtLab / TOGL

Topological Graph Neural Networks (ICLR 2022)
https://openreview.net/pdf?id=oxxUMeFwEHd
BSD 3-Clause "New" or "Revised" License
mohit/TOGL/repos/torch_persistent_homology/torch_persistent_homology/persistent_homology_cpu.cpython-38-x86_64-linux-gnu.so: invalid ELF header #6

Closed mohit-kumar-27 closed 2 years ago

mohit-kumar-27 commented 2 years ago

(togl1) mohit@user-Default-string:~/mohit/TOGL$ poetry run topognn/train_model.py --model GNN --dataset MNIST
Traceback (most recent call last):
  File "topognn/train_model.py", line 12, in <module>
    import topognn.models as models
  File "/home/mohit/mohit/TOGL/topognn/models.py", line 15, in <module>
    from topognn.layers import GCNLayer, GINLayer, GATLayer, SimpleSetTopoLayer, fake_persistence_computation#, EdgeDropout
  File "/home/mohit/mohit/TOGL/topognn/layers.py", line 8, in <module>
    from torch_persistent_homology.persistent_homology_cpu import compute_persistence_homology_batched_mt
ImportError: /home/mohit/mohit/TOGL/repos/torch_persistent_homology/torch_persistent_homology/persistent_homology_cpu.cpython-38-x86_64-linux-gnu.so: invalid ELF header

I am trying to run your code. I followed all the instructions in the README, but I am getting this error. I am using PyTorch 1.11.0 with CUDA 11.3 and have installed the dependencies accordingly.

The specifications of the Linux machine I am using are as follows:

x86_64
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=18.04
DISTRIB_CODENAME=bionic
DISTRIB_DESCRIPTION="Ubuntu 18.04.6 LTS"
NAME="Ubuntu"
VERSION="18.04.6 LTS (Bionic Beaver)"
ID=ubuntu
ID_LIKE=debian
PRETTY_NAME="Ubuntu 18.04.6 LTS"
VERSION_ID="18.04"
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
VERSION_CODENAME=bionic
UBUNTU_CODENAME=bionic

Could you please suggest how I could solve this issue?

Pseudomanifold commented 2 years ago

Can you try building the module manually for your environment? You can execute the build process via build.py in repos/torch_persistent_homology. Which version of Python and poetry are you using?

mohit-kumar-27 commented 2 years ago

I am using Python 3.8.13 and Poetry version 1.1.13. I tried running build.py as you suggested, but I am getting the same error as before.

(togl1) mohit@user-Default-string:~/mohit/TOGL/repos/torch_persistent_homology$ python build.py
(togl1) mohit@user-Default-string:~/mohit/TOGL/repos/torch_persistent_homology$ cd ..
(togl1) mohit@user-Default-string:~/mohit/TOGL/repos$ cd ..
(togl1) mohit@user-Default-string:~/mohit/TOGL$ poetry run topognn/train_model.py --model GNN --dataset MNIST
Traceback (most recent call last):
  File "topognn/train_model.py", line 12, in <module>
    import topognn.models as models
  File "/home/mohit/mohit/TOGL/topognn/models.py", line 15, in <module>
    from topognn.layers import GCNLayer, GINLayer, GATLayer, SimpleSetTopoLayer, fake_persistence_computation#, EdgeDropout
  File "/home/mohit/mohit/TOGL/topognn/layers.py", line 8, in <module>
    from torch_persistent_homology.persistent_homology_cpu import compute_persistence_homology_batched_mt
ImportError: /home/mohit/mohit/TOGL/repos/torch_persistent_homology/torch_persistent_homology/persistent_homology_cpu.cpython-38-x86_64-linux-gnu.so: invalid ELF header

I found this statement at https://tg4.solutions/how-to-resolve-invalid-elf-header-error/ in the section "Root cause for invalid elf header error": "If you are experiencing an invalid ELF header error, it is because the binaries included in your deployment package were built for a platform other than Linux."
Could you check?

Pseudomanifold commented 2 years ago

We don't ship any binaries at all; the module is built on-demand. Can you please run poetry install in the torch_persistent_homology folder? Let's see whether the module can be built. Are you using conda in addition to poetry or something? I am seeing that there appears to be a specific virtual environment active.
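
To answer the question about mixed environments, it can help to print which environment managers consider themselves active. The following is a minimal sketch, assuming that conda exports CONDA_DEFAULT_ENV and that virtualenv-style activation (which Poetry's environments normally use) exports VIRTUAL_ENV; the function name is made up for illustration.

```python
import os


def active_environments(environ=None):
    """Report the conda and virtualenv environments named in the environment.

    `environ` defaults to os.environ; passing a dict makes the function easy
    to try out without changing the real shell environment.
    """
    environ = os.environ if environ is None else environ
    return {
        # Set by `conda activate <name>`; "base" if only the base env is active.
        "conda": environ.get("CONDA_DEFAULT_ENV"),
        # Set by virtualenv/venv activation scripts.
        "virtualenv": environ.get("VIRTUAL_ENV"),
    }


if __name__ == "__main__":
    for manager, value in active_environments().items():
        print(f"{manager}: {value or 'not active'}")
```

If both entries are set at the same time, two environments are layered on top of each other, which matches the "(topognn-...) (base)" prompts that appear later in this thread.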

mohit-kumar-27 commented 2 years ago

Earlier, I created a virtual environment togl1 using conda and ran poetry install after activating the conda environment.

Now I deactivated the conda environment and ran poetry install. It created a virtual environment (topognn-LDeohvGT-py3.8) and installed PyTorch 1.8.1+cu102. I have CUDA version 11.4 on my machine. Then I ran poetry run install_deps_cu102 and installed the remaining dependencies.

Then I tried running the code. Now I get the following error. Please help.

(topognn-LDeohvGT-py3.8) (base) mohit@user-Default-string:~/mohit/TOGL$ poetry run topognn/train_model.py --model GNN --dataset MNIST
Traceback (most recent call last):
  File "topognn/train_model.py", line 12, in <module>
    import topognn.models as models
  File "/home/mohit/mohit/TOGL/topognn/models.py", line 10, in <module>
    from torch_geometric.nn import GCNConv, GINConv, global_mean_pool, global_add_pool
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-LDeohvGT-py3.8/lib/python3.8/site-packages/torch_geometric/__init__.py", line 2, in <module>
    import torch_geometric.nn
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-LDeohvGT-py3.8/lib/python3.8/site-packages/torch_geometric/nn/__init__.py", line 2, in <module>
    from .data_parallel import DataParallel
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-LDeohvGT-py3.8/lib/python3.8/site-packages/torch_geometric/nn/data_parallel.py", line 5, in <module>
    from torch_geometric.data import Batch
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-LDeohvGT-py3.8/lib/python3.8/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
    from .data import Data
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-LDeohvGT-py3.8/lib/python3.8/site-packages/torch_geometric/data/data.py", line 8, in <module>
    from torch_sparse import coalesce, SparseTensor
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-LDeohvGT-py3.8/lib/python3.8/site-packages/torch_sparse/__init__.py", line 12, in <module>
    torch.ops.load_library(importlib.machinery.PathFinder().find_spec(
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-LDeohvGT-py3.8/lib/python3.8/site-packages/torch/_ops.py", line 104, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcusparse.so.10: cannot open shared object file: No such file or directory

Pseudomanifold commented 2 years ago

This appears to be a mismatch between different CUDA versions. Can you try installing a different version of PyTorch?

mohit-kumar-27 commented 2 years ago

I changed the PyTorch version to 1.11.0+cu113 (the latest version with CUDA support), but I am still getting the same error.

(topognn-LDeohvGT-py3.8) (base) mohit@user-Default-string:~/mohit/TOGL$ poetry run topognn/train_model.py --model GNN --dataset MNIST
Traceback (most recent call last):
  File "topognn/train_model.py", line 12, in <module>
    import topognn.models as models
  File "/home/mohit/mohit/TOGL/topognn/models.py", line 10, in <module>
    from torch_geometric.nn import GCNConv, GINConv, global_mean_pool, global_add_pool
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-LDeohvGT-py3.8/lib/python3.8/site-packages/torch_geometric/__init__.py", line 4, in <module>
    import torch_geometric.data
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-LDeohvGT-py3.8/lib/python3.8/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
    from .data import Data
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-LDeohvGT-py3.8/lib/python3.8/site-packages/torch_geometric/data/data.py", line 9, in <module>
    from torch_sparse import SparseTensor
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-LDeohvGT-py3.8/lib/python3.8/site-packages/torch_sparse/__init__.py", line 19, in <module>
    torch.ops.load_library(spec.origin)
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-LDeohvGT-py3.8/lib/python3.8/site-packages/torch/_ops.py", line 220, in load_library
    ctypes.CDLL(path)
  File "/usr/lib/python3.8/ctypes/__init__.py", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
OSError: libcusparse.so.11: cannot open shared object file: No such file or directory

Please help

Pseudomanifold commented 2 years ago

Does this library exist somewhere on the system? This seems to be an issue that is known to the pytorch-geometric developers. Please check out https://github.com/pyg-team/pytorch_geometric/issues/2040 for some suggestions (essentially, this would boil down to setting the LD_LIBRARY_PATH variable so that all libraries are found).
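
One quick way to check whether the dynamic loader can find a library is to try loading it by name, which is the same ctypes.CDLL call that fails in the traceback above. This is only a sketch: the helper name is made up, and the result depends on the environment (including LD_LIBRARY_PATH) that was in place when the Python process started.

```python
import ctypes


def can_load(library_name):
    """Try to load a shared library by name.

    Returns a tuple (ok, message): ok is True if the loader found and opened
    the library, otherwise False with the loader's error text in message.
    """
    try:
        ctypes.CDLL(library_name)
    except OSError as error:
        return False, str(error)
    return True, ""


if __name__ == "__main__":
    # Library names taken from the errors reported in this thread.
    for name in ("libcusparse.so.10", "libcusparse.so.11"):
        ok, message = can_load(name)
        print(name, "loadable" if ok else f"NOT loadable: {message}")
```

Because the loader reads LD_LIBRARY_PATH at process start, the variable has to be exported in the shell before running the script, not set from inside Python.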

mohit-kumar-27 commented 2 years ago

I added the path to libcusparse.so.11 to LD_LIBRARY_PATH. Now the invalid ELF header error is showing again:

(topognn-LDeohvGT-py3.8) (base) mohit@user-Default-string:~/mohit/TOGL$ poetry run topognn/train_model.py --model GNN --dataset MNIST
Traceback (most recent call last):
  File "topognn/train_model.py", line 12, in <module>
    import topognn.models as models
  File "/home/mohit/mohit/TOGL/topognn/models.py", line 15, in <module>
    from topognn.layers import GCNLayer, GINLayer, GATLayer, SimpleSetTopoLayer, fake_persistence_computation#, EdgeDropout
  File "/home/mohit/mohit/TOGL/topognn/layers.py", line 8, in <module>
    from torch_persistent_homology.persistent_homology_cpu import compute_persistence_homology_batched_mt
ImportError: /home/mohit/.cache/pypoetry/virtualenvs/topognn-LDeohvGT-py3.8/lib/python3.8/site-packages/torch_persistent_homology/persistent_homology_cpu.cpython-38-x86_64-linux-gnu.so: invalid ELF header

Please help

Pseudomanifold commented 2 years ago

Can you run file on this library and ldd, i.e.

and paste the output here?

I don't understand at the moment how it is possible for our pipeline to build a library that is not supposed to run on your computer.
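
The same check can be approximated without the file utility. A valid ELF file starts with the four bytes 0x7F 'E' 'L' 'F', and the loader reports "invalid ELF header" when that signature does not match. The sketch below only inspects the signature (it does not validate the rest of the header), and the function name is an assumption for illustration.

```python
ELF_MAGIC = b"\x7fELF"  # signature at the start of every ELF file


def is_elf(path):
    """Return True if the file at `path` begins with the ELF signature."""
    with open(path, "rb") as handle:
        return handle.read(len(ELF_MAGIC)) == ELF_MAGIC
```

Calling is_elf on the .so path from the ImportError shows whether the build produced a real shared object or some other kind of file.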

mohit-kumar-27 commented 2 years ago

I ran the above commands, here is the output that I get

(topognn-LDeohvGT-py3.8) (base) mohit@user-Default-string:~/mohit/TOGL$ file /home/mohit/.cache/pypoetry/virtualenvs/topognn-LDeohvGT-py3.8/lib/python3.8/site-packages/torch_persistent_homology/persistent_homology_cpu.cpython-38-x86_64-linux-gnu.so
/home/mohit/.cache/pypoetry/virtualenvs/topognn-LDeohvGT-py3.8/lib/python3.8/site-packages/torch_persistent_homology/persistent_homology_cpu.cpython-38-x86_64-linux-gnu.so: data
(topognn-LDeohvGT-py3.8) (base) mohit@user-Default-string:~/mohit/TOGL$ ldd /home/mohit/.cache/pypoetry/virtualenvs/topognn-LDeohvGT-py3.8/lib/python3.8/site-packages/torch_persistent_homology/persistent_homology_cpu.cpython-38-x86_64-linux-gnu.so
    not a dynamic executable

Pseudomanifold commented 2 years ago

OK, this still looks like the extension is not built correctly. Please do the following:

  1. Clone or update the repository with git pull. I have updated the code substantially (merging a branch that contains our submitted code).
  2. Run poetry env use 3.8 (to make sure that the right Python version is being used)
  3. Run poetry install

This will not yet install CUDA dependencies, but it's important for me to understand why your build of the extension does not work.

After this, run

file repos/torch_persistent_homology/build/lib.linux-x86_64-cpython-38/torch_persistent_homology/persistent_homology_cpu.cpython-38-x86_64-linux-gnu.so

and paste the output here. This is mine:

$ file repos/torch_persistent_homology/build/lib.linux-x86_64-cpython-38/torch_persistent_homology/persistent_homology_cpu.cpython-38-x86_64-linux-gnu.so
repos/torch_persistent_homology/build/lib.linux-x86_64-cpython-38/torch_persistent_homology/persistent_homology_cpu.cpython-38-x86_64-linux-gnu.so: ELF 64-bit LSB shared object, x86-64, version 1 (GNU/Linux), dynamically linked, BuildID[sha1]=2e56b8702b5413bbacb9d069aa39446dc4b51414, with debug_info, not stripped

PS: Please take some time formatting the output in Markdown to make it easier to read for me.

mohit-kumar-27 commented 2 years ago

I cloned the repository once more. I had to clone the torch_persistent_homology repository separately, as the repos folder was empty after cloning just the TOGL repository.

Then I ran the command poetry env use 3.8 followed by poetry install:

(base) mohit@user-Default-string:~/TOGL$ poetry env use 3.8
Creating virtualenv topognn-IAlcjswr-py3.8 in /home/mohit/.cache/pypoetry/virtualenvs
Using virtualenv: /home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8
(base) mohit@user-Default-string:~/TOGL$ poetry install
Installing dependencies from lock file

Package operations: 148 installs, 0 updates, 0 removals

• Installing zipp (3.8.0)
• Installing attrs (21.4.0)
• Installing importlib-resources (5.7.1)
• Installing pyrsistent (0.18.1)
• Installing six (1.16.0)
• Installing traitlets (5.2.2.post1)
• Installing entrypoints (0.4)
• Installing fastjsonschema (2.15.3)
• Installing jsonschema (4.6.0)
• Installing jupyter-core (4.10.0)
• Installing nest-asyncio (1.5.5)
• Installing parso (0.8.3)
• Installing ptyprocess (0.7.0)
• Installing pycparser (2.21)
• Installing python-dateutil (2.8.2)
• Installing pyzmq (23.1.0)
• Installing tornado (6.1)
• Installing wcwidth (0.2.5)
• Installing backcall (0.2.0)
• Installing cffi (1.15.0)
• Installing decorator (5.1.1)
• Installing jedi (0.18.1)
• Installing jupyter-client (7.3.4)
• Installing markupsafe (2.1.1)
• Installing matplotlib-inline (0.1.3)
• Installing nbformat (5.4.0)
• Installing pexpect (4.8.0)
• Installing pickleshare (0.7.5)
• Installing prompt-toolkit (3.0.29)
• Installing pygments (2.12.0)
• Installing pyparsing (3.0.9)
• Installing soupsieve (2.3.2.post1)
• Installing webencodings (0.5.1)
• Installing argon2-cffi-bindings (21.2.0)
• Installing beautifulsoup4 (4.11.1)
• Installing bleach (5.0.0)
• Installing certifi (2022.5.18.1)
• Installing charset-normalizer (2.0.12)
• Installing debugpy (1.6.0)
• Installing defusedxml (0.7.1)
• Installing idna (3.3)
• Installing ipython (7.34.0)
• Installing jinja2 (3.1.2)
• Installing jupyterlab-pygments (0.2.2)
• Installing mistune (0.8.4)
• Installing nbclient (0.6.4)
• Installing packaging (21.3)
• Installing pandocfilters (1.5.0)
• Installing psutil (5.9.1)
• Installing pyasn1 (0.4.8)
• Installing tinycss2 (1.1.1)
• Installing urllib3 (1.25.11)
• Installing argon2-cffi (21.3.0)
• Installing cachetools (5.2.0)
• Installing frozenlist (1.3.0)
• Installing ipykernel (6.13.1)
• Installing ipython-genutils (0.2.0)
• Installing multidict (6.0.2)
• Installing nbconvert (6.5.0)
• Installing oauthlib (3.2.0)
• Installing prometheus-client (0.14.1)
• Installing pyasn1-modules (0.2.8)
• Installing requests (2.28.0)
• Installing rsa (4.8)
• Installing send2trash (1.8.0)
• Installing terminado (0.15.0)
• Installing tomli (2.0.1)
• Installing aiosignal (1.2.0)
• Installing async-timeout (4.0.2)
• Installing cycler (0.11.0)
• Installing fonttools (4.33.3)
• Installing google-auth (2.7.0)
• Installing importlib-metadata (4.11.4)
• Installing kiwisolver (1.4.2)
• Installing notebook (6.4.12)
• Installing numpy (1.21.6)
• Installing pillow (9.1.1)
• Installing requests-oauthlib (1.3.1)
• Installing setuptools-scm (6.4.2)
• Installing smmap (5.0.0)
• Installing typing-extensions (4.2.0)
• Installing yarl (1.7.2)
• Installing absl-py (1.1.0)
• Installing aiohttp (3.8.1)
• Installing gitdb (4.0.9)
• Installing google-auth-oauthlib (0.4.6)
• Installing grpcio (1.46.3)
• Installing isodate (0.6.1)
• Installing joblib (1.1.0)
• Installing jupyterlab-widgets (1.1.0)
• Installing littleutils (0.2.2)
• Installing llvmlite (0.38.1)
• Installing markdown (3.3.7)
• Installing matplotlib (3.5.2)
• Installing networkx (2.6.3)
• Installing protobuf (4.21.1)
• Installing pytz (2022.1)
• Installing scipy (1.7.3)
• Installing tenacity (8.0.1)
• Installing tensorboard-data-server (0.6.1)
• Installing tensorboard-plugin-wit (1.8.1)
• Installing texttable (1.6.4)
• Installing threadpoolctl (3.1.0)
• Installing torch (1.8.1)
• Installing werkzeug (2.1.2)
• Installing widgetsnbextension (3.6.0)
• Installing ase (3.22.1)
• Installing click (8.1.3)
• Installing configparser (5.2.0)
• Installing docker-pycreds (0.4.0)
• Installing fsspec (2022.5.0)
• Installing future (0.18.2)
• Installing gitpython (3.1.27)
• Installing googledrivedownloader (0.4)
• Installing h5py (3.7.0)
• Installing iniconfig (1.1.1)
• Installing ipywidgets (7.7.0)
• Installing numba (0.55.2)
• Installing outdated (0.2.1)
• Installing pandas (1.3.5)
• Installing pathtools (0.1.2)
• Installing plotly (5.8.1)
• Installing pluggy (1.0.0)
• Installing promise (2.3)
• Installing py (1.11.0)
• Installing pyflagser (0.4.4)
• Installing python-igraph (0.8.3)
• Installing python-louvain (0.16)
• Installing pyyaml (6.0)
• Installing rdflib (6.1.1)
• Installing scikit-learn (0.24.2)
• Installing sentry-sdk (1.5.12)
• Installing shortuuid (1.0.9)
• Installing subprocess32 (3.5.4)
• Installing tensorboard (2.9.0)
• Installing toml (0.10.2)
• Installing torchmetrics (0.2.0)
• Installing tqdm (4.64.0)
• Installing dgl (0.6.1)
• Installing giotto-tda (0.4.0)
• Installing ipdb (0.13.9)
• Installing ogb (1.3.3)
• Installing pytest (6.2.5)
• Installing pytorch-lightning (1.2.10)
• Installing tadasets (0.0.4)
• Installing torch-geometric (1.6.3)
• Installing wandb (0.10.33)
• Installing torch-persistent-homology (0.1.0 /home/mohit/TOGL/repos/torch_persistent_homology): Failed

EnvCommandError

Command ['/home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8/bin/pip', 'install', '--no-deps', '-U', '-e', '/home/mohit/TOGL/repos/torch_persistent_homology'] errored with the following return code 1, and output:
Obtaining file:///home/mohit/TOGL/repos/torch_persistent_homology
Installing build dependencies: started
Installing build dependencies: finished with status 'done'
Checking if build backend supports build_editable: started
Checking if build backend supports build_editable: finished with status 'done'
Getting requirements to build editable: started
Getting requirements to build editable: finished with status 'done'
Preparing editable metadata (pyproject.toml): started
Preparing editable metadata (pyproject.toml): finished with status 'done'
Building wheels for collected packages: torch-persistent-homology
Building editable for torch-persistent-homology (pyproject.toml): started
Building editable for torch-persistent-homology (pyproject.toml): finished with status 'error'
error: subprocess-exited-with-error

× Building editable for torch-persistent-homology (pyproject.toml) did not run successfully.
│ exit code: 1
╰─> [39 lines of output]
    A setup.py file already exists. Using it.
    running build
    running build_py
    running build_ext
    building 'torch_persistent_homology.persistent_homology_cpu' extension
    In file included from /tmp/pip-build-env-5ksvq590/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/Device.h:3:0,
                     from /tmp/pip-build-env-5ksvq590/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/api/include/torch/python.h:8,
                     from /tmp/pip-build-env-5ksvq590/overlay/lib/python3.8/site-packages/torch/include/torch/extension.h:6,
                     from torch_persistent_homology/unionfind.hh:2,
                     from torch_persistent_homology/perisistent_homology_cpu.cpp:2:
    /tmp/pip-build-env-5ksvq590/overlay/lib/python3.8/site-packages/torch/include/torch/csrc/python_headers.h:10:10: fatal error: Python.h: No such file or directory
     #include <Python.h>
              ^~~~~~~~~~
    compilation terminated.
    /tmp/pip-build-env-5ksvq590/overlay/lib/python3.8/site-packages/torch/_masked/__init__.py:223: UserWarning: Failed to initialize NumPy: numpy.core.multiarray failed to import (Triggered internally at  ../torch/csrc/utils/tensor_numpy.cpp:68.)
      example_input = torch.tensor([[-3, -2, -1], [0, 1, 2]])
    /tmp/pip-build-env-5ksvq590/overlay/lib/python3.8/site-packages/torch/utils/cpp_extension.py:387: UserWarning: Attempted to use ninja as the BuildExtension backend but we could not find ninja.. Falling back to using the slow distutils backend.
      warnings.warn(msg.format('we could not find ninja.'))
    error: command '/usr/bin/x86_64-linux-gnu-gcc' failed with exit code 1
    Traceback (most recent call last):
      File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 363, in <module>
        main()
      File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 345, in main
        json_out['return_val'] = hook(**hook_input['kwargs'])
      File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8/lib/python3.8/site-packages/pip/_vendor/pep517/in_process/_in_process.py", line 283, in build_editable
        return hook(wheel_directory, config_settings, metadata_directory)
      File "/tmp/pip-build-env-5ksvq590/overlay/lib/python3.8/site-packages/poetry/core/masonry/api.py", line 87, in build_editable
        return unicode(WheelBuilder.make_in(poetry, Path(wheel_directory), editable=True))
      File "/tmp/pip-build-env-5ksvq590/overlay/lib/python3.8/site-packages/poetry/core/masonry/builders/wheel.py", line 78, in make_in
        wb.build()
      File "/tmp/pip-build-env-5ksvq590/overlay/lib/python3.8/site-packages/poetry/core/masonry/builders/wheel.py", line 112, in build
        self._build(zip_file)
      File "/tmp/pip-build-env-5ksvq590/overlay/lib/python3.8/site-packages/poetry/core/masonry/builders/wheel.py", line 162, in _build
        self._run_build_command(setup)
      File "/tmp/pip-build-env-5ksvq590/overlay/lib/python3.8/site-packages/poetry/core/masonry/builders/wheel.py", line 190, in _run_build_command
        subprocess.check_call(
      File "/usr/lib/python3.8/subprocess.py", line 364, in check_call
        raise CalledProcessError(retcode, cmd)
    subprocess.CalledProcessError: Command '['/home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8/bin/python', '/home/mohit/TOGL/repos/torch_persistent_homology/setup.py', 'build', '-b', '/home/mohit/TOGL/repos/torch_persistent_homology/build']' returned non-zero exit status 1.
    [end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building editable for torch-persistent-homology

Failed to build torch-persistent-homology
ERROR: Could not build wheels for torch-persistent-homology, which is required to install pyproject.toml-based projects
WARNING: You are using pip version 22.0.3; however, version 22.1.2 is available.
You should consider upgrading via the '/home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8/bin/python -m pip install --upgrade pip' command.

at ~/.poetry/lib/poetry/utils/env.py:1195 in _run
    1191│ output = subprocess.check_output(
    1192│ cmd, stderr=subprocess.STDOUT, **kwargs
    1193│ )
    1194│ except CalledProcessError as e:
  → 1195│ raise EnvCommandError(e, input=input)
    1196│
    1197│ return decode(output)
    1198│
    1199│ def execute(self, bin, *args, **kwargs):

After installing the above packages, it threw an EnvCommandError while installing torch-persistent-homology.

Pseudomanifold commented 2 years ago

This post is still not being formatted correctly; I would appreciate proper formatting because it makes everything easier to read.

I see that the module cannot be built. One question, one suggestion:

  1. Is this the first time this error occurred?
  2. Can you install python3-dev (or an equivalent package for Ubuntu)? It seems that you are missing the development files for building the C++ extensions. After this, try poetry install again.

Thanks!

mohit-kumar-27 commented 2 years ago

The EnvCommandError occurred for the first time. I ran sudo apt update followed by sudo apt install python3-dev to install python3-dev. Then I ran poetry env use 3.8 followed by poetry install, but I am still getting the same error.

Pseudomanifold commented 2 years ago

OK, what's your native Python version? I am guessing that maybe there is a python3.8-dev package that you could install? Try that, followed by poetry install to see where that gets you.
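
The build failure above ("Python.h: No such file or directory") can be checked for directly from the interpreter that Poetry uses. This is a minimal sketch with made-up helper names; it assumes the headers live in the include directory reported by sysconfig, which is where C extension builds normally look for them.

```python
import os
import sysconfig


def python_header_path():
    """Return the path where this interpreter expects Python.h to be."""
    return os.path.join(sysconfig.get_paths()["include"], "Python.h")


def has_python_headers():
    """Return True if Python.h exists for the running interpreter."""
    return os.path.isfile(python_header_path())


if __name__ == "__main__":
    status = "found" if has_python_headers() else "MISSING"
    print(f"{python_header_path()}: {status}")
```

Running it with the exact interpreter of the environment (for example via poetry run python) matters, because the headers are versioned: having them for the system default Python does not mean they exist for Python 3.8.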

mohit-kumar-27 commented 2 years ago

I installed python3.8-dev and was able to run poetry install successfully. Then I ran poetry run install_deps_cpu to install the other dependencies. When I then ran poetry run python topognn/train_model.py --model TopoGNN --dataset DD --max_epochs 10, I got OSError: libcusparse.so.10: cannot open shared object file: No such file or directory.

Could you please tell me where to find the libcusparse.so.10 file when installing PyTorch via poetry, so that I can add it to LD_LIBRARY_PATH to solve the issue?

Pseudomanifold commented 2 years ago

OK, we are progressing! Please check this issue in pytorch-geometric for help with your specific problem. One user describes a similar problem to yours and how to solve it here.

It would be helpful if you could run a search for libcusparse.so.10 in your system—this is a library installed from CUDA, not from our package.
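
A search like this can be sketched in Python as follows. The function name and the example root directories are assumptions for illustration; where CUDA libraries actually live depends on how CUDA was installed.

```python
import fnmatch
import os


def find_library_files(pattern, roots):
    """Walk each root directory and yield files whose name matches `pattern`.

    `pattern` uses shell-style wildcards, e.g. "libcusparse.so*".
    Directories that cannot be read are skipped silently by os.walk.
    """
    for root in roots:
        for directory, _subdirs, files in os.walk(root):
            for name in fnmatch.filter(files, pattern):
                yield os.path.join(directory, name)


if __name__ == "__main__":
    # Example roots only; adjust them to the machine being inspected.
    roots = ["/usr/local", "/usr/lib", os.path.expanduser("~")]
    for path in find_library_files("libcusparse.so*", roots):
        print(path)
```

The directory of any match is what would be added to LD_LIBRARY_PATH. Note that a match for a different major version (for example libcusparse.so.11 when libcusparse.so.10 is requested) does not satisfy the loader.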

mohit-kumar-27 commented 2 years ago

https://github.com/pyg-team/pytorch_geometric/issues/4793 Could you take a look at this link? I describe there how I tried solving the above issue, and a new error is showing now.

Could you suggest a way of installing your project and its dependencies without using poetry? Maybe then this issue will not occur.

I first tried activating the poetry environment using poetry shell and then uninstalling and reinstalling PyTorch and PyTorch Geometric, but they were not installed in the virtual environment created by poetry.

Pseudomanifold commented 2 years ago

You can of course install all dependencies manually using pip. The error that you encounter looks like it could be solved by installing torch_scatter from source. See the installation instructions for more details.

Potentially, you could also install a different set of CUDA dependencies that is better suited to your system:

poetry run install_deps_{cpu, cu101, cu102, cu110}

Please clarify the following for me:

  1. Can you install the project via poetry install or does this already throw some errors?
  2. Is the problem with "invalid ELF header" now fixed?

mohit-kumar-27 commented 2 years ago

poetry install does not give any errors. The invalid ELF header error is also fixed.

I created a new virtual environment and installed all the packages using pip. Then I ran this command:

(mohit_f) mohit@user-Default-string:~/TOGL$ python topognn/train_model.py --model TopoGNN --dataset DD --max_epochs 10
Traceback (most recent call last):
  File "topognn/train_model.py", line 13, in <module>
    import topognn.data_utils as topo_data
ModuleNotFoundError: No module named 'topognn'

Can you please tell me how to resolve this?

Pseudomanifold commented 2 years ago

Can you import topognn from a Python shell within the respective virtual environment?

$ python
import topognn

mohit-kumar-27 commented 2 years ago

No, I cannot import it. Can you share how to install topognn in my virtual environment?

Pseudomanifold commented 2 years ago

What's the error? You said that you ran poetry install, so the package should have been installed.
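
When several virtual environments are in play, it helps to confirm which interpreter is actually running and whether it can see the package. The sketch below uses only the standard library; the helper name is made up for illustration.

```python
import importlib.util
import sys


def describe_import(package_name):
    """Report the running interpreter and whether it can find a top-level package."""
    spec = importlib.util.find_spec(package_name)
    return {
        "interpreter": sys.executable,  # path of the Python binary in use
        "prefix": sys.prefix,           # root of the active environment
        "found": spec is not None,
        "location": spec.origin if spec is not None else None,
    }


if __name__ == "__main__":
    for key, value in describe_import("topognn").items():
        print(f"{key}: {value}")
```

If "found" is False, the package is not installed in the environment shown under "prefix", even if it is installed in another one.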

mohit-kumar-27 commented 2 years ago

I said that running poetry install does not give any errors now. But I did not run poetry install to install packages in the new virtual environment that I created; I installed all the required packages manually using pip. Because I did not run poetry install for my new virtual environment, the topognn package was not installed. What I am asking is: is there a way to install the topognn package without using poetry?

Pseudomanifold commented 2 years ago

Yes, you can try pip install in the root directory of the package. But it's not the preferred way. Also: adding these details really helps in diagnosing the problem. If poetry install works now, why do you not use it to at least install the main dependencies of the package and install the CUDA libraries manually?

mohit-kumar-27 commented 2 years ago

If I run poetry install to install the main dependencies of the package, poetry creates a new virtual environment and installs the packages there. I have to install the dependencies in the virtual environment that I created, not in the one created by poetry.

Pseudomanifold commented 2 years ago

I would suggest using poetry install to create the virtual environment for you and then installing the other packages in there manually. You can activate the environment using poetry shell and then use regular pip install commands to make this work. Does this work for you?

I want to understand your initial issue better, though: which packages are not installed correctly? Did you solve your problem with PyG by installing them manually now? If so, you can do the same installation steps in the aforementioned active poetry environment, but it would be very helpful for us to know what exactly does not work in your case.

mohit-kumar-27 commented 2 years ago

If I install PyTorch Geometric and its dependencies (cluster, sparse, spline-conv, etc.) manually in the virtual environment created by poetry, I get the error OSError: libcusparse.so.10: cannot open shared object file: No such file or directory, and I cannot find the path to libcusparse.so.10 for the virtual environment created by poetry.

Pseudomanifold commented 2 years ago

Which commands do you use to install these dependencies? Do you use the same commands for your other virtual environment? Also, what are the exact Python versions of the respective environments? Are both created using Python 3.8?

mohit-kumar-27 commented 2 years ago

Both virtual environments use Python 3.8. The dependencies for the poetry environment were installed using poetry run install_deps_cu102.

In the virtual environment created by me manually,

  1. I installed PyTorch using conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
  2. Then I installed PyTorch Geometric and its dependencies using:
     pip install torch-scatter -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
     pip install torch-sparse -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
     pip install torch-geometric
     pip install torch-cluster -f https://data.pyg.org/whl/torch-1.11.0+cu113.html
     pip install torch-spline-conv -f https://data.pyg.org/whl/torch-1.11.0+cu113.html

Pseudomanifold commented 2 years ago

OK, so these are obviously different things that are installed. Can you run the same pip installation commands in the environment created by poetry instead of poetry run install_deps_cu102?

The main issue seems to be that you have a CUDA toolkit that our initial dependencies do not support.
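
The pip commands quoted in this thread all follow one URL pattern that combines the PyTorch version with a CUDA tag. A small sketch that generates them consistently for a given combination is shown below. The helper names are made up, the package list is the one used in this thread, and the code does not check that wheels for a given combination actually exist or that the resulting package versions are compatible with the installed PyTorch.

```python
# Extension packages installed with a wheel index in this thread.
PYG_EXTENSIONS = ("torch-scatter", "torch-sparse", "torch-cluster", "torch-spline-conv")


def pyg_wheel_index(torch_version, cuda_tag):
    """Build the wheel index URL, e.g. for ("1.11.0", "cu113")."""
    return f"https://data.pyg.org/whl/torch-{torch_version}+{cuda_tag}.html"


def pip_commands(torch_version, cuda_tag, packages=PYG_EXTENSIONS):
    """Return one pip command per package, all pointing at the same index."""
    index = pyg_wheel_index(torch_version, cuda_tag)
    return [f"pip install {package} -f {index}" for package in packages]


if __name__ == "__main__":
    # Versions reported for the manually created environment in this thread.
    for command in pip_commands("1.11.0", "cu113"):
        print(command)
```

Generating the commands from one version pair avoids mixing wheels built for different PyTorch or CUDA versions within a single environment.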

mohit-kumar-27 commented 2 years ago

If I run conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch in the environment created by poetry, pytorch does not get installed in the environment created by poetry, see the commands here:

(topognn-IAlcjswr-py3.8) mohit@user-Default-string:~/TOGL$ conda install pytorch torchvision torchaudio cudatoolkit=11.3 -c pytorch
Collecting package metadata (current_repodata.json): done
Solving environment: done

All requested packages already installed.

(topognn-IAlcjswr-py3.8) mohit@user-Default-string:~/TOGL$ python
Python 3.8.13 (default, Apr 19 2022, 00:53:22)
[GCC 7.5.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ModuleNotFoundError: No module named 'torch'

Pseudomanifold commented 2 years ago

That's not what I suggested. Try setting up the poetry environment using poetry install, then install torch-scatter etc. using the pip commands you showed above.

mohit-kumar-27 commented 2 years ago

I did as you said, and now I am getting the error below. Since poetry install installed PyTorch 1.8.1 with CUDA 10.2, I ran the following commands in the poetry environment:

pip install torch-scatter -f https://data.pyg.org/whl/torch-1.8.1+cu102.html
pip install torch-sparse -f https://data.pyg.org/whl/torch-1.8.1+cu102.html
pip install torch-cluster -f https://data.pyg.org/whl/torch-1.8.1+cu102.html
pip install torch-spline-conv -f https://data.pyg.org/whl/torch-1.8.1+cu102.html

(topognn-IAlcjswr-py3.8) mohit@user-Default-string:~/TOGL$ poetry run topognn/train_model.py --model GNN --dataset MNIST
Traceback (most recent call last):
  File "topognn/train_model.py", line 13, in <module>
    import topognn.data_utils as topo_data
  File "/home/mohit/TOGL/topognn/data_utils.py", line 17, in <module>
    from torch_geometric.data import Data
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8/lib/python3.8/site-packages/torch_geometric/__init__.py", line 2, in <module>
    import torch_geometric.nn
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8/lib/python3.8/site-packages/torch_geometric/nn/__init__.py", line 2, in <module>
    from .data_parallel import DataParallel
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8/lib/python3.8/site-packages/torch_geometric/nn/data_parallel.py", line 5, in <module>
    from torch_geometric.data import Batch
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8/lib/python3.8/site-packages/torch_geometric/data/__init__.py", line 1, in <module>
    from .data import Data
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8/lib/python3.8/site-packages/torch_geometric/data/data.py", line 8, in <module>
    from torch_sparse import coalesce, SparseTensor
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8/lib/python3.8/site-packages/torch_sparse/__init__.py", line 41, in <module>
    from .tensor import SparseTensor  # noqa
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8/lib/python3.8/site-packages/torch_sparse/tensor.py", line 13, in <module>
    class SparseTensor(object):
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8/lib/python3.8/site-packages/torch/jit/_script.py", line 974, in script
    _compile_and_register_class(obj, _rcb, qualified_name)
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8/lib/python3.8/site-packages/torch/jit/_script.py", line 67, in _compile_and_register_class
    torch._C._jit_script_class_compile(qualified_name, ast, defaults, rcb)
RuntimeError:
Tried to access nonexistent attribute or method 'crow_indices' of type 'Tensor'.:
  File "/home/mohit/.cache/pypoetry/virtualenvs/topognn-IAlcjswr-py3.8/lib/python3.8/site-packages/torch_sparse/tensor.py", line 109
    def from_torch_sparse_csr_tensor(self, mat: torch.Tensor, has_value: bool = True):
        rowptr = mat.crow_indices()
        col = mat.col_indices()

Do you know how to solve this?
Pseudomanifold commented 2 years ago

Can you try installing torch-sparse==0.6.1?

mohit-kumar-27 commented 2 years ago

I installed torch-sparse 0.6.1 using

mohit@user-Default-string:~/TOGL$ pip install torch-sparse==0.6.1 -f https://data.pyg.org/whl/torch-1.8.1+cu102.html

and torch-sparse 0.6.1 was successfully installed. But when I run

poetry run topognn/train_model.py --model GNN --dataset MNIST

I get

RuntimeError: Detected that PyTorch and torch_sparse were compiled with different CUDA versions. PyTorch has CUDA version 10.2 and torch_sparse has CUDA version 0.0. Please reinstall the torch_sparse that matches your PyTorch install.

I tried uninstalling and reinstalling torch-sparse with the above command multiple times, but I still get the same error.

Pseudomanifold commented 2 years ago

This looks like a bug in torch-sparse; I would expect that installing it like this (with the right CUDA version) should result in a proper binary.

As an alternative, you can also install everything without poetry; maybe this is easier. You just need to comment out or remove the line

torch_persistent_homology = { path = "repos/torch_persistent_homology", develop = true }

from pyproject.toml. Afterwards, you should be able to do pip install in the project root and pip install in the repos/torch_persistent_homology folder. I tested this procedure in a new virtual environment, so if you have an environment with CUDA/Torch/etc. set up, this should work.
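
For anyone who prefers to script the pyproject.toml change rather than edit the file by hand, here is a minimal sketch that comments out the dependency line. The function name and the sample text are assumptions made for illustration; in particular, the section header in the sample is a guess at where the line sits in the real file.

```python
def comment_out_dependency(pyproject_text: str, key: str) -> str:
    """Prefix '# ' to every uncommented line that assigns `key`.

    Hypothetical helper for illustration; works on plain text and does not
    parse TOML.
    """
    out = []
    for line in pyproject_text.splitlines(keepends=True):
        stripped = line.lstrip()
        # Match "key = ..." but not longer keys that merely start with `key`
        if stripped.startswith(key) and stripped[len(key):].lstrip().startswith("="):
            line = "# " + line
        out.append(line)
    return "".join(out)


# Sample input: the header is an assumption, the dependency line is from the thread
sample = (
    "[tool.poetry.dependencies]\n"
    'torch_persistent_homology = { path = "repos/torch_persistent_homology", develop = true }\n'
)

if __name__ == "__main__":
    print(comment_out_dependency(sample, "torch_persistent_homology"), end="")
```

Lines that are already commented start with "#" and are left alone, so applying the function twice gives the same result as applying it once.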

mohit-kumar-27 commented 2 years ago

Hi Bastian! I tried installing without poetry and running your code. Everything worked, but I cannot figure out how to set DATA_DIR, because the code is looking for the data in the wrong directory. Here is the output that I get:

(togl) mohit@user-Default-string:~/TOGL$ python topognn/train_model.py --model TopoGNN --dataset DD --batch_size 20 --lr 0.0007
Using backend: pytorch
/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py:68: UserWarning: No correct seed found, seed set to 3526443079
  warnings.warn(*args, **kwargs)
Global seed set to 3526443079
Traceback (most recent call last):
  File "topognn/train_model.py", line 150, in <module>
    main(model_cls, dataset_cls, args)
  File "topognn/train_model.py", line 59, in main
    dataset.prepare_data()
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/core/datamodule.py", line 92, in wrapped_fn
    return fn(*args, **kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/pytorch_lightning/utilities/distributed.py", line 48, in wrapped_fn
    return fn(*args, **kwargs)
  File "/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/topognn/data_utils.py", line 549, in prepare_data
    with open(os.path.join(DATA_DIR, 'Benchmark_idx', self.name+"_"+section+'.index'), 'r') as f:
FileNotFoundError: [Errno 2] No such file or directory: '/scott/mohit/anaconda3/envs/togl/lib/python3.8/site-packages/topognn/../data/Benchmark_idx/DD_train.index'
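
The path in the error message hints at why the lookup fails: it looks as if DATA_DIR is computed relative to the location of the topognn package, which now lives in site-packages rather than in the repository checkout. The sketch below reproduces that resolution under this assumption. The function name is made up, and the real definition of DATA_DIR in data_utils.py may differ.

```python
import posixpath


def package_relative_data_dir(module_file: str) -> str:
    """Resolve '<package dir>/../data' for a given module path.

    Hypothetical reconstruction based on the path in the error message;
    posixpath keeps the result identical on every platform.
    """
    package_dir = posixpath.dirname(module_file)
    return posixpath.normpath(posixpath.join(package_dir, "..", "data"))


if __name__ == "__main__":
    installed = (
        "/scott/mohit/anaconda3/envs/togl/lib/python3.8/"
        "site-packages/topognn/data_utils.py"
    )
    checkout = "/home/mohit/TOGL/topognn/data_utils.py"
    print("installed copy:", package_relative_data_dir(installed))
    print("source checkout:", package_relative_data_dir(checkout))
```

With a regular (non-editable) install, the first path points into site-packages, where no data directory exists; only the second path lands next to the repository sources.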

Pseudomanifold commented 2 years ago

I opened a new issue for this. Will close this one since the original problem has long been resolved.