ContinualAI / avalanche

Avalanche: an End-to-End Library for Continual Learning based on PyTorch.
http://avalanche.continualai.org
MIT License

Support for Jetson Toolkit aarch64 platform #1610

Closed: ZexinLi0w0 closed this issue 3 months ago

ZexinLi0w0 commented 9 months ago

I am trying to build the stable release v0.5.0 on a Jetson AGX Orin 32GB, but the installation gets stuck on the requirements of the package qpsolvers[open_source_solvers].

πŸ› Describe the bug Some error logs:

WARNING: qpsolvers 4.3.1 does not provide the extra 'open-source-solvers'
...
Checking if build backend supports build_editable ... done
Building wheels for collected packages: avalanche-lib, cvxopt, ecos, quadprog, scs, qdldl
  Building editable for avalanche-lib (pyproject.toml) ... done
  Created wheel for avalanche-lib: filename=avalanche_lib-0.5.0-0.editable-py3-none-any.whl size=7998 sha256=e19d8b5397bd27679cde896969115fcca4df8a1dc860836ff20c453b44be202b
  Stored in directory: /tmp/pip-ephem-wheel-cache-_6px0ujn/wheels/f2/6a/f6/da4a5436b22b7edce2e18cb6b42b33a43f76e06f4d9f62d010
  Building wheel for cvxopt (pyproject.toml) ... error
  error: subprocess-exited-with-error

  × Building wheel for cvxopt (pyproject.toml) did not run successfully.
  │ exit code: 1
  ╰─> [38 lines of output]
      running bdist_wheel
      running build
      running build_py
      creating build
      creating build/lib.linux-aarch64-cpython-38
      creating build/lib.linux-aarch64-cpython-38/cvxopt
      copying src/python/coneprog.py -> build/lib.linux-aarch64-cpython-38/cvxopt
      copying src/python/printing.py -> build/lib.linux-aarch64-cpython-38/cvxopt
      copying src/python/misc.py -> build/lib.linux-aarch64-cpython-38/cvxopt
      copying src/python/_version.py -> build/lib.linux-aarch64-cpython-38/cvxopt
      copying src/python/solvers.py -> build/lib.linux-aarch64-cpython-38/cvxopt
      copying src/python/info.py -> build/lib.linux-aarch64-cpython-38/cvxopt
      copying src/python/cvxprog.py -> build/lib.linux-aarch64-cpython-38/cvxopt
      copying src/python/__init__.py -> build/lib.linux-aarch64-cpython-38/cvxopt
      copying src/python/msk.py -> build/lib.linux-aarch64-cpython-38/cvxopt
      copying src/python/modeling.py -> build/lib.linux-aarch64-cpython-38/cvxopt
      running build_ext
      building 'base' extension
      creating build/temp.linux-aarch64-cpython-38
      creating build/temp.linux-aarch64-cpython-38/src
      creating build/temp.linux-aarch64-cpython-38/src/C
      gcc -pthread -B /experiment/zexin/miniforge3/envs/test/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/experiment/zexin/miniforge3/envs/test/include/python3.8 -c src/C/base.c -o build/temp.linux-aarch64-cpython-38/src/C/base.o
      gcc -pthread -B /experiment/zexin/miniforge3/envs/test/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/experiment/zexin/miniforge3/envs/test/include/python3.8 -c src/C/dense.c -o build/temp.linux-aarch64-cpython-38/src/C/dense.o
      gcc -pthread -B /experiment/zexin/miniforge3/envs/test/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/experiment/zexin/miniforge3/envs/test/include/python3.8 -c src/C/sparse.c -o build/temp.linux-aarch64-cpython-38/src/C/sparse.o
      gcc -pthread -shared -B /experiment/zexin/miniforge3/envs/test/compiler_compat -L/experiment/zexin/miniforge3/envs/test/lib -Wl,-rpath=/experiment/zexin/miniforge3/envs/test/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-aarch64-cpython-38/src/C/base.o build/temp.linux-aarch64-cpython-38/src/C/dense.o build/temp.linux-aarch64-cpython-38/src/C/sparse.o -L/usr/lib -lm -llapack -lblas -o build/lib.linux-aarch64-cpython-38/cvxopt/base.cpython-38-aarch64-linux-gnu.so
      building 'blas' extension
      gcc -pthread -B /experiment/zexin/miniforge3/envs/test/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/experiment/zexin/miniforge3/envs/test/include/python3.8 -c src/C/blas.c -o build/temp.linux-aarch64-cpython-38/src/C/blas.o
      gcc -pthread -shared -B /experiment/zexin/miniforge3/envs/test/compiler_compat -L/experiment/zexin/miniforge3/envs/test/lib -Wl,-rpath=/experiment/zexin/miniforge3/envs/test/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-aarch64-cpython-38/src/C/blas.o -L/usr/lib -lblas -o build/lib.linux-aarch64-cpython-38/cvxopt/blas.cpython-38-aarch64-linux-gnu.so
      building 'lapack' extension
      gcc -pthread -B /experiment/zexin/miniforge3/envs/test/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/experiment/zexin/miniforge3/envs/test/include/python3.8 -c src/C/lapack.c -o build/temp.linux-aarch64-cpython-38/src/C/lapack.o
      gcc -pthread -shared -B /experiment/zexin/miniforge3/envs/test/compiler_compat -L/experiment/zexin/miniforge3/envs/test/lib -Wl,-rpath=/experiment/zexin/miniforge3/envs/test/lib -Wl,--no-as-needed -Wl,--sysroot=/ build/temp.linux-aarch64-cpython-38/src/C/lapack.o -L/usr/lib -llapack -lblas -o build/lib.linux-aarch64-cpython-38/cvxopt/lapack.cpython-38-aarch64-linux-gnu.so
      building 'umfpack' extension
      gcc -pthread -B /experiment/zexin/miniforge3/envs/test/compiler_compat -Wl,--sysroot=/ -Wsign-compare -DNDEBUG -g -fwrapv -O3 -Wall -Wstrict-prototypes -fPIC -I/usr/include -I/experiment/zexin/miniforge3/envs/test/include/python3.8 -c src/C/umfpack.c -o build/temp.linux-aarch64-cpython-38/src/C/umfpack.o
      src/C/umfpack.c:23:10: fatal error: umfpack.h: No such file or directory
         23 | #include "umfpack.h"
            |          ^~~~~~~~~~~
      compilation terminated.
      error: command '/usr/bin/gcc' failed with exit code 1
      [end of output]

  note: This error originates from a subprocess, and is likely not a problem with pip.
  ERROR: Failed building wheel for cvxopt
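
The compile step in the log fails because gcc cannot find umfpack.h, a SuiteSparse header, in the include directories it was given (-I/usr/include and the conda environment's Python include directory). A minimal sketch for checking where, if anywhere, the header is installed follows. The candidate directories and the helper name find_header are assumptions based on common Linux layouts, not paths taken from the log:

```python
from pathlib import Path

# Candidate include directories (assumptions; adjust for your system).
CANDIDATE_DIRS = [
    "/usr/include",
    "/usr/include/suitesparse",
    "/usr/local/include",
]


def find_header(name, dirs=CANDIDATE_DIRS):
    """Return the directories in `dirs` that contain a file called `name`."""
    return [d for d in dirs if (Path(d) / name).is_file()]


if __name__ == "__main__":
    hits = find_header("umfpack.h")
    if hits:
        print("umfpack.h found in:", ", ".join(hits))
    else:
        print("umfpack.h not found in any candidate directory")
```

If the header turns up somewhere other than /usr/include, the cvxopt build has to be pointed at that directory. cvxopt's build documentation describes environment variables such as CVXOPT_SUITESPARSE_INC_DIR for this; verify the exact name against the cvxopt version being built.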

🐜 To Reproduce

  1. Download the v0.5.0 release of the code.
  2. Create a conda environment with Python 3.8.10 (the typical JetPack requirement).
  3. Run the following commands:
    cd avalanche
    pip install -e ".[dev]"

🐝 Expected behavior

I suggest removing the qpsolvers[open_source_solvers] dependency from requirements.txt and setup.py, since it may not be well supported on the arm64 platform, and installing alternative packages instead. With the following commands, the smoke test under "Additional context" passes:

sudo apt-get install libsuitesparse-dev
sudo apt-get install libblas-dev liblapack-dev gfortran
pip install osqp ecos scs qpsolvers
conda install cvxopt 
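
After running the commands above, a quick way to confirm that each package is visible to the active Python environment is sketched below. The helper name missing_packages is hypothetical, and the check only locates the modules without importing them, so it would not catch a compiled extension that is present but broken:

```python
import importlib.util

# Packages named in the install commands above.
PACKAGES = ["qpsolvers", "osqp", "ecos", "scs", "cvxopt"]


def missing_packages(names):
    """Return the names that cannot be located as importable modules."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_packages(PACKAGES)
    print("missing:", missing if missing else "none")
```

Running the smoke test remains the stronger check, since it actually exercises the installed library.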

🦋 Additional context

System configuration:

Package: nvidia-jetpack
Version: 5.1-b147
Architecture: arm64
Maintainer: NVIDIA Corporation
Installed-Size: 194
Depends: nvidia-jetpack-runtime (= 5.1-b147), nvidia-jetpack-dev (= 5.1-b147)
Homepage: http://developer.nvidia.com/jetson
Priority: standard
Section: metapackages
Filename: pool/main/n/nvidia-jetpack/nvidia-jetpack_5.1-b147_arm64.deb
Size: 29306
SHA256: 750acd147aa354a2dff225245149c8ac6a3802234157f2185c5d1b6fa9b9d2d9
SHA1: 8363c940eadd7300de57a70e2cd99dd321781b1c
MD5sum: 3da9b145351144eb1588e07f04e1e3d3
Description: NVIDIA Jetpack Meta Package
Description-md5: ad1462289bdbc54909ae109d1d32c0a8

Smoke test code:

import torch
from torch.nn import CrossEntropyLoss
from torch.optim import SGD

from avalanche.benchmarks.classic import PermutedMNIST
from avalanche.models import SimpleMLP
from avalanche.training import Naive

# Config
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
# model
model = SimpleMLP(num_classes=10)

# CL Benchmark Creation
perm_mnist = PermutedMNIST(n_experiences=3)
train_stream = perm_mnist.train_stream
test_stream = perm_mnist.test_stream

# Prepare for training & testing
optimizer = SGD(model.parameters(), lr=0.001, momentum=0.9)
criterion = CrossEntropyLoss()

# Continual learning strategy
cl_strategy = Naive(
    model, optimizer, criterion, train_mb_size=32, train_epochs=2,
    eval_mb_size=32, device=device)

# train and test loop over the stream of experiences
results = []
for train_exp in train_stream:
    cl_strategy.train(train_exp)
    results.append(cl_strategy.eval(test_stream))

AntonioCarta commented 9 months ago

Does it work with version 0.4? If it does, we could fix it by moving qpsolvers to the optional dependencies.

ZexinLi0w0 commented 8 months ago

> Does it work with version 0.4? If it does, we could fix it by moving qpsolvers to the optional dependencies.

Yes, it works with version 0.4.0 (commit ID is fe1c098).

AntonioCarta commented 8 months ago

Thanks, I think we can make qpsolvers an optional dependency. @AndreaCossu what do you think?

@ZexinLi0w0 for now, you can work around the error by removing the qpsolvers dependency and its import. It is imported in a single file, the GEM implementation, which will then give you an error.

AndreaCossu commented 8 months ago

Making qpsolvers optional is definitely the easiest way. This of course assumes you won't need GEM. Otherwise, we should consider using the suggested packages.
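
One common way to make a dependency optional in Python is to guard the import and fail only when the feature that needs it is actually used. The sketch below illustrates that pattern; it is not Avalanche's actual code, and the helper names optional_import and require_qpsolvers are hypothetical:

```python
import importlib


def optional_import(name):
    """Import module `name` if it is installed; return None otherwise."""
    try:
        return importlib.import_module(name)
    except ImportError:
        return None


# Attempted once at import time; a missing package is not an error yet.
qpsolvers = optional_import("qpsolvers")


def require_qpsolvers():
    """Return the qpsolvers module, or fail with an actionable message."""
    if qpsolvers is None:
        raise ImportError(
            "This feature needs the optional 'qpsolvers' package; "
            "install it with `pip install qpsolvers`."
        )
    return qpsolvers
```

With this pattern, a GEM-style component would call require_qpsolvers() when it is constructed, so the rest of the library still imports cleanly on platforms where the solvers do not build.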