Lightning-AI / pytorch-lightning

Pretrain, finetune and deploy AI models on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

Set up on Jetson Nano/Xavier NX #7408

Closed ApluUalberta closed 3 years ago

ApluUalberta commented 3 years ago

Set up issue

Hi there,

I tried to create appropriate labels for the specific issue that I'm experiencing, but I couldn't find a way to add the proper tags, so I apologize in advance.

I’m currently trying to set up Pytorch Lightning on Jetson Nano/Jetson Xavier NX by building from source. So far, I have tried following this thread here: #695

The requirements.txt has been changed and no longer lists torchvision and scikit-learn among the requirements. However, it seems to look for torch>=1.4 as a result of torchmetrics>=0.2.0 (within requirements.txt), even though my Jetson already has torch 1.8.0 and torchvision installed through pip3. I was wondering if anyone has successfully set Pytorch Lightning up on ARM64 with the new requirement layout. Thanks!

Is there something I’m missing? I have also tried running setup.py to no avail. Thanks!
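One way to narrow this down is to check, from the interpreter that pip3 installs into, whether the torch it sees really satisfies torch>=1.4. The snippet below is only a rough sketch: the helper names are made up for illustration, and it compares just the leading numeric part of the version string (so a Jetson build string with a suffix after the numbers is treated as the plain release), which is looser than what pip itself does.

```python
import re


def numeric_release(version):
    """Return the leading numeric components of a version string as integers."""
    match = re.match(r"\d+(?:\.\d+)*", version.strip())
    if match is None:
        raise ValueError("no numeric release found in %r" % version)
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(installed, minimum):
    """True when `installed` is at least `minimum`, comparing numeric parts only."""
    a, b = numeric_release(installed), numeric_release(minimum)
    width = max(len(a), len(b))
    # Pad with zeros so that "1.4" and "1.4.0" compare as equal.
    a += (0,) * (width - len(a))
    b += (0,) * (width - len(b))
    return a >= b


if __name__ == "__main__":
    try:
        import torch  # only present if this interpreter can see the install
    except ImportError:
        print("torch is not importable from this interpreter")
    else:
        print(torch.__version__, meets_minimum(torch.__version__, "1.4"))
```

If this reports that torch is not importable, the pip you are running is most likely attached to a different Python than the one holding torch 1.8.0.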

Expected behavior

My pip install attempt: Selection_001

my pip install requirements attempt: Selection_004

my pip list: Selection_005

Environment

This is not a script bug as I'm only having trouble with setup using the pip package manager on Jetson/ARM64. Once again, I'm sorry for the tags!

Additional context

loic-beheshti commented 3 years ago

Hi,

I actually never tried to set up Pytorch Lightning directly on a Jetson but people are usually more interested in the deployment side with such devices. If this is your end goal, I suggest you look into this project https://github.com/neo-ai/neo-ai-dlr.

I had a few issues compiling a PyTorch model on SageMaker Neo, so I suggest you convert your model to .onnx first. Depending on the Pytorch Lightning version you're using, you may also face issues with DLR. In the worst case, I suggest you convert your Pytorch Lightning model into a classic PyTorch one once it is trained (you mainly need to slightly modify the state dictionary).

This is how I personally deploy on Jetson devices from Pytorch Lightning modules. If this is what you're looking for, I would be happy to share more details.
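To give an idea of what "slightly modify the state dictionary" can look like, here is a minimal sketch. It assumes the LightningModule stores the network in an attribute named model, so the checkpoint keys carry a "model." prefix; that attribute name, and the helper name strip_prefix, are assumptions for illustration and will differ from project to project.

```python
from collections import OrderedDict


def strip_prefix(state_dict, prefix="model."):
    """Return a copy of state_dict with `prefix` removed from matching keys."""
    cleaned = OrderedDict()
    for key, value in state_dict.items():
        # Keys without the prefix are copied through unchanged.
        new_key = key[len(prefix):] if key.startswith(prefix) else key
        cleaned[new_key] = value
    return cleaned
```

With torch available, the Lightning checkpoint can be read with torch.load(path, map_location="cpu"), its weights sit under the "state_dict" key, and the cleaned dictionary can then be passed to load_state_dict on the plain PyTorch module.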

ApluUalberta commented 3 years ago

Thank you for the response! But I have already successfully deployed with ONNX and Libtorch C++ on the Jetson Xavier. Now, I specifically need to have the option of training on it. Thank you for the offer and project reference though, I'll definitely take a look!

ApluUalberta commented 3 years ago

I have fixed the issue but there are still some potential questions for future ARM64 developers that may be important. I may explore this question further.

The problem seemed to be either that the torch version I had was not 1.4.0, or that I needed to use pip instead of pip3. For some reason, installing with pip made the package available to both the Python 2 pip and pip3. I would invite others to look into this further. Below is a rundown of my notes on the process; it is subject to change with future updates to this library and to Jetson hardware:

Pytorch Lightning Setup on Jetson Xavier NX

The general theme seems to be that we need to install pytorch lightning with the pip package manager instead of pip3,

and/or that we need torch 1.4.0 with its corresponding torchvision in order to pass the torch>=1.4 requirement.

Not sure which, but it seems like having both also works.
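Since it is unclear which interpreter each of pip and pip3 is attached to, it may help to compare the output of pip --version and pip3 --version, which ends with the Python version in parentheses. The sketch below parses such a line; the function name is made up for illustration and the example line in the usage note is not taken from my Jetson.

```python
import re

# Matches lines such as: pip 21.0.1 from /some/path/pip (python 3.6)
PIP_VERSION_LINE = re.compile(
    r"^pip (\S+) from (.+) \(python (\d+(?:\.\d+)*)\)\s*$"
)


def parse_pip_version(line):
    """Split one line of `pip --version` output into its three parts."""
    match = PIP_VERSION_LINE.match(line.strip())
    if match is None:
        raise ValueError("unrecognised pip --version output: %r" % line)
    pip_version, location, python_version = match.groups()
    return {"pip": pip_version, "location": location, "python": python_version}
```

For example, parse_pip_version("pip 21.0.1 from /usr/lib/python3/dist-packages/pip (python 3.6)")["python"] gives "3.6". If both commands report the same Python version, then pip and pip3 are the same tool under two names, which would explain why one install shows up in both lists.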

1. We need to retrieve PyTorch 1.4.0 and torchvision 0.5.0 (torch needs to be >=1.4, but it seems 1.4.0 specifically may be necessary). Prior to this, we already had pip3 and a torch and torchvision installation corresponding to it.

$ wget https://nvidia.box.com/shared/static/1v2cc4ro6zvsbu0p8h6qcuaqco1qcsif.whl -O torch-1.4.0-cp27-cp27mu-linux_aarch64.whl

$ git clone --branch v0.5.0 https://github.com/pytorch/vision torchvision   # see below for version of torchvision to download
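A note on the wheel name used above: wheel filenames encode which interpreter they target, and cp27 stands for CPython 2.7, so this particular file can only be installed by a pip that belongs to Python 2.7. The sketch below splits a wheel filename into its tags following the standard name-version-python-abi-platform layout; the function name is made up for illustration.

```python
def parse_wheel_filename(filename):
    """Split a wheel filename into its name, version and compatibility tags."""
    if not filename.endswith(".whl"):
        raise ValueError("not a wheel filename: %r" % filename)
    parts = filename[: -len(".whl")].split("-")
    # Five fields normally, six when an optional build tag is present.
    if len(parts) not in (5, 6):
        raise ValueError("unexpected number of fields in %r" % filename)
    python_tag, abi_tag, platform_tag = parts[-3:]
    return {
        "name": parts[0],
        "version": parts[1],
        "python_tag": python_tag,
        "abi_tag": abi_tag,
        "platform_tag": platform_tag,
    }
```

For the file downloaded in step 1 this gives python_tag "cp27" and platform_tag "linux_aarch64"; a wheel meant for the Jetson's Python 3 would carry a cp3x tag instead.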

2. We will need both pip and pip3 to install pytorch lightning

$ sudo apt-get update
$ sudo apt-get install libhdf5-serial-dev hdf5-tools libhdf5-dev zlib1g-dev zip libjpeg8-dev liblapack-dev libblas-dev gfortran
$ sudo apt-get install python-pip
$ sudo pip3 install -U pip testresources setuptools==49.6.0 # This step wasn't necessary but was done during the installation process for python 2 pip

3. Once we have the pip package manager, we need to install torch 1.4.0 in our Python 2 environment. We also need to edit the requirements.txt within the pytorch-lightning directory.

$ cd torchvision
$ export BUILD_VERSION=0.5.0 
$ python3 setup.py install --user
$ cd path/to/pytorch-lightning

Open requirements.txt and comment out the torch>=1.4 constraint as follows:
numpy>=1.17.2
#torch>=1.4 <---------------------------------------------------------------
future>=0.17.1  # required for builtins in setup.py
tqdm>=4.41.0
PyYAML>=5.1,<=5.4.1
fsspec[http]>=2021.4.0
tensorboard>=2.2.0, !=2.5.0  # 2.5.0 GPU CI error: 'Couldn't build proto file into descriptor pool!'
torchmetrics>=0.2.0
pyDeprecate==0.3.0
packaging
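If you would rather not edit the file by hand, the same change can be scripted. This is only a sketch under the assumption that each requirement sits on its own line as in the listing above; the function name is made up, and it takes care to leave torchmetrics alone.

```python
import re


def comment_out_requirement(lines, package):
    """Prefix with '#' every line that pins exactly `package`."""
    # The package name must be followed by a version specifier, an extras
    # bracket, a marker, a comment, or the end of the line, so that "torch"
    # does not also match "torchmetrics".
    pattern = re.compile(
        r"^\s*" + re.escape(package) + r"\s*(?:[<>=!~\[;#]|$)", re.IGNORECASE
    )
    return ["#" + line if pattern.match(line) else line for line in lines]
```

Reading requirements.txt into a list of lines, passing it through comment_out_requirement(lines, "torch") and writing the result back reproduces the edit shown above.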

4. Install (installing with pip seems to also make the package show up under pip3)

$ pip install pytorch-lightning # Valid for both pip and pip3

5. We can continue to verify with pip list

$ pip3 list
$ pip list

It's difficult to screencap these results using Shutter with the similarities highlighted, but your pip and pip3 lists should show the same packages at the same versions: Selection_013 Selection_014
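Instead of comparing screenshots by eye, the two lists can be compared as text. The sketch below assumes you have captured the output of pip list --format=freeze and pip3 list --format=freeze as strings; the function names and the package versions in the usage note are illustrative, not my actual results.

```python
def parse_freeze(text):
    """Turn `name==version` lines into a {name: version} dictionary."""
    packages = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blanks, comments and anything that is not a plain pin.
        if not line or line.startswith("#") or "==" not in line:
            continue
        name, version = line.split("==", 1)
        packages[name.strip().lower()] = version.strip()
    return packages


def diff_environments(first, second):
    """Return {name: (version_in_first, version_in_second)} where they differ."""
    differences = {}
    for name in sorted(set(first) | set(second)):
        if first.get(name) != second.get(name):
            differences[name] = (first.get(name), second.get(name))
    return differences
```

An empty result from diff_environments means both package managers report the same packages at the same versions. A package missing on one side shows up with None for that side, for instance {"torch": ("1.4.0", None)}.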

Sources

Pytorch and Torchvision installation for Python 2 and 3: https://forums.developer.nvidia.com/t/pytorch-for-jetson-version-1-8-0-now-available/72048

Python 2 pip installation: https://docs.nvidia.com/deeplearning/frameworks/install-tf-jetson-platform/index.html

Pytorch Lightning forum post (seems correct to a degree but non-working for us): https://forums.developer.nvidia.com/t/pytorch-lightning-set-up-on-jetson-nano-xavier-nx/177329

Pytorch Lightning installation on Jetson Xavier NX - USAEng, Note.com: fullsize-en

stale[bot] commented 3 years ago

This issue has been automatically marked as stale because it hasn't had any recent activity. This issue will be closed in 7 days if no further activity occurs. Thank you for your contributions, Pytorch Lightning Team!