Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

ContextualVersionConflict, when trying to use Lightning with TPU in Google Colab #16464

Closed mario-dg closed 1 year ago

mario-dg commented 1 year ago

Bug description

Following the official Colab notebook for using Lightning with TPUs in Google Colab, I am unable to get training to run.

How to reproduce the bug

!pip install cloud-tpu-client torch==1.13.0 torchvision torchtext torchaudio https://storage.googleapis.com/tpu-pytorch/wheels/colab/torch_xla-1.13-cp38-cp38-linux_x86_64.whl
!pip install pytorch-lightning

Install all dependencies with the commands above.

import torch
import torch_xla
import torch_xla.core.xla_model as xm
xm.xla_device()

output:

WARNING:root:Waiting for TPU to be start up with version pytorch-1.13...
WARNING:root:Waiting for TPU to be start up with version pytorch-1.13...
WARNING:root:Waiting for TPU to be start up with version pytorch-1.13...
WARNING:root:TPU has started up successfully with version pytorch-1.13
device(type='xla', index=1)

So to my understanding the device is working.

But

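from pytorch_lightning import Trainer

# `config` and `wandb_logger` are defined elsewhere in the notebook (not shown here)
trainer = Trainer(
    max_epochs=config['num_epochs'],
    accelerator='tpu',
    devices=8,
    deterministic=True,
    logger=wandb_logger,
)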

results in the following error message:

Error messages and logs

/usr/local/lib/python3.8/dist-packages/pytorch_lightning/accelerators/tpu.py in __init__(self, *args, **kwargs)
     27     def __init__(self, *args: Any, **kwargs: Any) -> None:
     28         if not _XLA_AVAILABLE:
---> 29             raise ModuleNotFoundError(str(_XLA_AVAILABLE))
     30         super().__init__(*args, **kwargs)
     31 

ModuleNotFoundError: ContextualVersionConflict: (google-api-python-client 2.70.0 (/usr/local/lib/python3.8/dist-packages), Requirement.parse('google-api-python-client==1.8.0'), {'cloud-tpu-client'}). HINT: Try running `pip install -U 'torch_xla'`

Environment

Current environment

```
#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow): Trainer
#- PyTorch Lightning Version (e.g., 1.5.0): 1.9.0
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 1.10): 1.13
#- Python version (e.g., 3.9): 3.8.10
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source): pip
#- Running environment of LightningApp (e.g. local, cloud): Google Colab, with TPU selected
```
pip list

Pip package versions should also match:

```
Package Version
----------------------------- ----------------------
absl-py 1.3.0
aeppl 0.0.33
aesara 2.7.9
aiohttp 3.8.3
aiosignal 1.3.1
alabaster 0.7.12
albumentations 1.2.1
altair 4.2.0
anyio 3.6.2
appdirs 1.4.4
arrow 1.2.3
arviz 0.12.1
astor 0.8.1
astropy 4.3.1
astunparse 1.6.3
async-timeout 4.0.2
atari-py 0.2.9
atomicwrites 1.4.1
attrs 22.2.0
audioread 3.0.0
autograd 1.5
Babel 2.11.0
backcall 0.2.0
beautifulsoup4 4.11.1
bleach 5.0.1
blessed 1.19.1
blis 0.7.9
bokeh 2.3.3
branca 0.6.0
bs4 0.0.1
CacheControl 0.12.11
cachetools 5.2.1
catalogue 2.0.8
certifi 2022.12.7
cffi 1.15.1
cftime 1.6.2
chardet 4.0.0
charset-normalizer 2.1.1
click 7.1.2
clikit 0.6.2
cloud-tpu-client 0.10
cloudpickle 2.2.0
cmake 3.22.6
cmdstanpy 1.0.8
colorcet 3.0.1
colorlover 0.3.0
community 1.0.0b1
confection 0.0.3
cons 0.4.5
contextlib2 0.5.5
convertdate 2.4.0
crashtest 0.3.1
crcmod 1.7
croniter 1.3.8
cufflinks 0.17.3
cvxopt 1.3.0
cvxpy 1.2.3
cycler 0.11.0
cymem 2.0.7
Cython 0.29.33
daft 0.0.4
dask 2022.2.1
datascience 0.17.5
dateutils 0.6.12
db-dtypes 1.0.5
dbus-python 1.2.16
debugpy 1.0.0
decorator 4.4.2
deepdiff 6.2.3
defusedxml 0.7.1
descartes 1.1.0
dill 0.3.6
distributed 2022.2.1
dlib 19.24.0
dm-tree 0.1.8
dnspython 2.2.1
docker-pycreds 0.4.0
docutils 0.16
dopamine-rl 1.0.5
earthengine-api 0.1.335
easydict 1.10
ecos 2.0.12
editdistance 0.5.3
email-validator 1.3.1
en-core-web-sm 3.4.1
entrypoints 0.4
ephem 4.1.4
et-xmlfile 1.1.0
etils 1.0.0
etuples 0.3.8
fa2 0.3.5
fastai 2.7.10
fastapi 0.88.0
fastcore 1.5.27
fastdownload 0.0.7
fastdtw 0.3.4
fastjsonschema 2.16.2
fastprogress 1.0.3
fastrlock 0.8.1
feather-format 0.4.1
filelock 3.9.0
firebase-admin 5.3.0
fix-yahoo-finance 0.0.22
Flask 1.1.4
flatbuffers 1.12
folium 0.12.1.post1
frozenlist 1.3.3
fsspec 2022.11.0
future 0.16.0
gast 0.4.0
GDAL 3.0.4
gdown 4.4.0
gensim 3.6.0
geographiclib 1.52
geopy 1.17.0
gin-config 0.5.0
gitdb 4.0.10
GitPython 3.1.30
glob2 0.7
google 2.0.3
google-api-core 1.34.0
google-api-python-client 1.8.0
google-auth 2.16.0
google-auth-httplib2 0.1.0
google-auth-oauthlib 0.4.6
google-cloud-bigquery 3.4.1
google-cloud-bigquery-storage 2.17.0
google-cloud-core 2.3.2
google-cloud-datastore 2.11.1
google-cloud-firestore 2.7.3
google-cloud-language 2.6.1
google-cloud-storage 2.7.0
google-cloud-translate 3.8.4
google-colab 1.0.0
google-crc32c 1.5.0
google-pasta 0.2.0
google-resumable-media 2.4.0
googleapis-common-protos 1.58.0
googledrivedownloader 0.4
graphviz 0.10.1
greenlet 2.0.1
grpcio 1.51.1
grpcio-status 1.48.2
gspread 3.4.2
gspread-dataframe 3.0.8
gym 0.25.2
gym-notices 0.0.8
h11 0.14.0
h5py 3.1.0
HeapDict 1.0.1
hijri-converter 2.2.4
holidays 0.18
holoviews 1.14.9
html5lib 1.0.1
httpcore 0.16.3
httpimport 0.5.18
httplib2 0.17.4
httpstan 4.6.1
httptools 0.5.0
httpx 0.23.3
huggingface-hub 0.11.1
humanize 0.5.1
hyperopt 0.1.2
idna 2.10
imageio 2.9.0
imagesize 1.4.1
imbalanced-learn 0.8.1
imblearn 0.0
imgaug 0.4.0
importlib-metadata 6.0.0
importlib-resources 5.10.2
imutils 0.5.4
inflect 2.1.0
inquirer 3.1.2
intel-openmp 2023.0.0
intervaltree 2.1.0
ipykernel 5.3.4
ipython 7.9.0
ipython-genutils 0.2.0
ipython-sql 0.3.9
ipywidgets 7.7.1
itsdangerous 2.1.2
jax 0.3.25
jaxlib 0.3.25+cuda11.cudnn805
jieba 0.42.1
Jinja2 2.11.3
joblib 1.2.0
jpeg4py 0.1.4
jsonschema 4.3.3
jupyter-client 6.1.12
jupyter-console 6.1.0
jupyter_core 5.1.3
jupyterlab-widgets 3.0.5
kaggle 1.5.12
kapre 0.3.7
keras 2.9.0
Keras-Preprocessing 1.1.2
keras-vis 0.4.1
kiwisolver 1.4.4
korean-lunar-calendar 0.3.1
langcodes 3.3.0
libclang 15.0.6.1
librosa 0.8.1
lightgbm 2.2.3
lightning 1.9.0
lightning-cloud 0.5.19
lightning-utilities 0.5.0
llvmlite 0.39.1
lmdb 0.99
locket 1.0.0
logical-unification 0.4.5
LunarCalendar 0.0.9
lxml 4.9.2
Markdown 3.4.1
markdown-it-py 2.1.0
MarkupSafe 2.0.1
marshmallow 3.19.0
matplotlib 3.2.2
matplotlib-venn 0.11.7
mdurl 0.1.2
miniKanren 1.0.3
missingno 0.5.1
mistune 0.8.4
mizani 0.7.3
mkl 2019.0
mlxtend 0.14.0
more-itertools 9.0.0
moviepy 0.2.3.5
mpmath 1.2.1
msgpack 1.0.4
multidict 6.0.4
multipledispatch 0.6.0
multitasking 0.0.11
murmurhash 1.0.9
music21 5.5.0
natsort 5.5.0
nbconvert 5.6.1
nbformat 5.7.1
netCDF4 1.6.2
networkx 3.0
nibabel 3.0.2
nltk 3.7
notebook 5.7.16
numba 0.56.4
numexpr 2.8.4
numpy 1.21.6
nvidia-cublas-cu11 11.10.3.66
nvidia-cuda-nvrtc-cu11 11.7.99
nvidia-cuda-runtime-cu11 11.7.99
nvidia-cudnn-cu11 8.5.0.96
oauth2client 4.1.3
oauthlib 3.2.2
okgrade 0.4.3
opencv-contrib-python 4.6.0.66
opencv-python 4.6.0.66
opencv-python-headless 4.7.0.68
opendatasets 0.1.22
openpyxl 3.0.10
opt-einsum 3.3.0
ordered-set 4.1.0
orjson 3.8.5
osqp 0.6.2.post0
packaging 21.3
palettable 3.3.0
pandas 1.3.5
pandas-datareader 0.9.0
pandas-gbq 0.17.9
pandas-profiling 1.4.1
pandocfilters 1.5.0
panel 0.12.1
param 1.12.3
parso 0.8.3
partd 1.3.0
pastel 0.2.1
pathlib 1.0.1
pathtools 0.1.2
pathy 0.10.1
patsy 0.5.3
pep517 0.13.0
pexpect 4.8.0
pickleshare 0.7.5
Pillow 7.1.2
pip 22.0.4
pip-tools 6.6.2
platformdirs 2.6.2
plotly 5.5.0
plotnine 0.8.0
pluggy 0.7.1
pooch 1.6.0
portpicker 1.3.9
prefetch-generator 1.0.3
preshed 3.0.8
prettytable 3.6.0
progressbar2 3.38.0
prometheus-client 0.15.0
promise 2.3
prompt-toolkit 2.0.10
prophet 1.1.1
proto-plus 1.22.2
protobuf 3.19.6
psutil 5.4.8
psycopg2 2.9.5
ptyprocess 0.7.0
py 1.11.0
pyarrow 9.0.0
pyasn1 0.4.8
pyasn1-modules 0.2.8
pycocotools 2.0.6
pycparser 2.21
pyct 0.4.8
pydantic 1.10.4
pydata-google-auth 1.5.0
pydot 1.3.0
pydot-ng 2.0.0
pydotplus 2.0.2
PyDrive 1.3.1
pyemd 0.5.1
pyerfa 2.0.0.1
Pygments 2.6.1
PyGObject 3.36.0
PyJWT 2.6.0
pylev 1.4.0
pymc 4.1.4
PyMeeus 0.5.12
pymongo 4.3.3
pymystem3 0.2.0
PyOpenGL 3.1.6
pyparsing 3.0.9
pyrsistent 0.19.3
pysimdjson 3.2.0
PySocks 1.7.1
pystan 3.3.0
pytest 3.6.4
python-apt 2.0.1
python-dateutil 2.8.2
python-dotenv 0.21.1
python-editor 1.0.4
python-louvain 0.16
python-multipart 0.0.5
python-slugify 7.0.0
python-utils 3.4.5
pytorch-lightning 1.9.0
pytz 2022.7
pyviz-comms 2.2.1
PyWavelets 1.4.1
PyYAML 6.0
pyzmq 23.2.1
qdldl 0.1.5.post2
qudida 0.0.4
readchar 4.0.3
regex 2022.6.2
requests 2.25.1
requests-oauthlib 1.3.1
requests-unixsocket 0.2.0
resampy 0.4.2
rfc3986 1.5.0
rich 13.2.0
rpy2 3.5.5
rsa 4.9
scikit-image 0.18.3
scikit-learn 1.0.2
scipy 1.7.3
screen-resolution-extra 0.0.0
scs 3.2.2
seaborn 0.11.2
Send2Trash 1.8.0
sentry-sdk 1.13.0
setproctitle 1.3.2
setuptools 57.4.0
setuptools-git 1.2
shapely 2.0.0
six 1.15.0
sklearn-pandas 1.8.0
smart-open 6.3.0
smmap 5.0.0
sniffio 1.3.0
snowballstemmer 2.2.0
sortedcontainers 2.4.0
soundfile 0.11.0
soupsieve 2.3.2.post1
spacy 3.4.4
spacy-legacy 3.0.11
spacy-loggers 1.0.4
Sphinx 3.5.4
sphinxcontrib.applehelp 1.0.3
sphinxcontrib-devhelp 1.0.2
sphinxcontrib-htmlhelp 2.0.0
sphinxcontrib-jsmath 1.0.1
sphinxcontrib-qthelp 1.0.3
sphinxcontrib-serializinghtml 1.1.5
SQLAlchemy 1.4.46
sqlparse 0.4.3
srsly 2.4.5
starlette 0.22.0
starsessions 1.3.0
statsmodels 0.12.2
sympy 1.7.1
tables 3.7.0
tabulate 0.8.10
tblib 1.7.0
tenacity 8.1.0
tensorboard 2.9.1
tensorboard-data-server 0.6.1
tensorboard-plugin-wit 1.8.1
tensorflow 2.9.2
tensorflow-datasets 4.8.1
tensorflow-estimator 2.9.0
tensorflow-gcs-config 2.9.1
tensorflow-hub 0.12.0
tensorflow-io-gcs-filesystem 0.29.0
tensorflow-metadata 1.12.0
tensorflow-probability 0.17.0
termcolor 2.2.0
terminado 0.13.3
testpath 0.6.0
text-unidecode 1.3
textblob 0.15.3
thinc 8.1.6
threadpoolctl 3.1.0
tifffile 2022.10.10
timm 0.6.12
toml 0.10.2
tomli 2.0.1
toolz 0.12.0
torch 1.13.0
torch-xla 1.13
torchaudio 0.13.1+cu116
torchmetrics 0.11.0
torchsummary 1.5.1
torchtext 0.14.1
torchvision 0.14.0
tornado 6.0.4
tqdm 4.64.1
traitlets 5.7.1
tweepy 3.10.0
typeguard 2.7.1
typer 0.7.0
typing_extensions 4.4.0
tzlocal 1.5.1
ujson 5.7.0
uritemplate 3.0.1
urllib3 1.26.14
uvicorn 0.20.0
uvloop 0.17.0
vega-datasets 0.9.0
wandb 0.13.9
wasabi 0.10.1
watchfiles 0.18.1
wcwidth 0.2.5
webargs 8.2.0
webencodings 0.5.1
websocket-client 1.4.2
websockets 10.4
Werkzeug 1.0.1
wheel 0.38.4
widgetsnbextension 3.6.1
wordcloud 1.8.2.2
wrapt 1.14.1
xarray 2022.12.0
xarray-einstats 0.4.0
xgboost 0.90
xkit 0.0.0
xlrd 1.2.0
xlwt 1.3.0
yarl 1.8.2
yellowbrick 1.5
zict 2.2.0
zipp 3.11.0
```

More info

No response

cc @JackCaoG @steventk-g @Liyang90

awaelchli commented 1 year ago

Thanks for reporting @mario-dg. It seems that our requirement check no longer works here. Under the hood, we essentially call this:

import pkg_resources
pkg_resources.require("torch_xla")

and that raises the ContextualVersionConflict error. However, this works:

from lightning_utilities.core.imports import module_available
module_available("torch_xla")

The reason is already reported by pip's dependency resolver when you run the pip install command you listed:

ERROR: pip's dependency resolver does not currently take into account all the packages that are installed. This behaviour is the source of the following dependency conflicts. earthengine-api 0.1.335 requires google-api-python-client>=1.12.1, but you have google-api-python-client 1.8.0 which is incompatible.

This means that if the google-api-python-client package cannot be installed without conflicts, our import test will fail.

@carmocca @Borda do you have advice? Should we fall back to the module_available check?
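To illustrate the difference between the two checks discussed above, here is a minimal sketch. It assumes that an "importable" check only needs to locate the module, while a strict requirement check also validates every pinned dependency. The helper names `module_importable` and `requirement_satisfied` are hypothetical, and `importlib.util.find_spec` is used as a stand-in; this is not necessarily how `module_available` or Lightning's own check is implemented.

```python
import importlib.util
from typing import Tuple


def module_importable(name: str) -> bool:
    """Lenient check: can the module be located on the import path?

    Does not look at the dependency pins of any installed distribution.
    """
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        return False


def requirement_satisfied(requirement: str) -> Tuple[bool, str]:
    """Strict check: the distribution and all its declared dependencies must resolve.

    Returns (ok, message); message carries the error text when ok is False.
    """
    try:
        import pkg_resources  # provided by setuptools; may be missing or deprecated
    except ImportError as err:
        return False, f"pkg_resources unavailable: {err}"
    try:
        pkg_resources.require(requirement)
        return True, ""
    except Exception as err:  # e.g. DistributionNotFound, ContextualVersionConflict
        return False, f"{type(err).__name__}: {err}"


if __name__ == "__main__":
    print("importable:", module_importable("torch_xla"))
    print("requirement:", requirement_satisfied("torch_xla"))
```

With a conflicting pin somewhere in the dependency tree, the lenient check can return True while the strict one reports a failure, which matches the behaviour described in this comment.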

carmocca commented 1 year ago

We can change the function, but this is ultimately an issue with this "earthengine-api" library. @mario-dg do you use that library? Are you able to drop it?

mario-dg commented 1 year ago

Thanks for the help so far. No, I neither use nor need that library. I uninstalled it using

pip uninstall earthengine-api

This still doesn't fix the issue. I verified that google-api-python-client is installed at the version (1.8.0) required by cloud-tpu-client. Am I missing a step?
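One way to double-check what the running interpreter actually sees is to query package metadata from inside the notebook rather than relying on `pip list` output. The following is a minimal sketch using the standard library's `importlib.metadata` (Python 3.8+); the helper names `installed_version` and `declared_requirements` are hypothetical, and the package names are the ones mentioned in this thread.

```python
from importlib import metadata
from typing import List, Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is not installed."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


def declared_requirements(dist_name: str) -> List[str]:
    """Return the requirement strings a distribution declares (empty if none or not installed)."""
    try:
        return list(metadata.requires(dist_name) or [])
    except metadata.PackageNotFoundError:
        return []


if __name__ == "__main__":
    # Distributions involved in the conflict reported above.
    for name in ("google-api-python-client", "cloud-tpu-client", "earthengine-api", "torch-xla"):
        print(f"{name}: {installed_version(name)}")
    # Show which pins cloud-tpu-client declares, if it is installed.
    for req in declared_requirements("cloud-tpu-client"):
        print("cloud-tpu-client requires:", req)
```

If the version printed here differs from what `pip list` shows, the notebook process may still be using state from before the install or uninstall; that is an assumption, not something confirmed in this thread.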

awaelchli commented 1 year ago

@carmocca This package is part of the dependencies pulled in by the install command mentioned above (torch and torch_xla). I tested it in Colab.