Lightning-AI / pytorch-lightning

Pretrain, finetune ANY AI model of ANY size on multiple GPUs, TPUs with zero code changes.
https://lightning.ai
Apache License 2.0

undefined symbol when importing PyTorch Lightning after installing stable baselines3 #15083

Closed n-balla closed 1 year ago

n-balla commented 2 years ago

Bug description

I am not able to use or import PyTorch Lightning after installing Stable Baselines3.

I get the error:

ImportError: /home/alballns/miniconda3/envs/kdn_env/lib/python3.9/site-packages/torchtext/_torchtext.so: undefined symbol: _ZNK3c104Type14isSubtypeOfExtERKSt10shared_ptrIS0_EPSo

I tried reinstalling pytorch_lightning, PyTorch, and torchtext, and made sure their versions are all compatible, but I still couldn't fix the issue.

Please help! Thanks!

How to reproduce the bug

No response

Error messages and logs


Here is part of the stack trace:

`----> 8 import pytorch_lightning as pl

File ~/miniconda3/envs/kdn_env/lib/python3.9/site-packages/pytorch_lightning/__init__.py:20, in <module>
     17 _PACKAGE_ROOT = os.path.dirname(__file__)
     18 _PROJECT_ROOT = os.path.dirname(_PACKAGE_ROOT)
---> 20 from pytorch_lightning.callbacks import Callback  # noqa: E402

File ~/miniconda3/envs/kdn_env/lib/python3.9/site-packages/pytorch_lightning/callbacks/__init__.py:14, in <module>
      1 # Copyright The PyTorch Lightning team.
      2 #
      3 # Licensed under the Apache License, Version 2.0 (the "License");
   (...)
     12 # See the License for the specific language governing permissions and
     13 # limitations under the License.
---> 14 from pytorch_lightning.callbacks.base import Callback

[ ... many lines here ... ]

File ~/miniconda3/envs/kdn_env/lib/python3.9/site-packages/torchtext/vocab/vocab_factory.py:4, in <module>
      2 from typing import Dict, Iterable, Optional, List
      3 from collections import Counter, OrderedDict
----> 4 from torchtext._torchtext import (
      5     Vocab as VocabPybind,
      6 )
      9 def vocab(ordered_dict: Dict, min_freq: int = 1) -> Vocab:
     10     r"""Factory method for creating a vocab object which maps tokens to indices.
     11 
     12     Note that the ordering in which key value pairs were inserted in the `ordered_dict` will be respected when building the vocab.
   (...)
     42         >>> v2['out of vocab'] is v2[unk_token] #prints True
     43     """

ImportError: /home/alballns/miniconda3/envs/kdn_env/lib/python3.9/site-packages/torchtext/_torchtext.so: undefined symbol: _ZNK3c104Type14isSubtypeOfExtERKSt10shared_ptrIS0_EPSo`

Environment


#- Lightning Component (e.g. Trainer, LightningModule, LightningApp, LightningWork, LightningFlow):
#- PyTorch Lightning Version (e.g., 1.5.0):
#- Lightning App Version (e.g., 0.5.2):
#- PyTorch Version (e.g., 1.10):
#- Python version (e.g., 3.9):
#- OS (e.g., Linux):
#- CUDA/cuDNN version:
#- GPU models and configuration:
#- How you installed Lightning(`conda`, `pip`, source):
#- Running environment of LightningApp (e.g. local, cloud):

More info

No response

otaj commented 2 years ago

Hi, @n-balla, can you please share with us the environment you use so that we can help you debug further? The easiest way to do so would be to run python requirements/collect_env_details.py

Thanks a lot!

n-balla commented 2 years ago

Thank you for your reply! @otaj

Here are the environment details:

`* CUDA:

I tried several torch/torchtext versions to solve the issue, but the problem is still not solved.

otaj commented 2 years ago

@n-balla, it looks like your environment is quite broken. stable-baselines3==1.6.1 requires torch>=1.11. I know that torchvision versions are tightly linked to particular torch versions, and I expect it's the same for torchtext. On top of that, your versions of torch and torchtext seem quite old (I think torch 1.8.0 is unsupported by now, but I'm not 100% sure about it). It appears to me that you got to this point by "reusing" an old environment, into which you installed a new package (in this case probably stable-baselines3, since that is quite recent) without specifying the --upgrade flag for pip.

In order to check whether your environment is healthy, you can run pip check.

However, if you don't need a specific version of any of these libraries, I just tried running pip install torchtext stable-baselines3 pytorch-lightning --extra-index-url https://download.pytorch.org/whl/cu116 in a fresh Python 3.9.14 environment, and everything seems to be working as expected. I have a newer version of CUDA, but that seems to be the only difference.

Please, let me know if this helps!
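
For readers who want to check the torch/torchtext pairing programmatically, here is a minimal sketch, not an official tool. The pairing table is an assumption based on the compatibility list in the torchtext README and should be verified there before relying on it. `RELEASE_PAIRS`, `major_minor`, `is_compatible` and `check_installed` are hypothetical names introduced only for this illustration.

```python
from importlib import metadata  # standard library, Python 3.8+
from typing import Optional

# ASSUMPTION: torch -> torchtext minor-version pairs, as listed in the
# torchtext README compatibility table. Verify before relying on them.
RELEASE_PAIRS = {
    "1.8": "0.9",
    "1.9": "0.10",
    "1.10": "0.11",
    "1.11": "0.12",
    "1.12": "0.13",
    "1.13": "0.14",
}


def major_minor(version: str) -> str:
    """Reduce '1.12.1+cu116' to '1.12' (drops the local tag and the patch part)."""
    public = version.split("+", 1)[0]
    return ".".join(public.split(".")[:2])


def is_compatible(torch_version: str, torchtext_version: str) -> Optional[bool]:
    """True/False for a known match/mismatch, None if torch is not in the table."""
    expected = RELEASE_PAIRS.get(major_minor(torch_version))
    if expected is None:
        return None
    return major_minor(torchtext_version) == expected


def check_installed() -> Optional[bool]:
    """Compare the installed distributions without importing either package."""
    try:
        return is_compatible(metadata.version("torch"), metadata.version("torchtext"))
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    print("torch/torchtext pairing OK:", check_installed())
```

Under that table, the pair torch 1.12.1+cu116 / torchtext 0.13.1 is reported as compatible, while torch 1.13.0a0+gitunknown / torchtext 0.13.1 is reported as a mismatch. Reading versions from package metadata avoids importing torchtext, which is the step that crashes in a broken environment.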

My environment details:

```
* CUDA:
    - GPU:
        - NVIDIA T1200 Laptop GPU
    - available: True
    - version: 11.6
* Lightning:
    - pytorch-lightning: 1.7.7
    - torch: 1.12.1+cu116
    - torchmetrics: 0.10.0
    - torchtext: 0.13.1
* Packages:
    - absl-py: 1.2.0
    - aiohttp: 3.8.3
    - aiosignal: 1.2.0
    - async-timeout: 4.0.2
    - attrs: 22.1.0
    - cachetools: 5.2.0
    - certifi: 2022.9.24
    - charset-normalizer: 2.1.1
    - cloudpickle: 2.2.0
    - contourpy: 1.0.5
    - cycler: 0.11.0
    - fonttools: 4.37.4
    - frozenlist: 1.3.1
    - fsspec: 2022.8.2
    - google-auth: 2.12.0
    - google-auth-oauthlib: 0.4.6
    - grpcio: 1.49.1
    - gym: 0.21.0
    - idna: 3.4
    - importlib-metadata: 4.13.0
    - kiwisolver: 1.4.4
    - markdown: 3.4.1
    - markupsafe: 2.1.1
    - matplotlib: 3.6.1
    - multidict: 6.0.2
    - numpy: 1.23.3
    - oauthlib: 3.2.1
    - packaging: 21.3
    - pandas: 1.5.0
    - pillow: 9.2.0
    - pip: 22.0.4
    - protobuf: 3.19.6
    - pyasn1: 0.4.8
    - pyasn1-modules: 0.2.8
    - pydeprecate: 0.3.2
    - pyparsing: 3.0.9
    - python-dateutil: 2.8.2
    - pytorch-lightning: 1.7.7
    - pytz: 2022.4
    - pyyaml: 6.0
    - requests: 2.28.1
    - requests-oauthlib: 1.3.1
    - rsa: 4.9
    - setuptools: 58.1.0
    - six: 1.16.0
    - stable-baselines3: 1.6.2
    - tensorboard: 2.10.1
    - tensorboard-data-server: 0.6.1
    - tensorboard-plugin-wit: 1.8.1
    - torch: 1.12.1+cu116
    - torchmetrics: 0.10.0
    - torchtext: 0.13.1
    - tqdm: 4.64.1
    - typing-extensions: 4.4.0
    - urllib3: 1.26.12
    - werkzeug: 2.2.2
    - wheel: 0.37.1
    - yarl: 1.8.1
    - zipp: 3.9.0
* System:
    - OS: Linux
    - architecture:
        - 64bit
        - ELF
    - processor:
    - python: 3.9.13
    - version: #1 ZEN SMP PREEMPT_DYNAMIC Tue, 04 Oct 2022 14:37:01 +0000
```

n-balla commented 2 years ago

Thank you @otaj !

This was only one of the environments I tried.

Here is another one, with a newer torch and torchtext, and still gave the exact same error:

`* CUDA:

otaj commented 2 years ago

Huh, that is admittedly weird and definitely should not happen. I don't think we import torchtext anywhere in our own imports, at least in version 1.7.7. Would you be able to share the steps you took to obtain this environment, i.e. which packages you installed explicitly and in which order?

I have to say, I have a hard time replicating this. I just ran pip install absl-py==1.2.0 aiohttp==3.8.3 aiosignal==1.2.0 ale-py==0.7.4 argon2-cffi==21.3.0 argon2-cffi-bindings==21.2.0 asttokens==2.0.5 async-timeout==4.0.2 attrs==22.1.0 autorom==0.4.2 autorom.accept-rom-license==0.4.2 backcall==0.2.0 beautifulsoup4==4.11.1 bleach==5.0.0 cachetools==5.2.0 certifi==2022.9.24 charset-normalizer==2.1.1 click==8.1.3 cloudpickle==2.2.0 commonmark==0.9.1 contourpy==1.0.5 cycler==0.11.0 debugpy==1.6.0 decorator==5.1.1 defusedxml==0.7.1 entrypoints==0.4 executing==0.8.3 fastjsonschema==2.15.3 fonttools==4.37.4 frozenlist==1.3.1 fsspec==2022.8.2 google-auth==2.12.0 google-auth-oauthlib==0.4.6 grpcio==1.49.1 gym==0.21.0 gym-notices==0.0.8 idna==3.4 importlib-metadata==4.13.0 importlib-resources==5.10.0 ipykernel==6.13.1 ipython==8.4.0 ipython-genutils==0.2.0 ipywidgets==7.7.0 jedi==0.18.1 jinja2==3.1.2 jsonschema==4.6.0 jupyter==1.0.0 jupyter-client==7.3.4 jupyter-console==6.4.3 jupyter-core==4.10.0 jupyterlab-pygments==0.2.2 jupyterlab-widgets==1.1.0 kiwisolver==1.4.4 markdown==3.4.1 markupsafe==2.1.1 matplotlib==3.6.1 mistune==0.8.4 multidict==6.0.2 nbclient==0.6.4 nbconvert==6.5.0 nbformat==5.4.0 nest-asyncio==1.5.5 notebook==6.4.12 numpy==1.23.3 oauthlib==3.2.1 opencv-python==4.6.0.66 packaging==21.3 pandas==1.5.0 pandocfilters==1.5.0 parso==0.8.3 pexpect==4.8.0 pickleshare==0.7.5 pillow==9.2.0 pip==22.2.2 prometheus-client==0.14.1 prompt-toolkit==3.0.29 protobuf==3.19.6 psutil==5.9.2 pure-eval==0.2.2 pyasn1==0.4.8 pyasn1-modules==0.2.8 pydeprecate==0.3.2 pygments==2.12.0 pyparsing==3.0.9 pyrsistent==0.18.1 python-dateutil==2.8.2 pytorch-lightning==1.7.7 pytz==2022.4 pyyaml==6.0 pyzmq==23.1.0 qtconsole==5.3.1 qtpy==2.1.0 requests==2.28.1 requests-oauthlib==1.3.1 rich==12.6.0 rsa==4.9 send2trash==1.8.0 setuptools==65.4.1 six==1.16.0 soupsieve==2.3.2.post1 stable-baselines3==1.6.2 stack-data==0.2.0 tensorboard==2.10.1 tensorboard-data-server==0.6.1
tensorboard-plugin-wit==1.8.1 tinycss2==1.1.1 torch==1.12.1 torchmetrics==0.10.0 torchtext==0.13.1 tqdm==4.64.1 typing-extensions==4.4.0 urllib3==1.26.12 wcwidth==0.2.5 webencodings==0.5.1 werkzeug==2.2.2 wheel==0.37.1 widgetsnbextension==3.6.0 yarl==1.8.1 zipp==3.9.0 to install exactly the versions listed in your last environment. pip check didn't complain, and import pytorch_lightning had no issues either.

EvanZ commented 2 years ago

I am getting a similar error:

Exception has occurred: OSError
/home/ec2-user/.local/lib/python3.7/site-packages/torchtext/lib/libtorchtext.so: undefined symbol: _ZN5torch6detail10class_baseC2ERKSsS3_SsRKSt9type_infoS6_
  File "/home/ec2-user/deep-behavior-embedding/finetune.py", line 8, in <module>
    from pytorch_lightning.accelerators import accelerator

------------------------- ------------------- ----------------------------------------
absl-py                   1.3.0
aiohttp                   3.8.3
aiosignal                 1.2.0
alembic                   1.8.1
astunparse                1.6.3
async-timeout             4.0.2
asynctest                 0.13.0
attrs                     22.1.0
aws-cfn-bootstrap         2.0
boto3                     1.24.91
botocore                  1.27.91
cachetools                5.2.0
certifi                   2022.9.24
cffi                      1.15.1
charset-normalizer        2.1.1
click                     8.1.3
cloudpickle               2.2.0
cmake                     3.24.1.1
cramjam                   2.5.0
cycler                    0.11.0
databricks-cli            0.17.3
dataclasses               0.6
docker                    6.0.0
docutils                  0.14
entrypoints               0.4
fastparquet               0.8.1
Flask                     2.2.2
fonttools                 4.37.4
frozenlist                1.3.1
fsspec                    2022.8.2
future                    0.18.2
gitdb                     4.0.9
GitPython                 3.1.29
google-auth               2.12.0
google-auth-oauthlib      0.4.6
greenlet                  1.1.3.post0
grpcio                    1.49.1
gunicorn                  20.1.0
idna                      3.4
importlib-metadata        4.13.0
importlib-resources       5.10.0
intel-openmp              2022.2.0
itsdangerous              2.1.2
Jinja2                    3.1.2
jmespath                  1.0.1
joblib                    1.2.0
jsonlines                 3.1.0
kiwisolver                1.4.4
lockfile                  0.11.0
Mako                      1.2.3
Markdown                  3.4.1
MarkupSafe                2.1.1
matplotlib                3.5.3
mkl                       2022.2.0
mkl-include               2022.2.0
mlflow                    1.29.0
multidict                 6.0.2
numpy                     1.21.6
oauthlib                  3.2.1
packaging                 21.3
pandas                    1.3.5
Pillow                    9.2.0
pip                       20.2.2
prometheus-client         0.15.0
prometheus-flask-exporter 0.20.3
protobuf                  3.19.6
psutil                    5.9.2
pyasn1                    0.4.8
pyasn1-modules            0.2.8
pycparser                 2.21
pyDeprecate               0.3.2
PyJWT                     2.5.0
pyparsing                 3.0.9
pystache                  0.5.4
python-daemon             2.2.3
python-dateutil           2.8.2
pytorch-lightning         1.7.0
pytz                      2022.4
PyYAML                    6.0
pyzmq                     24.0.1
querystring-parser        1.2.4
requests                  2.28.1
requests-oauthlib         1.3.1
rsa                       4.9
s3transfer                0.6.0
scikit-learn              1.0.2
scipy                     1.7.3
setuptools                49.1.3
simplejson                3.2.0
six                       1.16.0
sklearn                   0.0
smmap                     5.0.0
SQLAlchemy                1.4.41
sqlparse                  0.4.3
tabulate                  0.9.0
tbb                       2021.7.0
tensorboard               2.10.1
tensorboard-data-server   0.6.1
tensorboard-plugin-wit    1.8.1
threadpoolctl             3.1.0
torch                     1.13.0a0+gitunknown /usr/local/lib64/python3.7/site-packages
torchinfo                 1.7.1
torchmetrics              0.10.0
torchtext                 0.13.1
tqdm                      4.64.1
typing                    3.7.4.3
typing-extensions         4.4.0
urllib3                   1.26.12
websocket-client          1.4.1
Werkzeug                  2.2.2
wheel                     0.37.1
yarl                      1.8.1
zipp                      3.9.0
zmq                       0.0.0

Not sure what the issue is. I tried downgrading pytorch-lightning from 1.7.7 to 1.7.0 and still got the same error. I'm running Python 3.7.10.
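
One detail visible in the output above is that torch is listed under /usr/local/lib64/python3.7/site-packages, while the failing torchtext library lives under /home/ec2-user/.local. That may or may not be related to the error, but it is worth knowing which copy of each package Python would load. A minimal sketch for checking this without importing the packages, using only the standard library (`package_location` is a hypothetical helper name):

```python
import importlib.util
from typing import Optional


def package_location(name: str) -> Optional[str]:
    """Return the file a top-level package would be loaded from, or None if not found."""
    # For a top-level name, find_spec locates the package without executing it,
    # so it still works when the import itself would crash.
    spec = importlib.util.find_spec(name)
    if spec is None:
        return None
    return spec.origin


if __name__ == "__main__":
    for pkg in ("torch", "torchtext", "pytorch_lightning"):
        print(f"{pkg:20s} {package_location(pkg)}")
```

If the printed paths point at different site-packages directories, the packages were probably installed by different commands or users, which is one way mismatched builds can end up side by side.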

otaj commented 2 years ago

Hi, @EvanZ, I am fairly confident this is a mismatch between the torch and torchtext versions. In the 1.7.* releases we import torchtext to check whether it is available, and that import crashes because of the mismatch.
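
The failure mode described here can be illustrated with a guarded optional import. This is a sketch under assumptions, not Lightning's actual availability check, and `optional_import` is a hypothetical helper. The point is that a compiled extension built against a different torch can fail at import time with `ImportError` or `OSError` (both appear in this thread), so handling only a missing package is not enough.

```python
import importlib
from types import ModuleType
from typing import Optional, Tuple


def optional_import(name: str) -> Tuple[Optional[ModuleType], Optional[Exception]]:
    """Try to import `name`; return (module, None) on success or (None, error)."""
    try:
        return importlib.import_module(name), None
    except (ImportError, OSError) as err:
        # ImportError covers both a missing package and an "undefined symbol"
        # raised while loading an extension module. OSError is what a failed
        # shared-library load (for example through ctypes) raises.
        return None, err


if __name__ == "__main__":
    module, err = optional_import("torchtext")
    if module is None:
        print(f"torchtext is unusable in this environment: {err!r}")
    else:
        print("torchtext imported from", module.__file__)
```

Treating a broken install the same as "not installed" keeps an unrelated import from crashing, at the cost of hiding the mismatch unless the error is logged somewhere.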

EvanZ commented 1 year ago

Hey @otaj, yes, this was indeed our problem as far as I can tell. We upgraded torchtext and it works now.

otaj commented 1 year ago

Perfect, @EvanZ, that is great to hear! I will close this issue now then.