Closed PabloVD closed 3 months ago
A couple of updates on my issue:
import lightning as L

print("Before instantiate Trainer")
trainer = L.Trainer()
print("After instantiate Trainer")
- The same issue also occurs on a different machine, a remote server with Ubuntu 20.04, even with the above super simple example. I have tried different versions of torch and lightning, and the same thing happens with all of them.
Does anybody know what is going on?
Another update: the program does not get as far as printing the info about available GPUs, TPUs, etc., so it freezes before that. To check exactly where, I put some prints inside the lightning.Trainer __init__ and found that it gets stuck at the line self._accelerator_connector = _AcceleratorConnector, so that may be causing the issue, but I'm not sure what exactly the problem is.
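One way to locate a hang like this without adding prints to library code is Python's standard `faulthandler` module, which can dump the stack of every thread if a call has not returned after a timeout. The following is only a minimal sketch and is not taken from this thread: the helper name `run_with_hang_watchdog`, the log file name, and the timeout values are illustrative assumptions.

```python
import faulthandler
import os
import tempfile

# Hypothetical default location for the traceback dump.
DEFAULT_LOG = os.path.join(tempfile.gettempdir(), "hang_traceback.log")


def run_with_hang_watchdog(fn, timeout_s=30.0, log_path=DEFAULT_LOG):
    """Call fn(); if it is still running after timeout_s seconds, write the
    stack of every thread to log_path. The call itself is not interrupted."""
    with open(log_path, "w") as log_file:
        # The file must stay open until the watchdog is cancelled.
        faulthandler.dump_traceback_later(timeout_s, exit=False, file=log_file)
        try:
            return fn()
        finally:
            faulthandler.cancel_dump_traceback_later()


if __name__ == "__main__":
    # Stand-in for the call that hangs, e.g. `lambda: L.Trainer()`.
    result = run_with_hang_watchdog(lambda: sum(range(10)), timeout_s=5.0)
    print(result)
```

Wrapping the stuck call, for example `run_with_hang_watchdog(lambda: L.Trainer())`, should leave a traceback in the log file showing which line each thread is blocked on.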
I faced a similar issue. It actually freezes waiting for a thread lock to be released, but that lock doesn't exist. After downgrading Python from 3.11 to 3.9 or 3.10, the Trainer stopped freezing.
Check if this helps.
Yes, it seems that with Python 3.10 it does not freeze anymore. Thanks for the answer!
The code freezes (and should then crash) because it uses num_workers>0 for multiprocessing, but the script does not guard the entry point with if __name__ == "__main__", which is a requirement for multiprocessing here.
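For context on why the guard matters: with the `spawn` and `forkserver` start methods, each worker process imports the main script again, so top-level code that starts workers would run again inside every worker. The sketch below shows the guard pattern using only the standard library's `multiprocessing`, not Lightning or a `DataLoader`; the names `square` and `main` are illustrative, not from the original script.

```python
import multiprocessing as mp


def square(x):
    # Work done in a worker process. It is defined at module level so that
    # workers can find it when they import this file.
    return x * x


def main():
    # Worker processes are started only from inside main().
    with mp.Pool(processes=2) as pool:
        print(pool.map(square, [1, 2, 3]))


if __name__ == "__main__":
    # A worker that re-imports this file sees a different __name__, so it
    # skips this block instead of starting workers of its own.
    main()
```

In a Lightning script, the same idea would mean moving the `L.Trainer(...)` construction and the `trainer.fit(...)` call into `main()`.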
Bug description
I can run a training script with pytorch-lightning once. However, after the training finishes, if I try to run it again, the code freezes when the L.Trainer is instantiated. There are no error messages. Only if I shut down and restart can I run it once again, but then the problem persists the next time.
This happens to me with different codes, even in the "lightning in 15 minutes" example.
What version are you seeing the problem on?
v2.2
How to reproduce the bug
Error messages and logs
There are no error messages
Environment
Current environment
* CUDA:
  - GPU:
    - NVIDIA GeForce RTX 3080 Laptop GPU
  - available: True
  - version: 12.1
* Lightning:
  - denoising-diffusion-pytorch: 1.5.4
  - ema-pytorch: 0.2.1
  - lightning-utilities: 0.11.2
  - pytorch-fid: 0.3.0
  - pytorch-lightning: 2.2.2
  - torch: 2.2.2
  - torchaudio: 2.2.2
  - torchmetrics: 1.0.0
  - torchvision: 0.17.2
* Packages:
  - absl-py: 1.4.0
  - accelerate: 0.17.1
  - addict: 2.4.0
  - aiohttp: 3.8.3
  - aiosignal: 1.2.0
  - antlr4-python3-runtime: 4.9.3
  - anyio: 3.6.1
  - appdirs: 1.4.4
  - argon2-cffi: 21.3.0
  - argon2-cffi-bindings: 21.2.0
  - array-record: 0.4.0
  - arrow: 1.2.3
  - astropy: 5.2.1
  - asttokens: 2.0.8
  - astunparse: 1.6.3
  - async-timeout: 4.0.2
  - attrs: 23.1.0
  - auditwheel: 5.4.0
  - babel: 2.10.3
  - backcall: 0.2.0
  - beautifulsoup4: 4.11.1
  - bleach: 5.0.1
  - blinker: 1.6.2
  - bqplot: 0.12.40
  - branca: 0.6.0
  - build: 1.2.1
  - cachetools: 5.2.0
  - carla: 0.9.14
  - certifi: 2024.2.2
  - cffi: 1.15.1
  - chardet: 5.1.0
  - charset-normalizer: 2.1.1
  - click: 8.1.3
  - click-plugins: 1.1.1
  - cligj: 0.7.2
  - cloudpickle: 3.0.0
  - cmake: 3.26.1
  - colossus: 1.3.1
  - colour: 0.1.5
  - contourpy: 1.0.7
  - cycler: 0.11.0
  - cython: 0.29.32
  - dacite: 1.8.1
  - dask: 2023.3.1
  - dataclass-array: 1.4.1
  - debugpy: 1.6.3
  - decorator: 4.4.2
  - deepspeed: 0.7.2
  - defusedxml: 0.7.1
  - denoising-diffusion-pytorch: 1.5.4
  - deprecation: 2.1.0
  - dill: 0.3.6
  - distlib: 0.3.6
  - dm-tree: 0.1.8
  - docker-pycreds: 0.4.0
  - docstring-parser: 0.15
  - einops: 0.6.0
  - einsum: 0.3.0
  - ema-pytorch: 0.2.1
  - etils: 1.3.0
  - exceptiongroup: 1.2.0
  - executing: 1.0.0
  - farama-notifications: 0.0.4
  - fastjsonschema: 2.16.1
  - filelock: 3.8.0
  - fiona: 1.9.3
  - flask: 2.3.3
  - flatbuffers: 24.3.25
  - folium: 0.14.0
  - fonttools: 4.37.1
  - frozenlist: 1.3.1
  - fsspec: 2022.8.2
  - future: 1.0.0
  - fvcore: 0.1.5.post20221221
  - gast: 0.4.0
  - gdown: 4.7.1
  - geojson: 3.0.1
  - geopandas: 0.12.2
  - gitdb: 4.0.11
  - gitpython: 3.1.43
  - google-auth: 2.16.2
  - google-auth-oauthlib: 0.4.6
  - google-pasta: 0.2.0
  - googleapis-common-protos: 1.63.0
  - googledrivedownloader: 0.4
  - gputil: 1.4.0
  - gpxpy: 1.5.0
  - grpcio: 1.62.1
  - gunicorn: 20.0.4
  - gym: 0.26.2
  - gym-notices: 0.0.8
  - gymnasium: 0.28.1
  - h5py: 3.7.0
  - haversine: 2.8.0
  - hdf5plugin: 4.1.1
  - hjson: 3.1.0
  - humanfriendly: 10.0
  - idna: 3.6
  - imageio: 2.31.3
  - imageio-ffmpeg: 0.4.7
  - immutabledict: 2.2.0
  - importlib-metadata: 4.12.0
  - importlib-resources: 6.1.0
  - imutils: 0.5.4
  - invertedai: 0.0.8.post1
  - iopath: 0.1.10
  - ipyevents: 2.0.2
  - ipyfilechooser: 0.6.0
  - ipykernel: 6.15.3
  - ipyleaflet: 0.17.4
  - ipython: 8.5.0
  - ipython-genutils: 0.2.0
  - ipytree: 0.2.2
  - ipywidgets: 8.0.2
  - itsdangerous: 2.1.2
  - jax-jumpy: 1.0.0
  - jedi: 0.18.1
  - jinja2: 3.1.2
  - joblib: 1.4.0
  - jplephem: 2.19
  - json5: 0.9.10
  - jsonargparse: 4.15.0
  - jsonschema: 4.19.1
  - jsonschema-specifications: 2023.7.1
  - jstyleson: 0.0.2
  - julia: 0.6.1
  - jupyter: 1.0.0
  - jupyter-client: 7.3.5
  - jupyter-console: 6.4.4
  - jupyter-core: 4.11.1
  - jupyter-packaging: 0.12.3
  - jupyter-server: 1.18.1
  - jupyterlab: 3.4.7
  - jupyterlab-pygments: 0.2.2
  - jupyterlab-server: 2.15.1
  - jupyterlab-widgets: 3.0.3
  - keras: 2.11.0
  - kiwisolver: 1.4.4
  - lanelet2: 1.2.1
  - lark: 1.1.9
  - lazy-loader: 0.2
  - leafmap: 0.27.0
  - libclang: 14.0.6
  - lightning-utilities: 0.11.2
  - lit: 16.0.0
  - llvmlite: 0.39.1
  - locket: 1.0.0
  - lunarsky: 0.2.1
  - lxml: 4.9.1
  - lz4: 4.3.3
  - markdown: 3.4.1
  - markdown-it-py: 2.2.0
  - markupsafe: 2.1.1
  - matplotlib: 3.6.1
  - matplotlib-inline: 0.1.6
  - mdurl: 0.1.2
  - mistune: 2.0.4
  - moviepy: 1.0.3
  - mpi4py: 3.1.3
  - mpmath: 1.3.0
  - msgpack: 1.0.8
  - multidict: 6.0.2
  - munch: 2.5.0
  - natsort: 8.2.0
  - nbclassic: 0.4.3
  - nbclient: 0.6.8
  - nbconvert: 7.0.0
  - nbformat: 5.5.0
  - nest-asyncio: 1.5.5
  - networkx: 2.8.6
  - ninja: 1.10.2.3
  - notebook: 6.4.12
  - notebook-shim: 0.1.0
  - numba: 0.56.4
  - numpy: 1.24.4
  - nvidia-cublas-cu11: 11.10.3.66
  - nvidia-cublas-cu12: 12.1.3.1
  - nvidia-cuda-cupti-cu11: 11.7.101
  - nvidia-cuda-cupti-cu12: 12.1.105
  - nvidia-cuda-nvrtc-cu11: 11.7.99
  - nvidia-cuda-nvrtc-cu12: 12.1.105
  - nvidia-cuda-runtime-cu11: 11.7.99
  - nvidia-cuda-runtime-cu12: 12.1.105
  - nvidia-cudnn-cu11: 8.5.0.96
  - nvidia-cudnn-cu12: 8.9.2.26
  - nvidia-cufft-cu11: 10.9.0.58
  - nvidia-cufft-cu12: 11.0.2.54
  - nvidia-curand-cu11: 10.2.10.91
  - nvidia-curand-cu12: 10.3.2.106
  - nvidia-cusolver-cu11: 11.4.0.1
  - nvidia-cusolver-cu12: 11.4.5.107
  - nvidia-cusparse-cu11: 11.7.4.91
  - nvidia-cusparse-cu12: 12.1.0.106
  - nvidia-nccl-cu11: 2.14.3
  - nvidia-nccl-cu12: 2.19.3
  - nvidia-nvjitlink-cu12: 12.4.127
  - nvidia-nvtx-cu11: 11.7.91
  - nvidia-nvtx-cu12: 12.1.105
  - oauthlib: 3.2.2
  - omegaconf: 2.3.0
  - open-humans-api: 0.2.9
  - opencv-python: 4.6.0.66
  - openexr: 1.3.9
  - opt-einsum: 3.3.0
  - osmnx: 1.2.2
  - p5py: 1.0.0
  - packaging: 21.3
  - pandas: 1.5.3
  - pandocfilters: 1.5.0
  - parso: 0.8.3
  - partd: 1.4.1
  - pep517: 0.13.0
  - pickleshare: 0.7.5
  - pillow: 9.2.0
  - pint: 0.21.1
  - pip: 24.0
  - pkgconfig: 1.5.5
  - pkgutil-resolve-name: 1.3.10
  - platformdirs: 2.5.2
  - plotly: 5.13.1
  - plyfile: 0.8.1
  - portalocker: 2.8.2
  - powerbox: 0.7.1
  - prettymapp: 0.1.0
  - proglog: 0.1.10
  - prometheus-client: 0.14.1
  - promise: 2.3
  - prompt-toolkit: 3.0.31
  - protobuf: 3.19.6
  - psutil: 5.9.2
  - ptyprocess: 0.7.0
  - pure-eval: 0.2.2
  - py-cpuinfo: 8.0.0
  - pyarrow: 10.0.0
  - pyasn1: 0.4.8
  - pyasn1-modules: 0.2.8
  - pycocotools: 2.0
  - pycosat: 0.6.3
  - pycparser: 2.21
  - pydantic: 1.10.9
  - pydeprecate: 0.3.1
  - pydub: 0.25.1
  - pyelftools: 0.30
  - pyerfa: 2.0.0.1
  - pyfftw: 0.13.1
  - pygame: 2.1.2
  - pygments: 2.13.0
  - pylians: 0.7
  - pyparsing: 3.0.9
  - pyproj: 3.5.0
  - pyproject-hooks: 1.0.0
  - pyquaternion: 0.9.9
  - pyrsistent: 0.18.1
  - pyshp: 2.3.1
  - pysocks: 1.7.1
  - pysr: 0.16.3
  - pystac: 1.8.4
  - pystac-client: 0.7.5
  - python-box: 7.1.1
  - python-dateutil: 2.8.2
  - pytorch-fid: 0.3.0
  - pytorch-lightning: 2.2.2
  - pytz: 2022.2.1
  - pywavelets: 1.4.1
  - pyyaml: 6.0
  - pyzmq: 23.2.1
  - qtconsole: 5.3.2
  - qtpy: 2.2.0
  - ray: 2.10.0
  - referencing: 0.30.2
  - requests: 2.31.0
  - requests-oauthlib: 1.3.1
  - rich: 13.3.4
  - rpds-py: 0.10.3
  - rsa: 4.9
  - rtree: 1.0.1
  - ruamel.yaml: 0.17.21
  - ruamel.yaml.clib: 0.2.7
  - scikit-build-core: 0.8.2
  - scikit-image: 0.20.0
  - scikit-learn: 1.2.2
  - scipy: 1.8.1
  - scooby: 0.7.4
  - seaborn: 0.12.2
  - send2trash: 1.8.0
  - sentry-sdk: 1.44.1
  - setproctitle: 1.3.3
  - setuptools: 67.6.0
  - shapely: 1.8.0
  - shellingham: 1.5.4
  - six: 1.16.0
  - sklearn: 0.0.post1
  - smmap: 5.0.1
  - sniffio: 1.3.0
  - soupsieve: 2.3.2.post1
  - spiceypy: 6.0.0
  - stack-data: 0.5.0
  - stravalib: 1.4
  - swagger-client: 1.0.0
  - sympy: 1.11.1
  - tabulate: 0.9.0
  - taichi: 1.5.0
  - tenacity: 8.2.3
  - tensorboard: 2.11.2
  - tensorboard-data-server: 0.6.1
  - tensorboard-plugin-wit: 1.8.1
  - tensorboardx: 2.6.2.2
  - tensorflow: 2.11.0
  - tensorflow-addons: 0.21.0
  - tensorflow-datasets: 4.9.0
  - tensorflow-estimator: 2.11.0
  - tensorflow-graphics: 2021.12.3
  - tensorflow-io-gcs-filesystem: 0.29.0
  - tensorflow-metadata: 1.13.0
  - tensorflow-probability: 0.19.0
  - termcolor: 2.1.1
  - terminado: 0.15.0
  - threadpoolctl: 3.1.0
  - tifffile: 2023.3.21
  - timm: 0.4.12
  - tinycss2: 1.1.1
  - toml: 0.10.2
  - tomli: 2.0.1
  - tomlkit: 0.11.4
  - toolz: 0.12.1
  - torch: 2.2.2
  - torchaudio: 2.2.2
  - torchmetrics: 1.0.0
  - torchvision: 0.17.2
  - tornado: 6.2
  - tqdm: 4.66.2
  - tr: 1.0.0.2
  - trafficgen: 0.0.0
  - traitlets: 5.4.0
  - traittypes: 0.2.1
  - trimesh: 4.3.0
  - triton: 2.2.0
  - typeguard: 2.13.3
  - typer: 0.12.2
  - typing-extensions: 4.11.0
  - urllib3: 1.26.15
  - virtualenv: 20.16.5
  - visu3d: 1.5.1
  - wandb: 0.16.5
  - waymo-open-dataset-tf-2-11-0: 1.6.1
  - wcwidth: 0.2.5
  - webencodings: 0.5.1
  - websocket-client: 1.4.1
  - werkzeug: 2.3.7
  - wheel: 0.37.1
  - whitebox: 2.3.1
  - whiteboxgui: 2.3.0
  - widgetsnbextension: 4.0.3
  - wrapt: 1.14.1
  - xyzservices: 2023.7.0
  - yacs: 0.1.8
  - yapf: 0.30.0
  - yarl: 1.8.1
  - zipp: 3.8.1
* System:
  - OS: Linux
  - architecture:
    - 64bit
    - ELF
  - processor: x86_64
  - python: 3.8.19
  - release: 5.15.0-102-generic
  - version: #112~20.04.1-Ubuntu SMP Thu Mar 14 14:28:24 UTC 2024
More info
No response