Lightning-AI / pytorch-lightning


TypeError: cannot pickle '_thread.lock' object #17269

Closed · leng-yue closed this issue 1 year ago

leng-yue commented 1 year ago

Bug description

In PyTorch Lightning 1.x, when we try to pickle an unpicklable object, it is automatically ignored. In Lightning 2.0.1, however, an edge case triggers `TypeError: cannot pickle '_thread.lock' object` in my use case. Adding `TypeError` to the exceptions caught by the `is_picklable` function solved the problem for me.
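
For reference, a minimal sketch of the change I have in mind, assuming `is_picklable` keeps its current try/except structure (the exact exception list in `lightning.pytorch.utilities.parsing` may differ):

```python
import pickle
from typing import Any


def is_picklable(obj: Any) -> bool:
    """Return True if the object can be pickled, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PickleError, AttributeError, RuntimeError, TypeError):
        # TypeError covers objects such as threading.Lock, which raise it
        # directly instead of a PickleError when dumped.
        return False
```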

How to reproduce the bug

import threading
from lightning.pytorch.utilities.parsing import is_picklable

# Expected: returns False (the lock is treated as unpicklable and skipped).
# Actual in 2.0.1: raises TypeError: cannot pickle '_thread.lock' object
is_picklable(threading.Lock())

Error messages and logs

Traceback (most recent call last):
  File "/home/lengyue/workspace/lightning/test.py", line 4, in <module>
    is_picklable(threading.Lock())
  File "/home/lengyue/workspace/lightning/src/lightning/pytorch/utilities/parsing.py", line 33, in is_picklable
    pickle.dumps(obj)
TypeError: cannot pickle '_thread.lock' object

Environment

Current environment ``` * CUDA: - GPU: - NVIDIA GeForce RTX 3090 - NVIDIA GeForce RTX 3090 - available: True - version: 11.8 * Lightning: - lightning-utilities: 0.8.0 - pytorch-lightning: 2.0.1 - torch: 2.0.0+cu118 - torchaudio: 2.0.1+cu118 - torchcrepe: 0.0.17 - torchmetrics: 0.11.4 - torchvision: 0.15.1+cu118 * Packages: - absl-py: 1.4.0 - addict: 2.4.0 - aiofiles: 23.1.0 - aiohttp: 3.8.4 - aiosignal: 1.3.1 - alabaster: 0.7.13 - altair: 4.2.2 - antlr4-python3-runtime: 4.9.3 - anyio: 3.6.2 - appdirs: 1.4.4 - async-timeout: 4.0.2 - attrs: 22.2.0 - audioread: 3.0.0 - babel: 2.12.1 - beautifulsoup4: 4.12.0 - black: 22.12.0 - blessed: 1.20.0 - brotlipy: 0.7.0 - build: 0.10.0 - cachecontrol: 0.12.11 - cachetools: 5.3.0 - certifi: 2022.12.7 - cffi: 1.15.1 - charset-normalizer: 3.1.0 - cleo: 2.0.1 - click: 8.1.3 - cloudpickle: 2.2.1 - cmake: 3.26.1 - colorama: 0.4.6 - coloredlogs: 15.0.1 - contourpy: 1.0.7 - crashtest: 0.4.1 - cryptography: 39.0.1 - cycler: 0.11.0 - cython: 0.29.34 - decorator: 5.1.1 - demucs: 4.0.0 - deprecated: 1.2.13 - diffq: 0.2.3 - distlib: 0.3.6 - docker-pycreds: 0.4.0 - docutils: 0.19 - dora-search: 0.1.11 - dulwich: 0.21.3 - einops: 0.6.0 - encodec: 0.1.1 - entrypoints: 0.4 - fastapi: 0.88.0 - ffmpeg-python: 0.2.0 - ffmpy: 0.3.0 - filelock: 3.10.6 - fish-audio-preprocess: 0.1.10 - fish-diffusion: 0.1.0 - flask: 2.2.3 - flask-cors: 3.0.10 - flatbuffers: 23.3.3 - flit-core: 3.8.0 - fonttools: 4.39.3 - frozenlist: 1.3.3 - fsspec: 2023.3.0 - furo: 2022.12.7 - future: 0.18.3 - gitdb: 4.0.10 - gitpython: 3.1.31 - gmpy2: 2.1.2 - google-auth: 2.17.1 - google-auth-oauthlib: 1.0.0 - gpustat: 1.0.0 - gradio: 3.24.1 - gradio-client: 0.0.5 - grpcio: 1.53.0 - h11: 0.14.0 - html5lib: 1.1 - httpcore: 0.16.3 - httpx: 0.23.3 - huggingface-hub: 0.13.3 - humanfriendly: 10.0 - idna: 3.4 - imagesize: 1.4.1 - importlib-metadata: 6.0.0 - installer: 0.6.0 - isort: 5.12.0 - itsdangerous: 2.1.2 - jaconv: 0.3.4 - jaraco.classes: 3.2.3 - jinja2: 3.1.2 - joblib: 1.2.0 - jsonschema: 4.17.3 - julius: 0.2.7 - keyring: 23.13.1 - kiwisolver: 1.4.4 - lameenc: 1.4.2 - libf0: 1.0.2 - librosa: 0.9.1 - lightning-utilities: 0.8.0 - linkify-it-py: 2.0.0 - lit: 16.0.0 - livereload: 2.6.3 - llvmlite: 0.39.1 - lockfile: 0.12.2 - loguru: 0.6.0 - markdown: 3.4.3 - markdown-it-py: 2.2.0 - markupsafe: 2.1.1 - matplotlib: 3.7.1 - mdit-py-plugins: 0.3.3 - mdurl: 0.1.2 - memray: 1.7.0 - mkl-fft: 1.3.1 - mkl-random: 1.2.2 - mkl-service: 2.4.0 - mmengine: 0.4.0 - more-itertools: 9.1.0 - mpmath: 1.3.0 - msgpack: 1.0.4 - multidict: 6.0.4 - mypy-extensions: 1.0.0 - myst-parser: 0.18.1 - natsort: 8.3.1 - networkx: 2.8.4 - numba: 0.56.4 - numpy: 1.23.5 - nvidia-ml-py: 11.495.46 - oauthlib: 3.2.2 - omegaconf: 2.3.0 - onnxruntime: 1.14.1 - openai-whisper: 20230124 - opencv-python: 4.7.0.72 - openunmix: 1.2.1 - orjson: 3.8.9 - packaging: 23.0 - pandas: 1.5.3 - pathspec: 0.11.1 - pathtools: 0.1.2 - pillow: 9.5.0 - pip: 23.0.1 - pkginfo: 1.9.6 - platformdirs: 2.6.2 - poetry: 1.4.0 - poetry-core: 1.5.1 - poetry-plugin-export: 1.3.0 - pooch: 1.7.0 - praat-parselmouth: 0.4.3 - protobuf: 4.22.1 - psutil: 5.9.4 - pyasn1: 0.4.8 - pyasn1-modules: 0.2.8 - pycparser: 2.21 - pydantic: 1.10.7 - pydub: 0.25.1 - pygments: 2.14.0 - pykakasi: 2.2.1 - pyloudnorm: 0.1.1 - pyopenssl: 23.0.0 - pyparsing: 3.0.9 - pypinyin: 0.48.0 - pyproject-hooks: 1.0.0 - pyrsistent: 0.19.3 - pysocks: 1.7.1 - pysoundfile: 0.9.0.post1 - python-dateutil: 2.8.2 - python-multipart: 0.0.6 - pytorch-lightning: 2.0.1 - pytz: 2023.3 - pyworld: 0.3.2 - pyyaml: 6.0 - rapidfuzz: 
2.13.7 - regex: 2023.3.23 - requests: 2.28.1 - requests-oauthlib: 1.3.1 - requests-toolbelt: 0.10.1 - resampy: 0.4.2 - retrying: 1.3.4 - rfc3986: 1.5.0 - rich: 13.3.3 - richuru: 0.1.1 - rsa: 4.9 - scikit-learn: 1.2.2 - scipy: 1.9.3 - semantic-version: 2.10.0 - sentry-sdk: 1.18.0 - setproctitle: 1.3.2 - setuptools: 67.6.1 - shellingham: 1.5.1 - six: 1.16.0 - smmap: 5.0.0 - sniffio: 1.3.0 - snowballstemmer: 2.2.0 - soundfile: 0.11.0 - soupsieve: 2.3.2.post1 - sphinx: 5.3.0 - sphinx-autobuild: 2021.3.14 - sphinx-basic-ng: 1.0.0b1 - sphinxcontrib-applehelp: 1.0.4 - sphinxcontrib-devhelp: 1.0.2 - sphinxcontrib-htmlhelp: 2.0.1 - sphinxcontrib-jsmath: 1.0.1 - sphinxcontrib-qthelp: 1.0.3 - sphinxcontrib-serializinghtml: 1.1.5 - starlette: 0.22.0 - submitit: 1.4.5 - sympy: 1.11.1 - tensorboard: 2.12.1 - tensorboard-data-server: 0.7.0 - tensorboard-plugin-wit: 1.8.1 - termcolor: 2.2.0 - textgrid: 1.5 - threadpoolctl: 3.1.0 - tokenizers: 0.13.2 - tomli: 2.0.1 - tomlkit: 0.11.6 - toolz: 0.12.0 - torch: 2.0.0+cu118 - torchaudio: 2.0.1+cu118 - torchcrepe: 0.0.17 - torchmetrics: 0.11.4 - torchvision: 0.15.1+cu118 - tornado: 6.2 - tqdm: 4.65.0 - transformers: 4.27.4 - treetable: 0.2.5 - triton: 2.0.0 - trove-classifiers: 2023.2.8 - typing-extensions: 4.4.0 - uc-micro-py: 1.0.1 - urllib3: 1.26.14 - uvicorn: 0.21.1 - virtualenv: 20.19.0 - wandb: 0.13.11 - wcwidth: 0.2.6 - webencodings: 0.5.1 - websockets: 11.0 - werkzeug: 2.2.3 - wheel: 0.40.0 - wrapt: 1.15.0 - yapf: 0.32.0 - yarl: 1.8.2 * System: - OS: Linux - architecture: - 64bit - ELF - processor: x86_64 - python: 3.10.10 - version: #74-Ubuntu SMP Wed Feb 22 14:14:39 UTC 2023 ```

More info

No response

awaelchli commented 1 year ago

@leng-yue I couldn't tell from your error message: what is the object that you are passing into the model, i.e., the one that holds the threading lock?

leng-yue commented 1 year ago

The config being dumped is:

{'sampling_rate': 44100, 'mel_channels': 256, 'hidden_size': 256, 'model': {'type': 'DiffSVC', 'diffusion': {'type': 'GaussianDiffusion', 'mel_channels': 256, 'noise_schedule': 'linear', 'timesteps': 1000, 'max_beta': 0.01, 's': 0.008, 'noise_loss': 'smoothed-l1', 'denoiser': {'type': 'WaveNetDenoiser', 'mel_channels': 256, 'd_encoder': 256, 'residual_channels': 512, 'residual_layers': 20, 'dilation_cycle': 4, 'use_linear_bias': True}, 'sampler_interval': 10, 'spec_min': [-5], 'spec_max': [0], 'use_spec_norm': False}, 'text_encoder': {'type': 'NaiveProjectionEncoder', 'input_size': 768, 'output_size': 256}, 'speaker_encoder': {'type': 'NaiveProjectionEncoder', 'input_size': 4, 'output_size': 256, 'use_embedding': True}, 'pitch_encoder': {'type': 'NaiveProjectionEncoder', 'input_size': 1, 'output_size': 256, 'use_embedding': False, 'preprocessing': <function pitch_to_log at 0x7f3d847e95a0>}, 'vocoder': {'type': 'AutoVocoder', 'checkpoint_path': 'logs/AutoVocoder/61b0mhsc/checkpoints/epoch=325-step=340000-valid_loss=0.23.ckpt'}, 'pitch_shift_encoder': {'type': 'NaiveProjectionEncoder', 'input_size': 1, 'output_size': 256, 'use_embedding': False}, 'energy_encoder': {'type': 'NaiveProjectionEncoder', 'input_size': 1, 'output_size': 256, 'use_embedding': False}}, 'LearningRateMonitor': <class 'pytorch_lightning.callbacks.lr_monitor.LearningRateMonitor'>, 'ModelCheckpoint': <class 'pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint'>, 'DDPStrategy': <class 'pytorch_lightning.strategies.ddp.DDPStrategy'>, 'trainer': {'accelerator': 'gpu', 'devices': -1, 'gradient_clip_val': 0.5, 'log_every_n_steps': 10, 'val_check_interval': 5000, 'check_val_every_n_epoch': None, 'max_steps': 300000, 'precision': 16, 'callbacks': [<pytorch_lightning.callbacks.model_checkpoint.ModelCheckpoint object at 0x7f3d847f5b70>, <pytorch_lightning.callbacks.lr_monitor.LearningRateMonitor object at 0x7f3d847f5270>, <pytorch_lightning.callbacks.progress.tqdm_progress.TQDMProgressBar object at 0x7f3d846aad10>, <pytorch_lightning.callbacks.model_summary.ModelSummary object at 0x7f3d846aab60>], 'strategy': <pytorch_lightning.strategies.ddp.DDPStrategy object at 0x7f3d847f4ee0>}, 'process_group_backend': 'nccl', 'LambdaWarmUpCosineScheduler': <class 'fish_diffusion.schedulers.warmup_cosine_scheduler.LambdaWarmUpCosineScheduler'>, 'lambda_func': <fish_diffusion.schedulers.warmup_cosine_scheduler.LambdaWarmUpCosineScheduler object at 0x7f3d847f7130>, 'optimizer': {'type': 'AdamW', 'lr': 1.0, 'weight_decay': 0.01, 'betas': (0.9, 0.98), 'eps': 1e-09}, 'scheduler': {'type': 'LambdaLR', 'lr_lambda': <fish_diffusion.schedulers.warmup_cosine_scheduler.LambdaWarmUpCosineScheduler object at 0x7f3d847f57e0>}, 'dataset': {'train': {'type': 'ConcatDataset', 'datasets': [{'type': 'NaiveSVCPowerDataset', 'path': 'dataset/annasita/train/annasita', 'speaker_id': 0}, {'type': 'NaiveSVCPowerDataset', 'path': 'dataset/annasita/train/aria', 'speaker_id': 3}, {'type': 'NaiveSVCPowerDataset', 'path': 'dataset/annasita/train/opencpop', 'speaker_id': 2}, {'type': 'NaiveSVCPowerDataset', 'path': 'dataset/hanser/train', 'speaker_id': 1}], 'collate_fn': <bound method NaiveDataset.collate_fn of <class 'fish_diffusion.datasets.naive.NaiveSVCPowerDataset'>>}, 'valid': {'type': 'ConcatDataset', 'datasets': [{'type': 'NaiveSVCPowerDataset', 'path': 'dataset/annasita/valid', 'speaker_id': 0}, {'type': 'NaiveSVCPowerDataset', 'path': 'dataset/hanser/valid', 'speaker_id': 1}], 'collate_fn': <bound method NaiveDataset.collate_fn of <class 
'fish_diffusion.datasets.naive.NaiveSVCPowerDataset'>>}}, 'dataloader': {'train': {'batch_size': 20, 'shuffle': True, 'num_workers': 2, 'persistent_workers': True}, 'valid': {'batch_size': 2, 'shuffle': False, 'num_workers': 2, 'persistent_workers': True}}, 'Path': <class 'pathlib.Path'>, 'NaiveSVCPowerDataset': <class 'fish_diffusion.datasets.naive.NaiveSVCPowerDataset'>, 'speaker_mapping': {'annasita': 0, 'hanser': 1, 'opencpop': 2, 'aria': 3}, 'preprocessing': {'text_features_extractor': {'type': 'ContentVec', 'output_layer': -1, 'use_projection': False}, 'pitch_extractor': {'type': 'ParselMouthPitchExtractor', 'keep_zeros': False, 'f0_min': 40.0, 'f0_max': 2000.0}, 'energy_extractor': {'type': 'RMSEnergyExtractor'}, 'augmentations': [{'type': 'RandomPitchShifting', 'key_shifts': [-5.0, 5.0], 'probability': 1.5}, {'type': 'RandomTimeStretching', 'factors': [0.8, 1.2], 'probability': 0.75}]}}

Digging into it, I found that the `DDPStrategy` instance in the config is what triggers the `TypeError`.
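
For anyone hitting this with a large config: a quick way to narrow down which top-level entries fail to pickle is a loop like the one below (just a sketch, `find_unpicklable_keys` is a made-up helper, not part of Lightning):

```python
import pickle
from typing import Any, Dict


def find_unpicklable_keys(config: Dict[str, Any]) -> Dict[str, str]:
    """Map each top-level config key that fails to pickle to its error message."""
    failures = {}
    for key, value in config.items():
        try:
            pickle.dumps(value)
        except (pickle.PickleError, AttributeError, RuntimeError, TypeError) as err:
            failures[key] = f"{type(err).__name__}: {err}"
    return failures
```

For the config above this should flag at least the `trainer` entry, since it contains the `DDPStrategy` instance that fails to pickle.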