CUNY-CL / yoyodyne

Small-vocabulary sequence-to-sequence generation with optional feature conditioning
Apache License 2.0

TQDM Error with multi GPU Transducer #192

Open bonham79 opened 2 weeks ago

bonham79 commented 2 weeks ago

I get the following error when running multi-GPU training with the edit action transducer:

Traceback (most recent call last):
  File "/home/salamander/anaconda3/envs/sigmorphon2024/bin/yoyodyne-train", line 8, in <module>
    sys.exit(main())
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/train.py", line 390, in main
    model = get_model_from_argparse_args(args, datamodule)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/train.py", line 214, in get_model_from_argparse_args
    return model_cls(
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/models/transducer.py", line 43, in __init__
    super().__init__(*args, **kwargs)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/models/lstm.py", line 36, in __init__
    super().__init__(*args, **kwargs)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/yoyodyne/models/base.py", line 155, in __init__
    self.save_hyperparameters(
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/pytorch_lightning/core/mixins/hparams_mixin.py", line 110, in save_hyperparameters
    save_hyperparameters(self, *args, ignore=ignore, frame=frame)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/pytorch_lightning/utilities/parsing.py", line 275, in save_hyperparameters
    obj._hparams_initial = copy.deepcopy(obj._hparams)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 297, in _reconstruct
    value = deepcopy(value, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 172, in deepcopy
    y = _reconstruct(x, memo, *rv)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 271, in _reconstruct
    state = deepcopy(state, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 146, in deepcopy
    y = copier(x, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 231, in _deepcopy_dict
    y[deepcopy(key, memo)] = deepcopy(value, memo)
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/copy.py", line 161, in deepcopy
    rv = reductor(4)
TypeError: cannot pickle '_io.TextIOWrapper' object
Exception ignored in: <function tqdm.__del__ at 0x7f96d86a6290>
Traceback (most recent call last):
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/tqdm/std.py", line 1148, in __del__
    self.close()
  File "/home/salamander/anaconda3/envs/sigmorphon2024/lib/python3.10/site-packages/tqdm/std.py", line 1267, in close
    if self.disable:
AttributeError: 'tqdm' object has no attribute 'disable'

From what I gather, the tqdm object held by the expert module can't be pickled, and so it can't be distributed across multiple GPUs; the traceback shows the failure inside copy.deepcopy, called from save_hyperparameters. Adding expert to the ignore argument of save_hyperparameters fixes this, but I wanted to get feedback on whether there is a less 'hacky' way to deal with it.
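For readers following along, here is a minimal standard-library sketch of the failure mode and the workaround. It does not use yoyodyne, tqdm, or PyTorch Lightning: `FakeProgressBar`, `FakeExpert`, and `snapshot_hparams` are hypothetical stand-ins, assumed only to mimic an object that holds an open text stream and a hyperparameter snapshot that deep-copies everything not listed in `ignore`.

```python
import copy
import io


class FakeProgressBar:
    """Hypothetical stand-in for a progress bar that holds a text stream."""

    def __init__(self):
        self.fp = io.TextIOWrapper(io.BytesIO())


class FakeExpert:
    """Hypothetical stand-in for the expert module."""

    def __init__(self):
        self.bar = FakeProgressBar()


def snapshot_hparams(hparams, ignore=()):
    """Deep-copies the hyperparameters, skipping any names in `ignore`."""
    kept = {name: value for name, value in hparams.items() if name not in ignore}
    return copy.deepcopy(kept)


hparams = {"hidden_size": 64, "expert": FakeExpert()}

# Copying everything walks down to the text stream and raises TypeError.
try:
    snapshot_hparams(hparams)
    failed = False
except TypeError:
    failed = True

# Ignoring the offending entry lets the remaining values be copied.
saved = snapshot_hparams(hparams, ignore=("expert",))
```

In the real model, the equivalent step would presumably be passing `ignore` to the `save_hyperparameters` call that appears in the traceback.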

@kylebgorman thoughts?

kylebgorman commented 2 weeks ago

When something doesn't pickle, you can usually just give it the necessary methods, but I don't want to hack into TQDM, so I think the hacky solution is fine.
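As an illustration of what "the necessary methods" could look like, here is a hedged sketch using Python's `__getstate__` and `__setstate__` hooks on a class you control. `PicklableBar` is a hypothetical class, not part of tqdm or yoyodyne. It drops its stream before being copied and reattaches one afterwards.

```python
import copy
import sys


class PicklableBar:
    """Hypothetical progress-bar-like class that survives copying."""

    def __init__(self, total):
        self.total = total
        self.fp = sys.stderr  # Open stream; cannot be pickled or deep-copied.

    def __getstate__(self):
        # Copies the instance state and leaves out the stream.
        state = self.__dict__.copy()
        del state["fp"]
        return state

    def __setstate__(self, state):
        # Restores the saved state and reattaches a fresh stream reference.
        self.__dict__.update(state)
        self.fp = sys.stderr


bar = PicklableBar(total=100)
clone = copy.deepcopy(bar)
```

Doing the same for tqdm itself would mean subclassing or patching a third-party class, which is the trade-off the comment above declines to make.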