Open villmow opened 3 years ago
Here is a complete example to reproduce. Insert the following lines at line 43 in setup_task (just after loading a FairseqDataclass cfg):
https://github.com/pytorch/fairseq/blob/89371294e54ef8c306f19733f2e8bab8233c401e/fairseq/tasks/__init__.py#L42-L44
import pickle
pickle.dumps(cfg)
Then execute the command from the hydra tutorial:
$ fairseq-hydra-train \
distributed_training.distributed_world_size=1 \
dataset.batch_size=2 \
task.data=data-bin \
model=transformer_lm/transformer_lm_gpt \
task=language_modeling \
optimization.max_update=5000
You will receive the following error:
Traceback (most recent call last):
File "./fairseq/fairseq_cli/hydra_train.py", line 45, in hydra_main
distributed_utils.call_main(cfg, pre_main)
File "./fairseq/fairseq/distributed/utils.py", line 369, in call_main
main(cfg, **kwargs)
File "./fairseq/fairseq_cli/train.py", line 82, in main
task = tasks.setup_task(cfg.task)
File "./fairseq/fairseq/tasks/__init__.py", line 45, in setup_task
pickle.dumps(cfg)
_pickle.PicklingError: Can't pickle <enum 'Choices'>: attribute lookup Choices on fairseq.dataclass.constants failed
For anyone that stumbles over the same problem, here is a quick solution until this is fixed:
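from omegaconf import OmegaConf

cfg = OmegaConf.merge(
    OmegaConf.structured(MyConfigClassWhichNeedsToBePickled),
    # Round-trip the existing config through YAML with interpolations resolved.
    OmegaConf.create(OmegaConf.to_yaml(cfg, resolve=True)),
)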
This creates a new omegaconf.DictConfig object, which can be pickled.
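A related approach, which is not the OmegaConf workaround above but a different technique, is to hand the worker only plain data. The following is a minimal standard-library sketch under that assumption; WorkerConfig, DynamicChoices and to_plain are hypothetical names for illustration and are not part of fairseq or OmegaConf.

```python
import pickle
from dataclasses import asdict, dataclass
from enum import Enum

# Hypothetical enum created at runtime; its class name ("Choices") differs
# from the variable it is bound to, so pickle cannot look it up by name.
DynamicChoices = Enum("Choices", {"adam": "adam", "sgd": "sgd"}, module=__name__)


@dataclass
class WorkerConfig:
    """Hypothetical config object; not fairseq's actual dataclass."""

    batch_size: int
    optimizer: Enum


def to_plain(value):
    """Recursively replace enum members with their underlying values."""
    if isinstance(value, Enum):
        return value.value
    if isinstance(value, dict):
        return {key: to_plain(item) for key, item in value.items()}
    if isinstance(value, (list, tuple)):
        return [to_plain(item) for item in value]
    return value


cfg = WorkerConfig(batch_size=2, optimizer=DynamicChoices.adam)

# Only builtin types remain, so the result can cross a process boundary.
plain_cfg = to_plain(asdict(cfg))
restored = pickle.loads(pickle.dumps(plain_cfg))
print(restored)
```

The trade-off is that the worker receives a dict of strings and numbers rather than a typed config, so any enum-based validation has to happen before the conversion.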
Hi @villmow,
Can you please let me know where this code needs to be added and in which file?
Is it in util.py line 460?
Confirmed. To fix this, add the code mentioned by @villmow at line 460 of utils.py.
Thanks!
🐛 Bug
I'm doing some custom preprocessing in the dataset using multiprocessing. I recently switched to hydra, which works quite nicely! I'm working with FairseqDataclass and structured configs, which is really nice. In my dataset I pass the cfg object to a worker function. Without multiprocessing it works like a charm, but when I switch to multiprocessing it crashes with an error I don't understand.
To Reproduce
I don't have exact steps to reproduce, but maybe you guys have an idea:
EDIT: See my comment for steps to reproduce.
I don't know why an enum Choices needs to be pickled; I don't import it anywhere. Do you have an idea?
I'm pretty confident that the config object is the cause of the problem. When I don't pass the config object to the worker, it does not crash. When I don't use multiprocessing, it does not crash.
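One possible explanation, sketched with the standard library only: pickle stores an enum member by referring to its class, and it stores a class by name, looking up that name as an attribute of the module where the class was defined. An enum created at runtime with the name Choices but never bound to a module attribute called Choices cannot be found that way. Whether the enum in the config is built like this is an assumption based on the error message; make_choices and OptimizerChoices are hypothetical names.

```python
import pickle
from enum import Enum


def make_choices(choices):
    """Hypothetical helper that builds an enum class named 'Choices' at runtime."""
    return Enum("Choices", {choice: choice for choice in choices}, module=__name__)


# The class calls itself "Choices", but the module only knows it as
# "OptimizerChoices", so a lookup of "Choices" on this module fails.
OptimizerChoices = make_choices(["adam", "sgd"])


def try_pickle(obj):
    """Return 'ok' if obj can be pickled, otherwise the error text."""
    try:
        pickle.dumps(obj)
        return "ok"
    except pickle.PicklingError as exc:
        return f"PicklingError: {exc}"


print(try_pickle("adam"))                 # a plain string pickles fine
print(try_pickle(OptimizerChoices.adam))  # the enum member does not
```

This would also explain why the crash only shows up with multiprocessing: arguments passed to a worker process are pickled, while a direct function call never serializes the config.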
Expected behavior
It shouldn't be a problem to pickle config objects.
Environment
How you installed fairseq (pip, source): source