facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.

FairseqDataclass cfg object can't be pickled (hydra) #3482

Open villmow opened 3 years ago

villmow commented 3 years ago

🐛 Bug

I'm doing some custom preprocessing in my dataset using multiprocessing. I recently switched to hydra, which works quite nicely, and I'm using FairseqDataclass and structured configs. In my dataset I pass the cfg object to a worker function. Without multiprocessing this works like a charm, but as soon as I switch to multiprocessing it crashes with an error I don't understand.
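For context, the pattern looks roughly like this (a minimal sketch; the worker and helper names are placeholders, not my actual dataset code):

import multiprocessing as mp

def preprocess_item(args):
    # worker function: receives one raw item together with the task cfg
    item, cfg = args
    # ... custom preprocessing that reads fields from cfg ...
    return item

def preprocess_all(raw_items, cfg):
    # multiprocessing pickles every argument sent to the workers,
    # including cfg; this is where the PicklingError below comes from
    with mp.Pool(processes=4) as pool:
        return pool.map(preprocess_item, [(item, cfg) for item in raw_items])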

To Reproduce

I don't have exact steps to reproduce, but maybe you guys have an idea:

EDIT: See my comment for steps to reproduce.

Traceback (most recent call last):
 [...]
  File "env/lib/python3.8/pickle.py", line 558, in save
    f(self, obj)  # Call unbound method with explicit self
  File "env/lib/python3.8/site-packages/dill/_dill.py", line 1390, in save_type
    StockPickler.save_global(pickler, obj, name=name)
  File "env/lib/python3.8/pickle.py", line 1068, in save_global
    raise PicklingError(
_pickle.PicklingError: Can't pickle <enum 'Choices'>: it's not found as fairseq.dataclass.constants.Choices

I don't know why an enum Choices needs to be pickled; I don't import it anywhere. Do you have an idea?

I'm pretty confident that the config object is the cause of the problem. When I don't pass the config object to the worker, it does not crash. When I don't use multiprocessing, it does not crash either.
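The error message itself looks like the generic pickle restriction that classes are pickled by reference: an enum class created at runtime, and not bound under its own name at module level, can't be pickled. A small standalone example (nothing to do with fairseq internals) reproduces the same kind of error:

import pickle
from enum import Enum

def make_choices(*values):
    # builds an enum class at runtime, similar to a "choices" helper
    return Enum("Choices", {v: v for v in values})

Colors = make_choices("red", "blue")

try:
    pickle.dumps(Colors.red)
except pickle.PicklingError as e:
    # pickle looks up an attribute named "Choices" in this module,
    # but the class is only bound to the name "Colors", so the lookup fails
    print(e)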

Expected behavior

It shouldn't be a problem to pickle config objects.

Environment

villmow commented 3 years ago

Here is a complete example to reproduce the problem. Insert the following lines at line 43 of setup_task (just after the FairseqDataclass cfg has been loaded): https://github.com/pytorch/fairseq/blob/89371294e54ef8c306f19733f2e8bab8233c401e/fairseq/tasks/__init__.py#L42-L44

import pickle
pickle.dumps(cfg)

Then execute the command from the hydra tutorial:

$ fairseq-hydra-train \
    distributed_training.distributed_world_size=1 \
    dataset.batch_size=2 \
    task.data=data-bin \
    model=transformer_lm/transformer_lm_gpt \
    task=language_modeling \
    optimization.max_update=5000

You will receive the following error:

Traceback (most recent call last):
  File "./fairseq/fairseq_cli/hydra_train.py", line 45, in hydra_main
    distributed_utils.call_main(cfg, pre_main)
  File "./fairseq/fairseq/distributed/utils.py", line 369, in call_main
    main(cfg, **kwargs)
  File "./fairseq/fairseq_cli/train.py", line 82, in main
    task = tasks.setup_task(cfg.task)
  File "./fairseq/fairseq/tasks/__init__.py", line 45, in setup_task
    pickle.dumps(cfg)
_pickle.PicklingError: Can't pickle <enum 'Choices'>: attribute lookup Choices on fairseq.dataclass.constants failed

villmow commented 3 years ago

For anyone who stumbles over the same problem, here is a quick workaround until this is fixed:

from omegaconf import OmegaConf

cfg = OmegaConf.merge(
    # schema from the dataclass that needs to be pickled
    OmegaConf.structured(MyConfigClassWhichNeedsToBePickled),
    # current values, round-tripped through YAML with interpolations resolved
    OmegaConf.create(OmegaConf.to_yaml(cfg, resolve=True)),
)

This creates a new omegaconf.DictConfig object, which can be pickled.
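A quick sanity check (assuming cfg is the merged object from the snippet above):

import pickle

# the merged DictConfig now round-trips through pickle without errors
pickle.loads(pickle.dumps(cfg))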

harveenchadha commented 3 years ago

Hi @villmow,

Can you please let me know where this code needs to be added and in which file?

Is it in utils.py, line 460?

Confirmed. To fix this, add the code mentioned by @villmow at line 460 of utils.py.

Thanks!