facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

`fairseq-hydra-train` is broken for legacy model architectures #4675

Open colinclement opened 2 years ago

colinclement commented 2 years ago

šŸ› Bug

The version of hydra-core pinned in fairseq's `setup.py` pulls in an OmegaConf bug that breaks legacy model support in `fairseq-hydra-train`. The newest hydra-core currently allowed is 1.0.7, whose OmegaConf dependency overrides `__getattr__` on `DictConfig` (the type that `fairseq-hydra-train` casts the command args into) so that `getattr(dict_config_object, "key_not_present_in_dict_config_object", default_value)` always returns `None` instead of `default_value`. This breaks the mechanism for modifying architectures through the `register_model_architecture` system, as sketched below.
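A minimal sketch of the failure mode, assuming the older OmegaConf pulled in by hydra-core 1.0.7 (the config contents and hyperparameter name here are illustrative):

```python
from omegaconf import DictConfig

# fairseq architecture functions registered via register_model_architecture
# fill in defaults with the pattern:
#     args.some_param = getattr(args, "some_param", default)
args = DictConfig({"arch": "bart_large"})

# Under the OmegaConf allowed by hydra-core<=1.0.7, attribute access on a
# missing key returns None instead of raising AttributeError, so getattr
# never falls back to its default value:
print(getattr(args, "encoder_layers", 12))
# -> None with the pinned OmegaConf; 12 (the intended default) on
#    omegaconf>=2.1, where the missing key raises an AttributeError subclass.
```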

### Proposed fix

Allow newer versions of hydra-core and omegaconf in `setup.py`. I upgraded to hydra-core==1.2.0 and omegaconf==2.2.3 and the bug was resolved.
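For illustration only, the relaxed pins could look roughly like this in `setup.py` (the version bounds below are an assumption, not the actual strings in the repository):

```python
# Sketch of a relaxed install_requires in setup.py; bounds are illustrative
# and would need to match whatever fairseq actually supports.
from setuptools import setup

setup(
    name="fairseq",
    install_requires=[
        "hydra-core>=1.0.7,<1.3",  # previously capped at 1.0.7
        "omegaconf>=2.1,<2.3",     # 2.1 changed missing-key access to raise
        # ... remaining dependencies unchanged ...
    ],
)
```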

### To Reproduce

For example, with the `bart_large` architecture none of its hyperparameter overrides are propagated, so even if you specify `arch=bart_large`, the model actually created is just `bart_base`.

  1. Using this config.yaml file in your current working directory:

    task:
      _name: span_masked_lm
      data: /path/to/data/binary

    criterion: cross_entropy

    dataset:
      batch_size: 16
      ignore_unused_valid_subsets: true

    optimizer:
      _name: adam
      weight_decay: 0.01
      adam_betas: (0.9,0.98)
      adam_eps: 1e-06

    lr_scheduler:
      _name: inverse_sqrt
      warmup_updates: 100000

    optimization:
      clip_norm: 0.1
      lr: [1e-5]
      max_update: 5000000
      update_freq: [4]

    model:
      arch: bart_large

  2. Executing `fairseq-hydra-train -m --config-dir ./ --config-name config` will successfully launch and train, but it will train only a 6-layer `bart_base` model because none of the architecture changes propagate through the `DictConfig` object.

### Expected behavior
The above example should train a 12-layer `bart_large` model as `arch: bart_large` is set.
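One way to confirm which architecture was actually built is to count the encoder layers in a saved checkpoint. This is a sketch that assumes fairseq's usual checkpoint layout (a `"model"` state dict with `encoder.layers.<N>.*` keys) and a hypothetical checkpoint path:

```python
import torch

# Load a checkpoint written by the run above (path is hypothetical).
ckpt = torch.load("checkpoints/checkpoint_last.pt", map_location="cpu")

# Collect the distinct encoder layer indices present in the state dict.
layer_ids = {
    key.split(".")[2]
    for key in ckpt["model"]
    if key.startswith("encoder.layers.")
}
print(len(layer_ids))  # 6 with the buggy pins (bart_base), 12 for bart_large
```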

### Environment

 - fairseq Version (e.g., 1.0 or main): d81fac8163364561fd6cd9d82b6ee1ba502c3526 (Aug 29 2022) 
 - PyTorch Version (e.g., 1.0): 1.12.1
 - OS (e.g., Linux): Ubuntu 18.04
 - How you installed fairseq (`pip`, source): source
 - Build command you used (if compiling from source): `python -m pip install -e . --upgrade`
 - Python version: 3.8
 - CUDA/cuDNN version: 11.4
 - GPU models and configuration: 4x 16GB Tesla V100
 - Any other relevant information:

erichans commented 1 year ago

I had the same issue and updated OmegaConf to version 2.1.0 following the discussion at https://github.com/omry/omegaconf/discussions/746. They fixed this inconsistency in that version; I tested it and everything worked fine for BART pre-training. Since it's only a minor version bump, assuming they follow semantic versioning it should in theory not be a problem for fairseq and hydra to update.