colinclement opened this issue 2 years ago
Had the same issue and updated OmegaConf to version 2.1.0 following the discussion at https://github.com/omry/omegaconf/discussions/746. They fixed this inconsistency in that version, and I tested it: everything worked fine for BART pre-training. It is a minor version bump, so if they follow semantic versioning it should in theory not be a problem to update fairseq and hydra.
🐛 Bug
The version of `hydra-core` prescribed in `fairseq/setup.py` includes a bug from OmegaConf which breaks legacy model support in `fairseq-hydra-train`. The most recent version of `hydra-core` currently allowed is `1.0.7`, which pulls in an OmegaConf release with the extremely strange behavior that a `DictConfig` object (which is what `fairseq-hydra-train` casts the command's `args` into) overrides `__getattr__` so that `getattr(dict_config_object, "key_not_present_in_dict_config_object", default_value)` always returns `None` instead of `default_value`. This breaks the mechanism for modifying architectures through the `register_model_architecture` system.
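A minimal, self-contained sketch of the behavior described above (only `omegaconf` is needed; the config contents here are made up for illustration):

```python
from omegaconf import OmegaConf

# Any DictConfig will do; this one only defines "arch".
cfg = OmegaConf.create({"arch": "bart_large"})

# With the OmegaConf release pulled in by hydra-core==1.0.7 (omegaconf 2.0.x),
# attribute access on a missing key returns None instead of raising
# AttributeError, so getattr() never falls back to its default.
print(getattr(cfg, "encoder_embed_dim", 1024))  # 2.0.x: None, >=2.1: 1024
```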
Proposed fix
Allow newer versions of `hydra-core` and `omegaconf` in `setup.py`. I upgraded to `hydra-core==1.2.0` and `omegaconf==2.2.3`, and the bug was successfully resolved.
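As a quick sanity check after upgrading (the version numbers are simply the combination I installed), the `getattr` default should be honored again:

```python
import hydra
import omegaconf
from omegaconf import OmegaConf

# Expect the upgraded versions, e.g. 1.2.0 and 2.2.3.
print(hydra.__version__, omegaconf.__version__)

# With the fixed OmegaConf, getattr() falls back to its default as expected.
assert getattr(OmegaConf.create({}), "missing_key", "fallback") == "fallback"
```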
To Reproduce

For example, in the `bart_large` architecture none of the hyperparameter changes are propagated, so even if you specify `arch=bart_large`, the model actually created is just `bart_base`. Use the following `config.yaml` file in your current working directory:

```yaml
criterion: cross_entropy
dataset:
  batch_size: 16
  ignore_unused_valid_subsets: true
optimizer:
  _name: adam
  weight_decay: 0.01
  adam_betas: (0.9,0.98)
  adam_eps: 1e-06
lr_scheduler:
  _name: inverse_sqrt
  warmup_updates: 100000
optimization:
  clip_norm: 0.1
  lr: [1e-5]
  max_update: 5000000
  update_freq: [4]
model:
  arch: bart_large
```
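To connect this to the symptom: fairseq's architecture override functions fill in hyperparameters with `getattr` defaults, so when the default is silently replaced by `None`, the large-model values never take effect. A stripped-down illustration (not fairseq's actual code; the field names are only examples):

```python
from omegaconf import OmegaConf


def bart_large_like(args):
    # register_model_architecture-style functions set hyperparameters this way:
    # keep a user-supplied value if present, otherwise use the arch default.
    args.encoder_embed_dim = getattr(args, "encoder_embed_dim", 1024)
    args.encoder_layers = getattr(args, "encoder_layers", 12)


args = OmegaConf.create({"arch": "bart_large"})
bart_large_like(args)

# omegaconf 2.0.x: both fields come back as None, so the large-model defaults
# are lost and downstream code falls back to the base-model values.
# omegaconf >= 2.1: prints "1024 12" as intended.
print(args.encoder_embed_dim, args.encoder_layers)
```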