facebookresearch / metaseq

Repo for external large-scale work

Figure out potential duplication in ConfigStore #7

Open · suchenzang opened 2 years ago

suchenzang commented 2 years ago

By the time we get to our first `convert_namespace_to_omegaconf` call, we already have:

```
# from hydra.core.config_store import ConfigStore
(Pdb) cs = ConfigStore.instance()
(Pdb) cs.repo.keys()
dict_keys(['hydra', '_dummy_empty_config_.yaml', 'base_config.yaml', '_name.yaml', 'common.yaml', 'common_eval.yaml', 'distributed_training.yaml', 'dataset.yaml', 'optimization.yaml', 'checkpoint.yaml', 'generation.yaml', 'eval_lm.yaml', 'model.yaml', 'task.yaml', 'criterion.yaml', 'optimizer.yaml', 'lr_scheduler.yaml', 'bpe.yaml', 'tokenizer.yaml', 'model', 'optimizer', 'lr_scheduler', 'bpe', 'task'])
```

which seems to have some redundancy between the `*.yaml` keys and their bare (non-`.yaml`) counterparts (e.g. `model.yaml` vs. `model`). Figure out whether we can remove the `*.yaml` keys somehow.
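For context on where the two key shapes come from: Hydra's `ConfigStore` appends `.yaml` to the key when a config is stored by name, and creates a nested dict under the bare group name when a config is stored with `group=...`. A minimal standalone sketch (illustrative only, not metaseq code):

```python
from dataclasses import dataclass

from hydra.core.config_store import ConfigStore


@dataclass
class ModelConfig:
    _name: str = "transformer_lm"


cs = ConfigStore.instance()

# Storing a top-level config named "model" creates the key "model.yaml".
cs.store(name="model", node=ModelConfig)

# Storing under group="model" creates a nested dict under the bare key "model".
cs.store(name="transformer_lm", group="model", node=ModelConfig)

print(cs.repo.keys())           # dict_keys([..., 'model.yaml', 'model'])
print(cs.repo["model"].keys())  # dict_keys(['transformer_lm.yaml'])
```

So `model.yaml` and `model` are not literal duplicates: the former is a top-level config node, the latter a group. The question is whether we actually need both registration paths.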

suchenzang commented 2 years ago

Some more notes from pdb-ing:

```
# before hydra_init():

(Pdb) cs.repo.keys()
dict_keys(['hydra', '_dummy_empty_config_.yaml'])
```

```
# after hydra_init():

(Pdb) cs.repo.keys()
dict_keys(['hydra', '_dummy_empty_config_.yaml', 'base_config.yaml', '_name.yaml', 'common.yaml', 'common_eval.yaml', 'distributed_training.yaml', 'dataset.yaml', 'optimization.yaml', 'checkpoint.yaml', 'generation.yaml', 'eval_lm.yaml', 'model.yaml', 'task.yaml', 'criterion.yaml', 'optimizer.yaml', 'lr_scheduler.yaml', 'bpe.yaml', 'tokenizer.yaml'])
```
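The jump from 2 keys to 19 is `hydra_init()` storing the root config plus each of its top-level fields by name. Roughly, a sketch assuming metaseq keeps the fairseq pattern here (the `MetaseqConfig` import path is an assumption):

```python
from hydra.core.config_store import ConfigStore

from metaseq.dataclass.configs import MetaseqConfig  # assumed import path


def hydra_init(cfg_name: str = "base_config") -> None:
    cs = ConfigStore.instance()
    # root config -> 'base_config.yaml'
    cs.store(name=cfg_name, node=MetaseqConfig)
    # each top-level dataclass field -> 'common.yaml', 'dataset.yaml', ...
    for k in MetaseqConfig.__dataclass_fields__:
        v = MetaseqConfig.__dataclass_fields__[k].default
        cs.store(name=k, node=v)
```

This is why every section of the root config shows up as its own `*.yaml` key.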

```
# after import fairseq.models:

(Pdb) cs.repo.keys()
dict_keys(['hydra', '_dummy_empty_config_.yaml', 'base_config.yaml', '_name.yaml', 'common.yaml', 'common_eval.yaml', 'distributed_training.yaml', 'dataset.yaml', 'optimization.yaml', 'checkpoint.yaml', 'generation.yaml', 'eval_lm.yaml', 'model.yaml', 'task.yaml', 'criterion.yaml', 'optimizer.yaml', 'lr_scheduler.yaml', 'bpe.yaml', 'tokenizer.yaml', 'model'])

(Pdb) cs.repo['model'].keys()
dict_keys(['transformer_lm.yaml'])
```
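The bare `model` key comes from the registry decorator running at import time: each registered architecture's dataclass gets stored under `group="model"`. A sketch of the fairseq-style registration (not verbatim metaseq code):

```python
from hydra.core.config_store import ConfigStore


def register_model(name, dataclass=None):
    def wrapper(cls):
        if dataclass is not None:
            cs = ConfigStore.instance()
            node = dataclass()
            node._name = name
            # group="model" nests the entry: repo['model']['transformer_lm.yaml']
            cs.store(name=name, group="model", node=node)
        return cls
    return wrapper
```

Once the other registries (`optimizer`, `lr_scheduler`, `bpe`, `task`) have been imported as well, the name/group collisions can be listed straight from the store:

```python
cs = ConfigStore.instance()
groups = {k for k in cs.repo if not k.endswith(".yaml")}
leaves = {k[: -len(".yaml")] for k in cs.repo if k.endswith(".yaml")}
print(sorted(groups & leaves))  # ['model'] at this point; grows as more registries import
```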