facebookresearch / fairseq

Facebook AI Research Sequence-to-Sequence Toolkit written in Python.
MIT License

Add `--validate-after-epochs` training flag #5496

Open mcognetta opened 1 month ago

mcognetta commented 1 month ago

🚀 Feature Request

Add a --validate-after-epochs training flag that is a companion flag to --validate-after-updates.

Note: I already have a PR for this ready that I can contribute if this gets approved.

Motivation

When your task is configured to run validation after each epoch, --validate-after-updates can be difficult to use, since you might not know how many updates are in an epoch. A companion flag would let you delay validation until N epochs have passed, without having to know in advance how many batches make up a single epoch.

There is already precedent to have parallel flags for epoch-based and update-based validation (e.g., --validate-interval vs --validate-interval-updates), so it seems like this wouldn't be an unusual addition.

Pitch

Add a --validate-after-epochs flag to configs.py

https://github.com/facebookresearch/fairseq/blob/bedb259bf34a9fc22073c13a1cee23192fa70ef3/fairseq/dataclass/configs.py#L521-L529

and to fairseq_cli/train.py

https://github.com/facebookresearch/fairseq/blob/bedb259bf34a9fc22073c13a1cee23192fa70ef3/fairseq_cli/train.py#L418-L430
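A minimal sketch of how the two pieces could fit together, assuming the new field mirrors `validate_after_updates`. `ValidationConfig`, `should_validate`, and the "count only completed epochs" rule are hypothetical illustrations, not fairseq's actual config classes or the logic in `train.py`; only the existing flag names come from fairseq. Epochs are assumed to be 1-indexed.

```python
from dataclasses import dataclass, field


@dataclass
class ValidationConfig:
    # Hypothetical stand-in for the relevant config fields; only the
    # names of the existing flags are taken from fairseq.
    validate_interval: int = field(
        default=1, metadata={"help": "validate every N epochs"}
    )
    validate_interval_updates: int = field(
        default=0, metadata={"help": "validate every N updates"}
    )
    validate_after_updates: int = field(
        default=0, metadata={"help": "don't validate until reaching this many updates"}
    )
    # Proposed companion flag (0 = no epoch threshold).
    validate_after_epochs: int = field(
        default=0, metadata={"help": "don't validate until this many epochs have finished"}
    )
    disable_validation: bool = False


def should_validate(cfg, epoch, num_updates, end_of_epoch):
    """Hypothetical gating logic, not fairseq's actual validate_and_save."""
    if cfg.disable_validation:
        return False

    # Is validation scheduled at this point, by epoch or by update count?
    scheduled = (end_of_epoch and epoch % cfg.validate_interval == 0) or (
        cfg.validate_interval_updates > 0
        and num_updates > 0
        and num_updates % cfg.validate_interval_updates == 0
    )

    # Mid-epoch, the current (1-indexed) epoch has not finished yet.
    completed_epochs = epoch if end_of_epoch else epoch - 1

    # Both thresholds must be satisfied; 0 disables a threshold.
    return (
        scheduled
        and num_updates >= cfg.validate_after_updates
        and completed_epochs >= cfg.validate_after_epochs
    )
```

Counting only completed epochs means that with `validate_after_epochs=3`, the first validation happens at the end of epoch 3, and any update-based validation that falls inside epoch 3 is still skipped. Leaving both thresholds at 0 keeps the existing behavior.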

Alternatives

The workaround is to estimate how many batches are in an epoch, or to start a task, let it run for one update so that you can see the batches-per-epoch count, and then restart it with the correct value set.
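For comparison, the estimation workaround boils down to the arithmetic below. It is a rough sketch, assuming batches are sharded evenly across `world_size` data-parallel workers and that every `update_freq` batches (as with `--update-freq`) are accumulated into one update, with partial groups rounded up; `updates_per_epoch` and `validate_after_updates_for` are hypothetical helpers, not part of fairseq.

```python
import math


def updates_per_epoch(num_batches, update_freq=1, world_size=1):
    """Estimate optimizer updates per epoch (hypothetical helper)."""
    # Each data-parallel worker sees roughly 1/world_size of the batches.
    batches_per_worker = math.ceil(num_batches / world_size)
    # Every `update_freq` batches are accumulated into one update.
    return math.ceil(batches_per_worker / update_freq)


def validate_after_updates_for(epochs, num_batches, update_freq=1, world_size=1):
    """Translate 'skip the first N epochs' into an update count (hypothetical)."""
    return epochs * updates_per_epoch(num_batches, update_freq, world_size)


print(updates_per_epoch(10_000, update_freq=4, world_size=8))  # 313
print(validate_after_updates_for(3, 10_000, update_freq=4, world_size=8))  # 939
```

The result is only as good as the batch count you plug in, and that count is exactly the number the proposed flag would make unnecessary to know.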

Additional context

I already have a PR prepared for this (it's roughly a 5-line change), but my understanding is that changes like this need to be approved via an issue first.