🚀 Feature Request

Add a `--validate-after-epochs` training flag that is a companion flag to `--validate-after-updates`.

Note: I already have a PR for this ready that I can contribute if this gets approved.
Motivation

When your task is configured to run validation after each epoch, `--validate-after-updates` can be difficult to use, since you might not know how many updates are in an epoch. This would add a companion flag that allows you to delay validation until N epochs have passed, without having to know in advance how many batches are included in a single epoch.

There is already precedent for having parallel flags for epoch-based and update-based validation (e.g., `--validate-interval` vs. `--validate-interval-updates`), so it seems like this wouldn't be an unusual addition.

Pitch
Add a `--validate-after-epochs` flag to `configs.py`

https://github.com/facebookresearch/fairseq/blob/bedb259bf34a9fc22073c13a1cee23192fa70ef3/fairseq/dataclass/configs.py#L521-L529

and to `fairseq_cli/train.py`

https://github.com/facebookresearch/fairseq/blob/bedb259bf34a9fc22073c13a1cee23192fa70ef3/fairseq_cli/train.py#L418-L430
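A minimal sketch of what this could look like, mirroring fairseq's dataclass-based config style. The field name, help text, and the `should_validate` helper below are assumptions for illustration only, not the final API:

```python
from dataclasses import dataclass, field


@dataclass
class ValidationConfig:
    """Hypothetical slice of the config dataclass (the real change
    would go in fairseq/dataclass/configs.py)."""

    # Existing flag: skip validation until this many updates have run.
    validate_after_updates: int = field(
        default=0,
        metadata={"help": "dont validate until reaching this many updates"},
    )
    # Proposed companion flag: skip validation until this many epochs
    # have passed (name is an assumption pending review).
    validate_after_epochs: int = field(
        default=0,
        metadata={"help": "dont validate until reaching this many epochs"},
    )


def should_validate(cfg: ValidationConfig, num_updates: int, epoch: int) -> bool:
    """Gate validation on both the update-based and the proposed
    epoch-based threshold; defaults of 0 leave behavior unchanged."""
    return (
        num_updates >= cfg.validate_after_updates
        and epoch >= cfg.validate_after_epochs
    )
```

With both defaults at 0 the gate is always open, so existing configs would behave exactly as before; only users who set `--validate-after-epochs` would see validation delayed.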
Alternatives

The workaround is to estimate how many batches are in an epoch, or to start a task, let it run for one update so that you can see batches-per-epoch, and then restart it with the correct value set.
Additional context

I already have a PR prepared for this (it's like a 5-line change), but my understanding is that things like this need to be approved via issues first.