Open HCookie opened 2 weeks ago
What does static mean? Constant at, say, 2? This is already supported. What does dynamic selection mean?
@mchantry Updated the description
I like the idea of dynamic selection of increments. Could this also be done by steps as well as by epochs? For example: at step 1000, use rollout 2; at step 10000, use rollout 10. I think this would also avoid the problem of wanting to change rollout within an epoch, since the schedule would be defined by steps instead.
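A minimal sketch of what the step-based selection described above could look like. The function name `rollout_at_step`, the milestone dictionary, and the default rollout of 1 before the first milestone are all assumptions made for illustration; none of this is the existing anemoi-training API.

```python
from bisect import bisect_right


def rollout_at_step(step, milestones, default=1):
    """Return the rollout length active at a given global training step.

    `milestones` maps a step number to the rollout that applies from that
    step onwards (hypothetical format). Before the first milestone the
    assumed `default` is used.
    """
    steps = sorted(milestones)
    # Index of the first milestone strictly greater than `step`.
    idx = bisect_right(steps, step)
    if idx == 0:
        return default
    return milestones[steps[idx - 1]]


# The example from the comment: rollout 2 from step 1000, rollout 10 from step 10000.
schedule = {1000: 2, 10000: 10}
```

Because the lookup depends only on the global step, the rollout can change in the middle of an epoch without any special handling.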
I agree with @mc4117. Some models perform better when trained for longer on 2-step rollout and for only a few iterations on longer rollouts. I wonder, however, whether that could be solved by limiting the number of batches per epoch and providing a list of rollout lengths, e.g. [2,2,2,2,2,2,2,2,3,4,5,6,...].
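As a rough sketch of the list-based alternative, assuming one list entry per epoch: the helper name `rollout_for_epoch` is hypothetical, and holding the last value once the list is exhausted is an assumption, since the comment leaves the tail of the list open.

```python
def rollout_for_epoch(epoch, rollouts):
    """Return the rollout length for a zero-based epoch index.

    `rollouts` holds one rollout length per epoch. Past the end of the
    list, the last entry is reused (an assumption, not a stated behaviour).
    """
    if not rollouts:
        raise ValueError("rollouts must contain at least one entry")
    return rollouts[min(epoch, len(rollouts) - 1)]


# Eight short epochs at rollout 2, then one epoch each at 3, 4, 5 and 6.
rollouts = [2] * 8 + [3, 4, 5, 6]
```

With a cap on the number of batches per epoch, each entry covers a fixed number of steps, so the list behaves like a coarse step schedule.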
I like @mc4117's suggestion regarding supporting rollout by steps. I think this would probably make things easier if, in the future, we want to automate the training so that the 6-hour and the rollout steps are executed one after the other.
Moving to a discussion (to try it out) https://github.com/ecmwf/anemoi-training/discussions/148
Our current rollout implementation is very focused on sequential epoch increments. It would be good to generalise this by providing schedulers to control rollout.
Work was done in aifs-mono to enable this. Here I think this can be generalised to provide more general applicability.
Features
Below is a list of features and requirements as I see them
Improvements
Setup config at begin of training with rollout increment be
Questions
What other features may be needed?