Open JulianBiesheuvel opened 1 month ago
You mean this figure?
Maybe a solution could be to plot the distribution of time ranges in training and validation in each fold (e.g. as a box plot), together with a count of which months are covered by the data? A measurement with from/to dates 10 Oct to 20 May would roughly cover Oct, Nov, Dec, Jan, Feb, Mar, Apr and May, and would count towards each of these months. A measurement of annual mb would count towards all months.
The first plot would illustrate how aggregated the data is while the second would illustrate how well winter vs. summer seasons are represented.
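To make the idea concrete, here is a rough sketch of how the two quantities could be computed for one fold. It assumes each measurement is available as a `(from_date, to_date)` pair of `datetime.date` objects; the function names (`months_covered`, `month_counts`, `durations_days`) are made up for illustration and are not part of the existing code.

```python
from collections import Counter
from datetime import date


def months_covered(from_date, to_date):
    """Return the set of calendar months (1-12) touched by a measurement."""
    if to_date < from_date:
        raise ValueError("to_date must not be earlier than from_date")
    months = set()
    year, month = from_date.year, from_date.month
    # Step month by month from the start month up to and including the end month.
    while (year, month) <= (to_date.year, to_date.month):
        months.add(month)
        month += 1
        if month > 12:
            year, month = year + 1, 1
    return months


def month_counts(ranges):
    """Count, per calendar month, how many measurements cover that month."""
    counts = Counter()
    for start, end in ranges:
        counts.update(months_covered(start, end))
    return counts


def durations_days(ranges):
    """Length of each measurement period in days (input for a box plot)."""
    return [(end - start).days for start, end in ranges]


# Hypothetical fold with one winter measurement and one annual measurement.
fold = [
    (date(2019, 10, 10), date(2020, 5, 20)),  # Oct to May
    (date(2019, 10, 1), date(2020, 9, 30)),   # full hydrological year
]
counts = month_counts(fold)
lengths = durations_days(fold)
```

The list from `durations_days` could be passed per fold to a box plot, and `month_counts` to a bar chart with one bar per month. Counting every touched month equally is the "roughly cover" simplification; weighting partially covered months by the number of days would be a possible refinement.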
Previously, when we used fixed date ranges for the summer and winter periods, it was easy to visualise with a bar chart how many seasonal and annual samples were available in each fold. Now, with variable date ranges, this is no longer possible. Another method should be developed to show this to the user.