Problem description:
Observational datasets at high frequencies (daily or hourly) are computationally expensive to work with. Not all diagnostics make use of this high time resolution, and the preprocessing they require is time (and energy) consuming and often leads to memory issues (e.g. #51). Given that the resampled datasets should be at least an order of magnitude smaller than their high-frequency origins (a daily mean replaces 24 hourly values, a monthly mean replaces about 30 daily values), I think it would be worth including them in the dataset pool. I think we should by default also provide hourly data as daily and monthly means, and daily data as monthly means. Some considerations:
a) Should the resampling be included in the CMORization scripts?
b) How do we distinguish between the different frequencies? The frequency should be reflected in the file naming convention, and it should be possible to specify in the recipe which frequency to pick.
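
To make the two considerations concrete, here is a minimal sketch of what the resampling step and a frequency-aware file name could look like. It uses only the Python standard library and synthetic data; a real CMORization script would presumably do this with a library such as iris or xarray. The helper names (`resample_mean`, `output_name`), the dataset name `EXAMPLE`, and the file name pattern are all assumptions made up for illustration, not an existing convention.

```python
from collections import defaultdict
from datetime import datetime, timedelta
from statistics import fmean


def resample_mean(series, key):
    """Average (timestamp, value) pairs that share the same key(timestamp)."""
    buckets = defaultdict(list)
    for timestamp, value in series:
        buckets[key(timestamp)].append(value)
    return {k: fmean(v) for k, v in sorted(buckets.items())}


def daily_key(timestamp):
    """Group by calendar day."""
    return timestamp.date()


def monthly_key(timestamp):
    """Group by (year, month)."""
    return (timestamp.year, timestamp.month)


def output_name(dataset, short_name, frequency, start, end):
    """Build a file name that carries the frequency.

    Hypothetical pattern, only meant to show the frequency as a
    separate token in the name.
    """
    return f"{dataset}_{short_name}_{frequency}_{start:%Y%m%d}-{end:%Y%m%d}.nc"


# Synthetic hourly series: 48 hours from 2000-01-01, value = hour index.
start = datetime(2000, 1, 1)
hourly = [(start + timedelta(hours=i), float(i)) for i in range(48)]

# Hourly data is provided as daily means and as monthly means.
daily = resample_mean(hourly, daily_key)
monthly = resample_mean(hourly, monthly_key)

# One file name per provided frequency.
names = {
    freq: output_name("EXAMPLE", "tas", freq, hourly[0][0], hourly[-1][0])
    for freq in ("1hr", "day", "mon")
}

print(len(hourly), len(daily), len(monthly))
print(names["day"])
```

With the frequency as its own token in the file name, a recipe could select the resampled variant with a single key, without having to open the files.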