VeruGHub / easyclimate

Easy access to high-resolution daily climate data for Europe
https://verughub.github.io/easyclimate/
GNU General Public License v3.0

data update 2023 #55

Open VeruGHub opened 2 months ago

VeruGHub commented 2 months ago

Two options (@cpucher):

VeruGHub commented 2 months ago

The change from v3 to v4 was because the E-Obs data was updated with a different spatial resolution, right? I don't have a strong opinion on what is better now, but I find it important to keep all versions stored and accessible to the users, so maybe we need to be conservative and not create new versions very often. One option could be to state in the documentation that slight changes in the data can happen with the yearly updates due to E-Obs updates. This made me think that maybe we would need to define a reference period to calculate monthly and yearly values so values do not change every year. What do you think?

cpucher commented 2 months ago

As we changed the resolution to 500 m, it was clear that we would need to re-calculate the whole time series. However, in previous iterations (v2 and v3) we also always updated the whole time series, although the resolution didn't change.

These are the changes (apart from continuing the previous time series) in the two E-Obs versions released since our last calculation:

v28.0e:
- New series are included for Campania and Trentino in Italy, and the elevation is corrected for German precipitation stations.

v29.0e:
- Included new stations and updates for Ukraine, Portugal and Belgium.
- Included data from Global Summary of the Day for southeast Europe.
- Updated Polish precipitation series that were wrongly included.
- Included radiation series for Trentino in Italy.

They may warrant a re-calculation of the whole time series. There is also a reason why E-Obs always releases a new version instead of just "updating" the old one, I guess. We could also decide on some update policy, e.g. a new version only every 3 years, and in between just update the current version.

"I find it important to keep all versions stored and accessible to the users"

I don't agree that we need to keep all versions stored and accessible to the users; having 2-3 versions available should be enough. When it comes to reproducibility, users have to store the data they used for their analysis, and it shouldn't depend on us still providing, for instance, v1 of the data.

"This made me think that maybe we would need to define a reference period to calculate monthly and yearly values so values do not change every year. What do you think?"

This comment is not clear to me :-)

Pakillo commented 2 months ago

Hi,

I agree that, ideally, we should store all data versions for the sake of reproducibility. Last time we talked about this we discarded the idea for lack of resources ($$). But it would be nice to secure some online hosting to save all data versions.

Alternatively, we could publish the source code that takes the E-Obs dataset and produces the rasters that are then hosted in the FTP server and served through easyclimate. Archiving the source code is trivial and free (e.g. in Zenodo), and would permit anyone to reproduce the rasters in case they needed to. We would just need to specify which version of the E-Obs dataset was used in each of our data versions.

That would free us from having to store all former data versions, and let us serve only the most recent, updated rasters (perhaps storing the penultimate version too, just in case). It looks like users will often ask for the latest year to be added soon, and IMO it looks better to serve the most correct, updated version whenever possible, rather than waiting 2-3 years between releases.

So, I think we could publish the source code and update the dataset yearly, but store only the latest and penultimate versions on the server. Does that sound like a good option to you?
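To make the "latest and penultimate" rule concrete, here is a minimal sketch in Python. The function name and the assumption that versions are labelled `v1`, `v2`, ... are only for illustration; nothing like this exists in easyclimate or on the server as far as this thread says.

```python
def versions_to_keep(available, keep=2):
    """Return the `keep` most recent version labels, newest first.

    Assumes (hypothetically) that labels look like "v1", "v2", ... so they
    can be ordered by the integer after the leading "v".
    """
    ordered = sorted(available, key=lambda label: int(label.lstrip("v")), reverse=True)
    return ordered[:keep]


def versions_to_retire(available, keep=2):
    """Return the labels that would no longer be served, in their original order."""
    kept = set(versions_to_keep(available, keep))
    return [label for label in available if label not in kept]


served = versions_to_keep(["v1", "v2", "v3", "v4"])
retired = versions_to_retire(["v1", "v2", "v3", "v4"])
```

With four versions available, `served` holds `v4` and `v3`, and `retired` holds `v1` and `v2`, which would then be reproducible only from the archived source code plus the matching E-Obs release.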

VeruGHub commented 11 hours ago

I would like to relaunch this discussion! I think we are getting lost in the details. I propose:

What do you think?

And in relation to this: "This made me think that maybe we would need to define a reference period to calculate monthly and yearly values so values do not change every year." What I wanted to raise here is whether the average monthly/annual values need to be updated every year with the E-Obs release, or whether we should define a reference period (e.g. 1980-2010) to calculate the averages and keep them more fixed.

Pakillo commented 9 hours ago

Sounds good to me!

When you say "only create a new version if there are substantial changes", if we "update the whole time series every year", that means we will have one new version every year, right?

So, according to this plan, the server would store the current and last year's versions.

"we define a reference period (e.g. 1980-2010) to calculate the averages"

I understand you want to include climatological averages besides the monthly and yearly rasters. I'm fine with that, but if we update the whole series every year, the averages should be updated too, otherwise the data would be inconsistent. I'm also fine with setting a reference period (maybe 1990-2020 would be more useful). This average would still have to be recalculated every year with the yearly E-Obs update.
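A small sketch of what I mean, in Python, for a single grid cell. The values are made up, and the function name and the 1990-2020 default are just assumptions for illustration, not anything easyclimate currently does.

```python
from statistics import fmean


def reference_mean(yearly_values, start=1990, end=2020):
    """Mean of yearly values over a fixed reference period (both ends inclusive)."""
    in_period = [value for year, value in yearly_values.items() if start <= year <= end]
    if not in_period:
        raise ValueError("no data inside the reference period")
    return fmean(in_period)


# Made-up yearly mean temperatures (degrees C) for one grid cell, 1980-2022.
series = {year: 10.0 + 0.02 * (year - 1990) for year in range(1980, 2023)}
baseline = reference_mean(series)

# Appending a new year outside the reference period leaves the average untouched.
series[2023] = 11.5
unchanged = reference_mean(series) == baseline

# A re-calculated series (e.g. after station corrections upstream) does change it.
recalculated = {year: value + 0.1 for year, value in series.items()}
changed = reference_mean(recalculated) != baseline
```

So a fixed reference period protects the averages from simply adding a year, but not from a re-calculation of the whole series, which is why they would need to be recomputed with each yearly update.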