AtMoDat / atmodat_data_checker

This is a python library that contains checks to ensure compliance with the AtMoDat Standard.
https://www.atmodat.de/
Apache License 2.0

Check mandatory ATMODAT requirements w.r.t. the description of Model’s Axes #114

Open atmodatcode opened 2 years ago

atmodatcode commented 2 years ago

The ATMODAT Standard v3.0, Section 4.3 "Specifications for File Formats and Standards", states: "The ATMODAT standard requires that: [...] • NetCDF file headers include description of time, coordinate and vertical axes according to Appendix E. [...]"

How do we address this requirement?

I suggest adding a note to the checker results stating that we do not check this requirement and that users should verify it themselves. What do you think?

jkretz commented 2 years ago

Well, as far as I know, this should be covered by the CF-Checker.

atmodatcode commented 2 years ago

Not necessarily. See Appendix E of the ATMODAT Standard:

"E. Description of Model’s Axes
Providing horizontal, vertical and temporal axes is optional in the CF Conventions. The conventions just prescribe how these axes have to be described when they are provided. However, having spatial and temporal information is often required for a proper reuse of atmospheric model data. Therefore, this standard requires this information under specific conditions:
• If data are horizontally resolved (e.g. lon + lat or x + y), then horizontal coordinate axes should be provided.
• If data have reasonable vertical information (e.g. pressure or height), then the vertical axis should be described.
• If data are not static in time (e.g. via dimension and variable time), then the time axis should be provided."
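A minimal sketch of how a checker could test the three Appendix E conditions, in Python (the language of the checker). It works on a simplified, hypothetical header model: the `Variable` class, the `appendix_e_hints` function and the dimension-name sets are illustrative assumptions, not part of atmodat_data_checker or the netCDF4 API, and axes are recognized only via the CF `axis` attribute. Note that it can only react to dimensions that are actually present: a data variable stored as var1(lat,lon) never triggers the time condition.

```python
from dataclasses import dataclass, field


# Hypothetical, simplified header model (not the netCDF4 or checker API).
@dataclass
class Variable:
    dims: tuple
    attrs: dict = field(default_factory=dict)


# Illustrative guesses at dimension names; real files may use others.
HORIZONTAL_DIMS = {"lon", "lat", "x", "y"}
VERTICAL_DIMS = {"plev", "lev", "height", "depth"}


def has_axis(variables, axis):
    """True if a coordinate variable (1-D, named after its dimension) has axis=<axis>."""
    return any(
        var.dims == (name,) and var.attrs.get("axis") == axis
        for name, var in variables.items()
    )


def appendix_e_hints(variables):
    """Return hints for Appendix E conditions that look unmet.

    Only dimensions that actually occur are inspected, so a file whose
    data variable is var1(lat, lon) is treated as static in time.
    """
    used_dims = {dim for var in variables.values() for dim in var.dims}
    hints = []
    if used_dims & HORIZONTAL_DIMS and not (has_axis(variables, "X") and has_axis(variables, "Y")):
        hints.append("horizontal coordinate axes should be provided")
    if used_dims & VERTICAL_DIMS and not has_axis(variables, "Z"):
        hints.append("vertical axis should be described")
    if "time" in used_dims and not has_axis(variables, "T"):
        hints.append("time axis should be provided")
    return hints
```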

For example, if timeseries netCDF files store each timestep in a separate file, users might omit the time dimension from the data variables, i.e. write var1(lat,lon) instead of var1(time,lat,lon), which they should use according to the ATMODAT Standard. The CF-Checker won't complain about var1(lat,lon). I understand that this is hard to capture with a checker, but we could add a short info message at the bottom of the short summary to make users aware:

e.g.

Short summary
bla..bla
Please note:
• If data are horizontally resolved (e.g. lon + lat or x + y), then horizontal coordinate axes should be provided.
• If data have reasonable vertical information (e.g. pressure or height), then the vertical axis should be described.
• If data are not static in time (e.g. via dimension and variable time), then the time axis should be provided.
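The proposed note could be appended along these lines. This is only a sketch: `add_appendix_e_note` and the plain-text summary format are hypothetical, not the checker's actual reporting code.

```python
# Hypothetical reminder text, copied from the Appendix E conditions.
APPENDIX_E_NOTE = (
    "Please note:\n"
    "• If data are horizontally resolved (e.g. lon + lat or x + y), "
    "then horizontal coordinate axes should be provided.\n"
    "• If data have reasonable vertical information (e.g. pressure or height), "
    "then the vertical axis should be described.\n"
    "• If data are not static in time (e.g. via dimension and variable time), "
    "then the time axis should be provided."
)


def add_appendix_e_note(short_summary: str) -> str:
    """Append the Appendix E reminder after an existing short summary text."""
    return short_summary.rstrip("\n") + "\n\n" + APPENDIX_E_NOTE + "\n"
```

As written, the note is appended to every summary regardless of the data, which is exactly the drawback raised in the next reply.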

jkretz commented 2 years ago

Well, var1(lat,lon) is static in time, so there is nothing to check. And if a timeseries is stored in separate files, I wouldn't call that timeseries which makes it static again. This is definitely an edge case that we cannot cover. Furthermore, the wording is "should", so I would avoid emitting a warning that we would have to display every single time, since we don't know what users' data will look like. Let's focus on things we can improve now and leave this for later.

atmodatcode commented 2 years ago

I don't agree with your statement "And if a timeseries is stored in separate files, I wouldn't call that timeseries which makes it static again.". If a variable is time-dependent by nature (e.g. precipitation model output at a given time step), then it is not static, even if the output file contains only a single timestep. It is essential that users add time as a dimension: the time information is then properly described by the time coordinate variable and can be read in a machine-actionable manner. This also allows users to merge the individual timesteps, e.g. with cdo mergetime.
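A toy illustration of why the time coordinate matters for merging. Each "file" is a plain dict with hypothetical keys (`name`, `time`, `units`, `data`), and `merge_by_time` is an illustrative helper; this is not how cdo mergetime is implemented. It only shows that single-timestep records can be put in order only if each one carries a time value with consistent units.

```python
def merge_by_time(files):
    """Order single-timestep records along the time axis.

    Toy model: each file is a dict with hypothetical keys 'name', 'time',
    'units' and 'data'. Not how cdo mergetime works internally.
    """
    if not files:
        return [], []
    missing = [f["name"] for f in files if "time" not in f]
    if missing:
        # Without a time coordinate there is nothing to sort on.
        raise ValueError(f"no time coordinate in: {missing}")
    units = {f.get("units") for f in files}
    if len(units) != 1:
        # Numeric time values are only comparable relative to the same reference.
        raise ValueError(f"inconsistent time units: {sorted(map(str, units))}")
    ordered = sorted(files, key=lambda f: f["time"])
    return [f["time"] for f in ordered], [f["data"] for f in ordered]
```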

Look at the header of a time series dataset where individual timesteps are saved in individual netCDF files. This is actually quite common, especially for multidimensional data, where storing more than one timestep in a single file makes file sizes hard to handle.

dimensions:
    time = UNLIMITED ; // (1 currently)
    lon = 384 ;
    lat = 192 ;
variables:
    double time(time) ;
        time:standard_name = "time" ;
        time:long_name = "time" ;
        time:units = "days since 1850-01-01 00:00:00" ;
        time:calendar = "proleptic_gregorian" ;
        time:axis = "T" ;
    float rldscs(time, lat, lon) ;
    ....
data:
    time = 0.0625 ;
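To illustrate what "machine-actionable" means for the header above: from `units` and `calendar` alone, the stored value can be decoded to a date. The sketch assumes the proleptic_gregorian calendar (which Python's `datetime` follows) and supports only a few units; `decode_cf_time` is a hypothetical helper, and real code would rather use a library such as cftime. For this header, `decode_cf_time(0.0625, "days since 1850-01-01 00:00:00")` returns `datetime(1850, 1, 1, 1, 30)`, i.e. 1.5 hours after the reference time.

```python
import re
from datetime import datetime, timedelta

# Seconds per supported CF time unit (deliberately incomplete).
_SECONDS_PER_UNIT = {"days": 86400.0, "hours": 3600.0, "minutes": 60.0, "seconds": 1.0}


def decode_cf_time(value, units, calendar="proleptic_gregorian"):
    """Decode a numeric CF time value, e.g. 0.0625 'days since 1850-01-01 00:00:00'.

    Hypothetical helper: only the proleptic_gregorian calendar is handled,
    because Python's datetime uses exactly that calendar.
    """
    if calendar != "proleptic_gregorian":
        raise NotImplementedError(f"calendar {calendar!r} not supported in this sketch")
    match = re.fullmatch(r"\s*(\w+)\s+since\s+(.+?)\s*", units)
    if match is None or match.group(1) not in _SECONDS_PER_UNIT:
        raise ValueError(f"unsupported time units: {units!r}")
    unit, reference = match.groups()
    origin = datetime.fromisoformat(reference)
    return origin + timedelta(seconds=value * _SECONDS_PER_UNIT[unit])
```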

Static files are, e.g., land-sea masks: temporal changes over the model period are considered negligible, so the time dimension is considered irrelevant.

I think we need to discuss this issue with the AtMoDat team.

.....

jkretz commented 2 years ago

Sounds good, let's discuss that later.