aodn / compliance-checker

Python tool to check your datasets vs compliance standards. Forked to include AODN specific modifications.
Apache License 2.0
1 stars 0 forks source link

IMOS: check_time_variable - TIME unit tested the simple way #35

Closed lbesnard closed 9 years ago

lbesnard commented 9 years ago

file tested: http://thredds.aodn.org.au/thredds/catalog/IMOS/SOOP/SOOP-ASF/VNAA_Aurora-Australis/flux_product/2013/catalog.html?dataset=IMOS/SOOP/SOOP-ASF/VNAA_Aurora-Australis/flux_product/2013/IMOS_SOOP-ASF_FMT_20130315T000200Z_VNAA_FV02.nc

IMOS test code : https://github.com/aodn/compliance-checker/blob/IMOS-Checker/compliance_checker/imos/imos.py#L706-L712

result = check_value(('TIME','units',),
                                    'days since 1950-01-01 00:00:00 UTC',
                                    IMOSCheck.OPERATOR_EQUAL,
                                    dataset,
                                    IMOSCheck.CHECK_VARIABLE_ATTRIBUTE,
                                    result_name,
                                    BaseCheck.HIGH)

The IMOS checker looks for a time string specifically equal to 'days since 1950-01-01 00:00:00 UTC'

1) is this really an IMOS test ? I would have imagined this was CF. I think this test should be removed as its a duplicate unless there is another reason why see https://github.com/aodn/compliance-checker/blob/IMOS-Checker/compliance_checker/cf/cf.py#L1162 check_time_coordinate function 2) if its not a duplicate, then the test is anyway wrong, because it should be very close if not similar to the CF convention, otherwise it would lead to some misunderstanding

The time coordinate is defined here : http://cfconventions.org/Data/cf-conventions/cf-conventions-1.6/build/cf-conventions.html#time-coordinate

The acceptable units for time are listed in the udunits.dat file. The most commonly used of these strings (and their abbreviations) includes day (d) , hour (hr, h) , minute (min) and second (sec, s) . Plural forms are also acceptable. The reference time string (appearing after the identifier since ) may include date alone; date and time; or date, time, and time zone. The reference time is required. A reference time in year 0 has a special meaning (see Section 7.4, “Climatological Statistics” ).

Note: if the time zone is omitted the default is UTC, and if both time and time zone are omitted the default is 00:00:00 UTC.

We recommend that the unit year be used with caution. The Udunits package defines a year to be exactly 365.242198781 days (the interval between 2 successive passages of the sun through vernal equinox). It is not a calendar year. Udunits includes the following definitions for years: a common_year is 365 days, a leap_year is 366 days, a Julian_year is 365.25 days, and a Gregorian_year is 365.2425 days.

For similar reasons the unit month , which is defined in udunits.dat to be exactly year/12 , should also be used with caution.

so it's a bit more complex that what's is in place, for example

something equivalent to this https://github.com/aodn/data-services/blob/master/lib/MATLAB/netcdf/getTimeDataNC.m#L46 could be used (however this example doesn't include all valid possibilities)

mhidas commented 9 years ago

@lbesnard The CF checker checks that the unit is a valid time specification in the form "time_unit since reference_time", so you're right, we don't need to replicate that.

However, according to version 1.3 of the IMOS netCDF manual:

eMII suggests that all the IMOS data should use the ARGO reference time of 1st of January of 1950. The value will be stored as the number of days since this reference time.

So we're specifying what the time_unit and reference_time should be, though this is not mandatory.

Two things need to be fixed in the code: 1) Separately check that unit and ref time are equivalent to what we want, rather than matching the exact string. 2) Since this is not mandatory, the priority level of this check should be MEDIUM (or even LOW?), not HIGH.

fxmzb123 commented 9 years ago

@mhidas Can you provide me all valid time_units? It seems currently , only 'days' is valid time_unit

fxmzb123 commented 9 years ago

@mhidas can you advice me the correct date time format for this attribute, Netcdf files have both format 'days since 1950-01-01 00:00:00 UTC' and "days since 1950-01-01 00:00:00Z'. Should I test both case?

lbesnard commented 9 years ago

@ggalibert Could you please advice Ming with what is the IMOS format. I'm still surprised to have this check in the IMOS test

ggalibert commented 9 years ago

Like Marty said, from eMII perspective the time_unit needs to be 'days' and the reference_time needs to be '1950-01-01 00:00:00 UTC'.

ggalibert commented 9 years ago

The IMOS checker looks for a time string specifically equal to 'days since 1950-01-01 00:00:00 UTC'

1- Yes, this is an IMOS NetCDF Conventions recommendation. 2- CF allows other time_units and reference_time so this IMOS test is more specific and relevant.

Separately check that unit and ref time are equivalent to what we want, rather than matching the exact string.

@mhidas what do you suggest as equivalent entries?

mhidas commented 9 years ago

@ggalibert I was thinking of allowing variants like this

Technically these are all the same unit according to UDUNITS (same time unit and reference time, UTC is assumed if timezone not specified). This equivalence check could be done fairly easily using the cf_units module, but would also allow something as simple as "d since 1950-01-01", which I don't think is acceptable.

So it would probably be simpler to stick to our guns and require the exact string. This just means we'll have more files to fix, and we'll have to do this carefully, because there are many files which actually use numerically different units for time. Here are some examples I found in IMOS files:

    TIME:units = "days since 1970-01-01T00:00:00 UTC" ;
    TIME:units = "days since 1985-01-01 00:00:00+00:00" ;
    TIME:units = "days since 2000-01-01 00:00:00Z" ;
    TIME:units = "Seconds since 1970-01-01 00:00:00.0Z" ;
    TIME:units = "seconds since 1970-01-01 00:00:00 UTC" ;
ggalibert commented 9 years ago

Made a PR to at least downgrade the priority of check on units attribute content to MEDIUM : https://github.com/aodn/compliance-checker/pull/68

Found the idea of using cf_units module to check for equivalent great though.

mhidas commented 9 years ago

Ok, with #68 merged, let's consider this fixed for now. If we have too much trouble with files using different time units, we can revisit later.

lbesnard commented 9 years ago

If someone could give me a face to face update on this matter next week. I'm still really puzzled and I don't get why this test should be in the IMOS checker. IMO this should disappear from the IMOS convention too as it's already well defined in CF, and more restricting than helpful in the IMOS documentation.

The CF gives the freedom to choose any date, as long as a few things are respected. That's good enough to me

For example, I have some data from cmar which I get from their geoserver. Some of the data is dated 1958. As we're now creating NetCDF files on the fly with the portal, it won't make anysense to have a B.I. (Before IMOS) date in the NetCDF... and that's with imagining it would even work !

ggalibert commented 9 years ago

I've had a talk with @lbesnard and considering the scope of this issue I think we agree we can close it. If there is anything we would like to change, then we would first need to update the IMOS convention and create a new issue.