ncISO only produces "boundingTemporalExtent" when time coordinate variable name is "time"

Unidata / threddsIso

A THREDDS Data Server extension which generates NCML, a metadata rubric, and ISO 19115.

7 stars 17 forks source link

ncISO only produces "boundingTemporalExtent" when time coordinate variable name is "time" #9

Closed rsignell-usgs closed 9 years ago

rsignell-usgs commented 9 years ago

I spent quite a few hours trying to figure out why this dataset doesn't have a "boundingTemporalExtent" in the ISO metadata:

http://comt.sura.org/thredds/comt_1_archive_summary.html?dataset=estuarine_hypoxia.WHOI_ChesROMS-1term.1991-2005_1termDO

The time values load just fine in Matlab, Python and in the TOOLS-UI gui. So why doesn't the time extent show up in the ISO metadata here? http://comt.sura.org/thredds/iso/data/comt_1_archive/estuarine_hypoxia/WHOI_ChesROMS-1term/1991-2005_1termDO?catalog=http%3A%2F%2Fcomt.sura.org%2Fthredds%2Fcomt_1_archive_summary.html&dataset=estuarine_hypoxia.WHOI_ChesROMS-1term.1991-2005_1termDO

From the discussion on the ncisometadata google group with @geoneubie it turns out that ncISO is checking to see the coordinate variable time is named time, and since in this dataset the coordinate variable happens to be named "ocean_time", no "boundingTemporalExtent" is calculated. The offending line of code is:

https://github.com/Unidata/threddsIso/blob/498f859b8650533053fb864f42a81442bac136c1/src/main/java/thredds/server/metadata/util/ThreddsExtentUtil.java#L170

The check is being made to distinguish the "time" coordinate from the "forecast_reference_time" or other time coordinates. But this should be determined by standard_name=time, not from the name of the netcdf variable, right?

@geoneubie comments? @kwilcox, this problem currently affects all ROMS aggregations. :crying_cat_face:

rsignell-usgs commented 9 years ago

After a discussion with Anna Milan (@amilan17), I'd like to propose an improved solution.

The existing logic creates time extent information in the metadata only if the time coordinate variable is named "time". In CF, variable names are arbitrary -- the conventions are specified via attributes. So really we should be testing to see if the time coordinate variable has a "standard_name" attribute with value "time".

But since many folks likely do not specify a "standard_name" attribute for their time variable, we could use this logic:

check time coordinate variable for the attribute "standard_name" = "time". If so, use this variable to compute the time extents in the metadata.
If there is no "standard_name" = "time" attribute, see if the netcdf variable name is "time", and if so, use it.

This would allow folks to call their time coordinate variable whatever they want (e.g. "ocean_time) as long as they provide the standard_name = "time" attribute. And it wouldn't break existing workflows.

While both the existing and proposed logic would allow a someone to save forecast reference times in a variable called "time" and get the wrong time extents, it seems better to allow this unlikely edge case rather than impose a "tough love" solution of requiring everyone to have standard_name="time".

@lesserwhirls , @dopplershift , can you guys implement, or do we need to bring in @geoneubie ?

rsignell-usgs commented 9 years ago

@shane-axiom has a fix, hopefully he will get a chance to submit the PR! https://github.com/shane-axiom/threddsIso/commit/b56a653f2515f43ce3fe50578eb6f3cf68939978

Here's proof that it's working on our development server: http://geoport-dev.whoi.edu/thredds/iso/usgs/data0/bbleh/spring2012/00_dir_roms.ncml?catalog=http%3A%2F%2Fgeoport-dev.whoi.edu%2Fthredds%2Fcatalog%2Fusgs%2Fdata0%2Fbbleh%2Fspring2012%2Fcatalog.html&dataset=usgs%2Fdata0%2Fbbleh%2Fspring2012%2F00_dir_roms.ncml

geoneubie commented 9 years ago

Rich, Shane, and others,

Thanks for jumping into the code to address some of these issues! The main historical issue that should be considered in any potential fix is related to multiple temporal extents. I believe the existing implementation was a way of dealing with a netcdf dataset with the following scenario:

1 variable called 'time' with standard_name 'time' - representing observed time multiple variables with standard_name 'time' - representing multiple future forecast times

To deal with this scenario we went with a strict rule set to pick the 80% case where we would capture observed time in the ISO metadata record. I'd like to know in any proposed solution what the tradeoffs will be for the above use case.

@shane-axiom

-Dave

rsignell-usgs commented 9 years ago

@geoneubie, if you look at a Forecast Model Run Collection best time series, for example: http://geoport.whoi.edu/thredds/dodsC/coawst_4/use/fmrc/coawst_4_use_best.ncd.html you can see there are two time coordinate variables: time and time_run.

The coordinate variable time has standard_name='time'. This is the one we want want to compute the extents from because it has standard_name='time'.

The coordinate variable time_run has standard_name='forecast_reference_time', with long_name='run times for coordinate = 'time'. We don't want this.

The @shane-axiom code here: https://github.com/shane-axiom/threddsIso/blob/master/src/main/java/thredds/server/metadata/util/ThreddsExtentUtil.java#L154-L168

tests first to see if there is a var with standard_name='time' and if that fails, it takes the variable with netcdf variable name time (if that exists).

By testing for standard_name='time', you can call your time coordinate whatever you want, per CF specifications. And this then allows ROMS aggregations to have time extents calculated, which are about 80% of our ocean model simulations. :grin:

Make sense?