aodn / imos-toolbox

Graphical tool for QC'ing and NetCDF'ing oceanographic datasets
GNU General Public License v3.0

filename date start and end should match time_deployment_start/end attributes #541

Closed ggalibert closed 4 years ago

ggalibert commented 5 years ago

The revised dt_start and dt_end columns are now published at http://oceancurrent.imos.org.au/timeseries/ with a note headed 26 April, saying "There are now more zeros in those columns, indicating correspondence of the dates in the filenames with either the range of the time vector, the period of good data, or both." I think we need to clarify the purpose of those dates in the filenames (and make sure all are the same format - that's what triggered the error). -- @DavidGriffin1

ggalibert commented 5 years ago

Since version 2.4 of the toolbox, the first two dates in the filename correspond to the time_coverage_start/end attributes, that is to say the first and last dates for which the dataset holds any data. The last date in the filename is the creation date.

For example in http://thredds.aodn.org.au/thredds/dodsC/IMOS/ANMN/NSW/CH100/Temperature/IMOS_ANMN-NSW_TZ_20181029T130000Z_CH100_FV01_CH100-1811-Aqualogger-520T-91.5_END-20190211T033500Z_C-20190212T050608Z.nc.html

[...]
time_coverage_end: 2019-02-11T03:35:00Z
time_coverage_start: 2018-10-29T13:00:00Z
date_created: 2019-02-12T05:06:08Z
[...]
TIME = "2018-10-29 13:00:0.000007", [...], "2019-02-11 03:35:0.000006" ;

See File naming conventions and IMOS NetCDF User Manual.
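
As a rough illustration of the correspondence described above, the following Python sketch extracts the three dates from a filename and compares them with the global attributes. The regular expression is an assumption inferred only from the example filename in this comment, not from the naming convention document, and `filename_dates` and `mismatches` are hypothetical helper names that are not part of the toolbox.

```python
import re
from datetime import datetime, timezone

# Assumed layout, inferred only from the example filename in this thread:
# ..._<start>_..._END-<end>_C-<created>.nc
_STAMP = r"\d{8}T\d{6}Z"
_FILENAME_DATES = re.compile(
    rf"_(?P<start>{_STAMP})_.*_END-(?P<end>{_STAMP})_C-(?P<created>{_STAMP})\.nc$"
)


def _parse_compact(stamp):
    """Parse a compact UTC stamp such as 20181029T130000Z."""
    return datetime.strptime(stamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def _parse_attr(value):
    """Parse a global attribute such as 2018-10-29T13:00:00Z."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def filename_dates(filename):
    """Return the start, end and creation dates found in a filename."""
    match = _FILENAME_DATES.search(filename)
    if match is None:
        raise ValueError(f"no dates found in {filename!r}")
    return {key: _parse_compact(value) for key, value in match.groupdict().items()}


def mismatches(filename, attrs):
    """List which filename dates differ from the matching global attributes."""
    expected = {
        "start": "time_coverage_start",
        "end": "time_coverage_end",
        "created": "date_created",
    }
    dates = filename_dates(filename)
    return [key for key, attr in expected.items() if dates[key] != _parse_attr(attrs[attr])]


example_name = (
    "IMOS_ANMN-NSW_TZ_20181029T130000Z_CH100_FV01_"
    "CH100-1811-Aqualogger-520T-91.5_END-20190211T033500Z_C-20190212T050608Z.nc"
)
example_attrs = {
    "time_coverage_start": "2018-10-29T13:00:00Z",
    "time_coverage_end": "2019-02-11T03:35:00Z",
    "date_created": "2019-02-12T05:06:08Z",
}
print(mismatches(example_name, example_attrs))
```

For the example file the list is empty, since all three filename dates agree with the attributes shown above.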

For toolbox versions 2.1b to 2.3b inclusive (see https://github.com/aodn/imos-toolbox/commit/6a3ea4945ca02575429ec57cf25de07ca4c96a51#diff-fc1f626054899532e812d9832958e3ca), the filename dates were time_deployment_start/end, or time_coverage_start/end if the former were not defined.
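
The difference between the two behaviours can be sketched as follows. This is a Python illustration only, not the toolbox's MATLAB code: `filename_start_end` and its `legacy` switch are hypothetical, the fallback is assumed to apply to each attribute separately, and the deployment dates in the usage example are invented.

```python
def filename_start_end(attrs, legacy=False):
    """Pick the start/end dates that go into the filename.

    legacy=True mimics the 2.1b to 2.3b behaviour described above:
    prefer time_deployment_start/end and fall back to
    time_coverage_start/end when the deployment attribute is missing
    or empty. legacy=False mimics the behaviour from 2.4 onwards.
    """
    if legacy:
        start = attrs.get("time_deployment_start") or attrs["time_coverage_start"]
        end = attrs.get("time_deployment_end") or attrs["time_coverage_end"]
    else:
        start = attrs["time_coverage_start"]
        end = attrs["time_coverage_end"]
    return start, end


# Invented deployment dates, for illustration only.
attrs = {
    "time_coverage_start": "2018-10-29T13:00:00Z",
    "time_coverage_end": "2019-02-11T03:35:00Z",
    "time_deployment_start": "2018-11-01T00:00:00Z",
    "time_deployment_end": "2019-02-10T00:00:00Z",
}
print(filename_start_end(attrs, legacy=True))
print(filename_start_end(attrs, legacy=False))
```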

So the issue is one of re-processing historical datasets that were processed with toolbox versions 2.1b to 2.3b inclusive. From version 2.4, the toolbox is doing the right thing.

I will open a new issue in the anmn-internal-discussions repo for the record.

ggalibert commented 5 years ago

See https://github.com/aodn/anmn-internal-discussions/issues/55

DavidGriffin1 commented 5 years ago

Guillaume, In https://s3-ap-southeast-2.amazonaws.com/content.aodn.org.au/Documents/IMOS/Conventions/IMOS_NetCDF_File_Naming_Convention.pdf it says

": start date and time of the measurement"

So I disagree with “From version 2.4, the toolbox is doing the right thing” because time_coverage does NOT refer to the “times of measurement”. The instruments are always turned on before deployment, and in the intervening time they are recording something, but they are not recording mission data (i.e. corresponding to the names in the variables) that is of much interest to users. The pre- and post-deployment data is only of interest for quality control, so the dates advertising the switch-on and -off times of instruments do not deserve prominence by being in the filenames.

ggalibert commented 5 years ago

> The pre- and post-deployment data is only of interest for quality control, so the dates advertising the switch-on and -off times of instruments do not deserve prominence by being in the filenames.

This is why they are in the FV00 and FV01 filenames. FV00 and FV01 files are more for power users who want to have a closer look at the original data and at the QC. For the general public the moorings long timeseries working group is working to define and produce FV02 products (hourly averaged on common timestamps, vertically interpolated, etc...) that will help foster uptake and impact of IMOS mooring datasets. For these products the dates in the filename will match the mission/good data since only good data from FV01 will be included.

DavidGriffin1 commented 5 years ago

Guillaume, I think it is very important to adhere to the IMOS file naming conventions. Are you saying that those conventions do not apply to the FV00 and FV01 files? I do not believe that they apply only to the FV02 files, because these new files are a new initiative, which, by the way, are not designed for the general public, any more than FV01 are designed for 'power users'. At the risk of stating the obvious, the IMOS mandate is to provide data for researchers. Researchers are not interested in out-of-water data, if that is what you imply by 'power user'.

ggalibert commented 5 years ago

My understanding was that a "measurement" is what is found in the instrument file whether it was in or out of the water.

Happy to discuss and revisit the current filenaming if the community feels it is not right. The current one makes it a bit easier to manage different versions of the same dataset: it will always have the same time_coverage_start/end while the time_deployment_start/end is subject to typos/errors and can be updated.

Power users or expert QC users might want to make sure that out-of-water data has been QC'd properly which they can only do with the FV01. I assumed a wider audience just wants the "good" data.

DavidGriffin1 commented 5 years ago

I think it is pretty clear that the term 'measurement' refers to the quantity that the instrument is designed to measure. This is why the variable 'UCUR' is named 'sea water velocity' rather than a broader term to describe both in-water and out-of-water situations. I think it is good, maybe essential, to include the out-of-water data in the FV00 files, and OK also for FV01, but both should have a consistent name that reflects the time of 'measurement' data, which is most accurately known to the people who deployed and retrieved the instrument.

petejan commented 5 years ago

I have been thinking, and it’s a change with wider impacts, but we could use a different QC flag for out-of-water data, such as a value of 6. This makes the in-water (valid) and out-of-water (invalid) data easier to separate.

The in-water/out-of-water QC test would then set this value instead of just marking it as BAD data. This also has the advantage that statistics around the GOOD/BAD data during deployment are easier to calculate.

OceanSITES were OK with this proposal, as it can be described in the metadata in the netCDF file, but they don’t generally include the out-of-water (invalid) data in the QC'd data.
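
To make the proposal concrete, here is a minimal Python sketch of how a dedicated flag could simplify the deployment statistics. Only the value 6 comes from the comment above. The GOOD and BAD values (1 and 4), the function names, and the simple time-window test are assumptions for illustration and do not describe the toolbox's actual QC routines.

```python
from collections import Counter

# Assumed flag values: only 6 for out-of-water comes from the proposal above.
GOOD, BAD, OUT_OF_WATER = 1, 4, 6


def flag_out_of_water(times, flags, deployment_start, deployment_end):
    """Overwrite flags of samples outside the deployment window with OUT_OF_WATER."""
    return [
        flag if deployment_start <= time <= deployment_end else OUT_OF_WATER
        for time, flag in zip(times, flags)
    ]


def deployment_stats(flags):
    """Count GOOD and BAD samples, ignoring everything flagged OUT_OF_WATER."""
    in_water = [flag for flag in flags if flag != OUT_OF_WATER]
    counts = Counter(in_water)
    return {"good": counts[GOOD], "bad": counts[BAD], "in_water": len(in_water)}


# Hypothetical record: six samples, deployed from t=1 to t=4.
times = [0, 1, 2, 3, 4, 5]
flags = [GOOD, GOOD, BAD, GOOD, GOOD, GOOD]
flagged = flag_out_of_water(times, flags, deployment_start=1, deployment_end=4)
print(flagged)
print(deployment_stats(flagged))
```

With out-of-water samples carrying their own value, the GOOD/BAD counts cover only the deployment period, without needing the deployment dates again.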

ocehugo commented 5 years ago

Questions:

  1. I assumed that a new re-processing file list is required. If so, where should I start looking? I would appreciate a starting tip.

  2. My understanding so far is that time_coverage_start/end suffered a definition change along the way, and now we need to reconcile the files. I assume the consensus is in/out water, but please advise if this is still pending.

  3. Has anyone anticipated what this definition change might break? For example, should a new QC flag be defined as commented above? Should we allow a time_coverage_start that differs from the actual TIME[0] index? Will reprocessing need manual steps, or can it be a batch job?

PS: I see meaning in storing both times. IMO, the time_coverage_start should be the in/out water time, since this is what users expect. We could save something like record_coverage_start if it is not already stored somewhere. This can be of good value for debugging and for actual provenance of when the sensor was on/off.

ggalibert commented 5 years ago

  1. Basically everything. In other cases, I can show you how to find a list of files produced by a certain version of the toolbox.

  2. That's right.

  3. The new QC flag suggestion should be discussed in a separate issue. time_coverage_start can differ from the TIME[0] index. Re-processing is a separate, bigger problem that should be discussed outside this thread.

ocehugo commented 4 years ago

quick update:

Here is a table with the files that will need reprocessing (about 1000 files): 541.zip

Apart from some files with no toolbox_version indication, most of the files with mismatched filename dates and deployment dates were created with versions from 2.5.3 to 2.5.42.

We only need to recreate those files, check versions, and re-upload when #614 is merged/a new version is tagged.

sspagnol commented 4 years ago

Why can't you rename them on the AODN side?

ocehugo commented 4 years ago

@sspagnol - I'm just reporting the blame list. The list is here to define what should be the approach.

The right path for each file would be to go through the toolbox again, receive a new version tag, and then be put through the AODN infrastructure pipeline.

This may be impractical, problematic, or even impossible (particularly for old files). However, some old files would benefit from passing through the new toolbox versions, even more so after #522 and #554 are implemented/merged.