ggalibert closed this 4 years ago.
Since version 2.4 of the toolbox, the first two dates in the file name correspond to the time_coverage_start/end attributes, that is to say the first and last dates for which the dataset holds any data. The last date in the filename is the creation date.
[...]
time_coverage_end: 2019-02-11T03:35:00Z
time_coverage_start: 2018-10-29T13:00:00Z
date_created: 2019-02-12T05:06:08Z
[...]
TIME = "2018-10-29 13:00:0.000007", [...], "2019-02-11 03:35:0.000006" ;
See File naming conventions and IMOS NetCDF User Manual.
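To illustrate the convention above, here is a rough sketch that pulls the three dates out of an IMOS-style filename and checks them against the global attributes in the excerpt. It assumes the start date is the first `_YYYYMMDDTHHMMSSZ_` token, the end date follows `_END-` and the creation date follows `_C-`. The filename is made up, and `filename_dates` / `check_against_attributes` are hypothetical helpers, not toolbox functions.

```python
import re
from datetime import datetime, timezone

# Compact UTC stamp assumed for IMOS-style filenames, e.g. 20181029T130000Z.
STAMP = r"(\d{8}T\d{6}Z)"


def parse_stamp(stamp: str) -> datetime:
    """Convert a compact YYYYMMDDTHHMMSSZ stamp into an aware UTC datetime."""
    return datetime.strptime(stamp, "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)


def parse_iso(value: str) -> datetime:
    """Convert an attribute value such as 2018-10-29T13:00:00Z into an aware UTC datetime."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ").replace(tzinfo=timezone.utc)


def filename_dates(filename: str) -> dict:
    """Extract the start, END- and C- dates, assuming the layout described above."""
    start = re.search("_" + STAMP + "_", filename)
    end = re.search("_END-" + STAMP, filename)
    created = re.search("_C-" + STAMP, filename)
    if not (start and end and created):
        raise ValueError(f"unexpected filename layout: {filename}")
    return {
        "start": parse_stamp(start.group(1)),
        "end": parse_stamp(end.group(1)),
        "created": parse_stamp(created.group(1)),
    }


def check_against_attributes(filename: str, attrs: dict) -> dict:
    """Report, per date, whether the filename agrees with the global attributes."""
    dates = filename_dates(filename)
    return {
        "start": dates["start"] == parse_iso(attrs["time_coverage_start"]),
        "end": dates["end"] == parse_iso(attrs["time_coverage_end"]),
        "created": dates["created"] == parse_iso(attrs["date_created"]),
    }


if __name__ == "__main__":
    # Attribute values copied from the excerpt above; the filename is hypothetical.
    attrs = {
        "time_coverage_start": "2018-10-29T13:00:00Z",
        "time_coverage_end": "2019-02-11T03:35:00Z",
        "date_created": "2019-02-12T05:06:08Z",
    }
    name = ("IMOS_ANMN-NRS_TZ_20181029T130000Z_SITE_FV01_SITE-1810-INSTR-27"
            "_END-20190211T033500Z_C-20190212T050608Z.nc")
    print(check_against_attributes(name, attrs))
    # {'start': True, 'end': True, 'created': True}
```

Comparing parsed datetimes rather than raw strings avoids false mismatches caused purely by the two different timestamp formats.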
For toolbox versions 2.1b to 2.3b inclusive (see https://github.com/aodn/imos-toolbox/commit/6a3ea4945ca02575429ec57cf25de07ca4c96a51#diff-fc1f626054899532e812d9832958e3ca), the filename dates were time_deployment_start/end, or time_coverage_start/end if the former were not defined.
So the issue is one of re-processing historical datasets that were processed with toolbox versions 2.1b to 2.3b inclusive. From version 2.4 onwards, the toolbox does the right thing.
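The version-dependent rule described above could be encoded roughly as follows, for example to decide what a historical file's name should contain. This is a sketch: `major_minor` and `expected_filename_dates` are hypothetical helpers, the version parsing simply ignores letter suffixes such as the `b` in `2.3b`, and treating every release from 2.4 onward the same way is an assumption.

```python
import re
from typing import Optional, Tuple


def major_minor(version: str) -> Tuple[int, int]:
    """Reduce a version string such as '2.3b' or '2.5.42' to (major, minor).

    Letter suffixes (the 'b' in '2.3b') are ignored; this is an assumption.
    """
    digits = re.findall(r"\d+", version)
    if len(digits) < 2:
        raise ValueError(f"cannot parse version: {version!r}")
    return int(digits[0]), int(digits[1])


def expected_filename_dates(version: str, attrs: dict) -> Optional[tuple]:
    """Return the (start, end) attribute values the filename dates should reflect.

    - 2.1b to 2.3b inclusive: time_deployment_start/end, falling back to
      time_coverage_start/end when the deployment attributes are missing;
    - 2.4 onwards (assumed for all later releases): time_coverage_start/end.
    Returns None for versions this thread does not cover.
    """
    mm = major_minor(version)
    deployment = (attrs.get("time_deployment_start"), attrs.get("time_deployment_end"))
    coverage = (attrs.get("time_coverage_start"), attrs.get("time_coverage_end"))
    if (2, 1) <= mm <= (2, 3):
        return deployment if all(deployment) else coverage
    if mm >= (2, 4):
        return coverage
    return None
```

Returning `None` for unknown versions, rather than guessing, leaves those files to be checked by hand.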
Will open a new issue in the anmn-internal-discussion repo for the record.
Guillaume, in https://s3-ap-southeast-2.amazonaws.com/content.aodn.org.au/Documents/IMOS/Conventions/IMOS_NetCDF_File_Naming_Convention.pdf it says:
The pre- and post deployment data is only of interest for quality control, so the dates advertising the switch-on and –off times of instruments do not deserve prominence by being in the filenames.
This is why they are in the FV00 and FV01 filenames. FV00 and FV01 files are more for power users who want to have a closer look at the original data and at the QC. For the general public, the moorings long timeseries working group is working to define and produce FV02 products (hourly averaged on common timestamps, vertically interpolated, etc.) that will help foster uptake and impact of IMOS mooring datasets. For these products the dates in the filename will match the mission/good data, since only good data from FV01 will be included.
Guillaume, I think it is very important to adhere to the IMOS file naming conventions. Are you saying that those conventions do not apply to the FV00 and FV01 files? I do not believe that they apply only to the FV02 files, because these new files are a new initiative, which, by the way, are not designed for the general public, any more than FV01 are designed for 'power users'. At the risk of stating the obvious, the IMOS mandate is to provide data for researchers. Researchers are not interested in out-of-water data, if that is what you imply by 'power user'.
My understanding was that a "measurement" is what is found in the instrument file whether it was in or out of the water.
Happy to discuss and revisit the current file naming if the community feels it is not right. The current one makes it a bit easier to manage different versions of the same dataset: it will always have the same time_coverage_start/end, while time_deployment_start/end is subject to typos/errors and can be updated.
Power users or expert QC users might want to make sure that out-of-water data has been QC'd properly which they can only do with the FV01. I assumed a wider audience just wants the "good" data.
I think it is pretty clear that the term 'measurement' refers to the quantity that the instrument is designed to measure. This is why the variable 'UCUR' is named 'sea water velocity' rather than a broader term covering both in-water and out-of-water situations. I think it is good, perhaps essential, to include the out-of-water data in the FV00 files, and OK for FV01 too, but both should have a consistent name that reflects the time span of the 'measurement' data, which is most accurately known to the people who deployed and retrieved the instrument.
I have been thinking (and it's a change with wider impacts) that we could use a different QC flag for out-of-water data, such as a value of 6. This makes the in-water (valid) and out-of-water (invalid) data easier to separate.
The in-water/out-of-water QC test would then set this value instead of just marking it as BAD data. This also has the advantage that statistics on GOOD/BAD data during the deployment are easier to calculate.
Ocean Sites were OK with this proposal, as it can be described in the metadata in the netCDF file, but they don't generally include the out-of-water (invalid) data in the QC'd data.
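A minimal sketch of how the proposed flag would keep switch-on/off periods out of the deployment statistics. It assumes IODE-style values where 1 is good and 4 is bad. The value 6 is only the one suggested above, not an agreed flag, and the function names are hypothetical. TIME is reduced to plain numbers (hours) to keep the example self-contained.

```python
from typing import Dict, List

GOOD = 1          # assumed IODE-style "good data" flag
BAD = 4           # assumed IODE-style "bad data" flag
OUT_OF_WATER = 6  # value proposed in this thread; hypothetical, not a current standard


def apply_out_of_water_flag(times: List[float], flags: List[int],
                            in_water_start: float, in_water_end: float) -> List[int]:
    """Return a copy of flags with samples outside the in-water window set to OUT_OF_WATER."""
    return [
        flag if in_water_start <= t <= in_water_end else OUT_OF_WATER
        for t, flag in zip(times, flags)
    ]


def in_water_stats(flags: List[int]) -> Dict[str, float]:
    """Count GOOD/BAD samples during the deployment, ignoring out-of-water samples."""
    deployed = [f for f in flags if f != OUT_OF_WATER]
    good = sum(1 for f in deployed if f == GOOD)
    bad = sum(1 for f in deployed if f == BAD)
    return {
        "deployed": len(deployed),
        "good": good,
        "bad": bad,
        "fraction_good": good / len(deployed) if deployed else float("nan"),
    }


if __name__ == "__main__":
    times = list(range(10))   # hypothetical hourly samples
    flags = [GOOD] * 10
    flags[4] = BAD            # one genuinely bad in-water sample
    flagged = apply_out_of_water_flag(times, flags, in_water_start=2, in_water_end=7)
    print(flagged)            # [6, 6, 1, 1, 4, 1, 1, 1, 6, 6]
    print(in_water_stats(flagged))
```

Had the four out-of-water samples been flagged BAD instead, the same count would report 5 bad samples rather than 1, mixing deployment quality with the switch-on/off periods.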
Questions:
I assumed that a new re-processing file list is required. If so, where should I start looking? I would appreciate a starting tip.
My understanding so far is that time_coverage_start/end suffered a definition change along the way, and now we need to reconcile the files. I assume the consensus is in/out water, but please advise if this is still pending.
Has anyone anticipated possible breakages from these definition changes? For example, should a new QC flag be defined as suggested above? Should we allow a time_coverage_start that differs from the actual TIME[0] value? Will reprocessing need manual steps, or can it be a batch job?
PS: I see value in storing both times. IMO, time_coverage_start should be the in/out-of-water time, since this is what users expect. We could save something like record_coverage_start, if it is not already stored somewhere. This could be valuable for debugging and as provenance of when the sensor was actually on/off.
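To make the PS concrete, here is a sketch that derives both pairs of attributes from the TIME values. `record_coverage_start/end` is only the name floated above, not an existing attribute, and taking time_coverage_* as the first/last samples inside the in-water window is one possible reading of the proposal. Where that window comes from (time_deployment_start/end or the in/out-of-water QC test) is left as an assumption.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List


def iso(dt: datetime) -> str:
    """Format an aware datetime in the style of the global attributes."""
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def coverage_attributes(times: List[datetime], in_water_start: datetime,
                        in_water_end: datetime) -> Dict[str, str]:
    """Derive both pairs of attributes from timezone-aware TIME values.

    time_coverage_*   : first/last samples inside the in-water window
    record_coverage_* : first/last samples actually stored (hypothetical name)
    """
    in_water = [t for t in times if in_water_start <= t <= in_water_end]
    if not in_water:
        raise ValueError("no samples inside the in-water window")
    return {
        "time_coverage_start": iso(min(in_water)),
        "time_coverage_end": iso(max(in_water)),
        "record_coverage_start": iso(min(times)),
        "record_coverage_end": iso(max(times)),
    }


if __name__ == "__main__":
    t0 = datetime(2018, 10, 29, 13, 0, tzinfo=timezone.utc)
    times = [t0 + timedelta(hours=h) for h in range(6)]   # 13:00 ... 18:00
    attrs = coverage_attributes(times,
                                in_water_start=t0 + timedelta(hours=1, minutes=30),
                                in_water_end=t0 + timedelta(hours=4, minutes=10))
    for key, value in attrs.items():
        print(key, value)
    # time_coverage_start 2018-10-29T15:00:00Z
    # time_coverage_end 2018-10-29T17:00:00Z
    # record_coverage_start 2018-10-29T13:00:00Z
    # record_coverage_end 2018-10-29T18:00:00Z
```

With this split, time_coverage_start naturally differs from TIME[0] whenever the instrument was switched on before it went into the water.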
Basically everything. Otherwise, I can show you how to find a list of files produced by a certain version of the toolbox.
That's right.
The new QC flag suggestion should be discussed in a separate issue. time_coverage_start can differ from the TIME[0] value. Re-processing is a separate, bigger problem that should be discussed outside this thread.
quick update:
Here is a table with the files that will need reprocessing (~1000 files): 541.zip
Apart from some files with no toolbox_version indication, most of the files whose filename dates do not match the deployment dates were created with versions 2.5.3 to 2.5.42.
We only need to recreate those files, check versions, and re-upload when #614 is merged/a new version is tagged.
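A table like the one above could be produced roughly along these lines: compare the filename start/END- dates with time_deployment_start/end and group the mismatches by toolbox_version. This is a hypothetical sketch, not the script behind 541.zip. The record dictionaries (keys `filename`, `toolbox_version`, `time_deployment_start`, `time_deployment_end`) stand in for whatever metadata source is actually used, and reading the attributes from the netCDF files is left out.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List

# Assumed layout: first _YYYYMMDDTHHMMSSZ_ token is the start, _END- precedes the end.
FILENAME_DATES = re.compile(r"_(\d{8}T\d{6}Z)_.*_END-(\d{8}T\d{6}Z)")


def compact(iso_value: str) -> str:
    """Turn 2018-10-29T13:00:00Z into the filename form 20181029T130000Z."""
    return iso_value.replace("-", "").replace(":", "")


def needs_reprocessing(record: Dict[str, str]) -> bool:
    """True when the filename start/end dates differ from time_deployment_start/end."""
    match = FILENAME_DATES.search(record["filename"])
    if match is None:
        return True  # unparseable names are worth a manual look
    start, end = match.groups()
    return (start, end) != (compact(record["time_deployment_start"]),
                            compact(record["time_deployment_end"]))


def group_by_version(records: Iterable[Dict[str, str]]) -> Dict[str, List[str]]:
    """Group mismatched filenames by toolbox_version ('unknown' when absent)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for record in records:
        if needs_reprocessing(record):
            version = record.get("toolbox_version") or "unknown"
            groups[version].append(record["filename"])
    return dict(groups)
```

Grouping by version keeps files with no toolbox_version visible as their own bucket instead of silently dropping them.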
Why can't you rename them on the AODN side?
@sspagnol - I'm just reporting the blame list. The list is here to help decide what the approach should be.
The proper path for a file would be to go through the toolbox again, receive a new version tag, and then be fed into the AODN infrastructure pipeline.
This may be impractical, problematic, or even impossible (particularly for old files). However, some old files would benefit from passing through the new toolbox versions, even more so once #522 and #554 are implemented/merged.