Closed mhidas closed 3 years ago
For example, the timestamps in this file are fine (monotonically increasing), but in the database, they look like this (sorted by array index):
index | TIME | DEPTH | TEMP
--------+------------------------+-----------+---------
900 | 2018-03-31 16:00:01+00 | -0.206684 | 22.3705
901 | 2018-03-31 16:01:01+00 | -0.206684 | 22.3708
902 | 2018-03-31 16:02:01+00 | -0.206684 | 22.3725
903 | 2018-03-31 16:03:01+00 | -0.202709 | 22.3733
904 | 2018-03-31 16:04:01+00 | -0.192773 | 22.3729
905 | 2018-03-31 16:05:01+00 | -0.197741 | 22.3724
906 | 2018-03-31 16:06:01+00 | -0.203703 | 22.3709
907 | 2018-03-31 16:07:01+00 | -0.200722 | 22.3688
...
958 | 2018-03-31 16:58:01+00 | -0.207678 | 22.1443
959 | 2018-03-31 16:59:01+00 | -0.215627 | 22.1386
==> 960 | 2018-03-31 16:00:01+00 | -0.212646 | 22.1347
961 | 2018-03-31 16:01:01+00 | -0.219602 | 22.1299
962 | 2018-03-31 16:02:01+00 | -0.210659 | 22.1257
963 | 2018-03-31 16:03:01+00 | -0.215627 | 22.1227
964 | 2018-03-31 16:04:01+00 | -0.219602 | 22.1202
965 | 2018-03-31 16:05:01+00 | -0.219602 | 22.117
966 | 2018-03-31 16:06:01+00 | -0.20569 | 22.1145
967 | 2018-03-31 16:07:01+00 | -0.210659 | 22.112
Note that after 16:59:01 (index 959) it goes back to 16:00:01.
This means two things:
This is a content issue, but it might need to be resolved by the devs as I think it is a systematic issue with the harvesting process or the way the db is set up.
Half of all timestamps (possibly in the entire db) are wrong by an hour!
Ok, upon closer investigation, it seems this was a gross overestimate (phew!). Comparing more of the timestamps from the above file (using ncdump) versus the db, the two are only inconsistent during that one-hour period (index range 900 to 960). The db values appear to match those in the file for all other times.
Looks like an issue with NetCDFUtils.addDays. If I change the type of the TIME output from iNetcdfInput to Date and remove the usage of NetCDFUtils.addDays, I get the correct result.
This routine is used by the following harvesters:
AATAMS_SATTAG_NRT AATAMS_SATTAG_QC_CTD ABOS_SOFS_SURFACE_FLUXES ABOS_SOFS_SURFACE_PROPERTIES ABOS_TS_SINGLE_INST_TIMESERIES ANFOG_DM ANFOG_RT ANMN_ACIDIFICATION_DM ANMN_ACIDIFICATION_NRT ANMN_BURST_AVG_TIMESERIES ANMN_MHLWAVE ANMN_NRS_CTD_PROFILES ANMN_NRS_DAR_YON_TS ANMN_NRS_LONG_TS ANMN_NRS_RT_BIOGEOCHEM_TIMESERIES ANMN_NRS_RT_METEO_TIMESERIES ANMN_NRS_RT_WAVE_TIMESERIES ANMN_T_REGRIDDED ANMN_TS_TIMESERIES ANMN_WAVE AODN_DSTO AODN_IMAS_FLUOROMETRY AODN_MHL_SST AODN_RAN_CTD AODN_RAN_SST AODN_WAVE_DM ARGO AUV FAIMMS FUTURE_REEF_MAP GSLA_DM00 GSLA_NRT00 NOAA_DRIFTERS SOOP_ASF_FMT SOOP_ASF_MT SOOP_BA SOOP_CO2 SOOP_CO2_RT SOOP_SST SOOP_TMV SOOP_TRV SRS_OC_BODBAW SRS_OC_SOOP_RAD SRS_SST
Looks like it calculates the time incorrectly (adds an hour) between 3am and 4am, in Australian local time (the pipeline server's timezone), on the days when daylight saving ends.
Looks like the issue is caused by not specifying timezone information in the NetCDFUtils.addDays call.
The harvester (and other harvesters) use
NetCDFUtils.addDays("1950-01-01 00:00:00", "yyyy-MM-dd HH:mm:ss", data.TIME, true)
What's required is
NetCDFUtils.addDays("1950-01-01 00:00:00-0000", "yyyy-MM-dd HH:mm:ssZ", data.TIME, true)
Need a bulk update of affected harvesters and to reprocess affected files.
Affected dates/times:
1972-02-26 15:00:00 - 15:59:59
1973-03-03 15:00:00 - 15:59:59
1974-03-02 15:00:00 - 15:59:59
1975-03-01 15:00:00 - 15:59:59
1976-03-06 15:00:00 - 15:59:59
1977-03-05 15:00:00 - 15:59:59
1978-03-04 15:00:00 - 15:59:59
1979-03-03 15:00:00 - 15:59:59
1980-03-01 15:00:00 - 15:59:59
1981-02-28 15:00:00 - 15:59:59
1982-04-03 15:00:00 - 15:59:59
1983-03-05 15:00:00 - 15:59:59
1984-03-03 15:00:00 - 15:59:59
1985-03-02 15:00:00 - 15:59:59
1986-03-15 15:00:00 - 15:59:59
1987-03-14 15:00:00 - 15:59:59
1988-03-19 15:00:00 - 15:59:59
1989-03-18 15:00:00 - 15:59:59
1990-03-03 15:00:00 - 15:59:59
1991-03-02 15:00:00 - 15:59:59
1992-02-29 15:00:00 - 15:59:59
1993-03-06 15:00:00 - 15:59:59
1994-03-05 15:00:00 - 15:59:59
1995-03-04 15:00:00 - 15:59:59
1996-03-30 15:00:00 - 15:59:59
1997-03-29 15:00:00 - 15:59:59
1998-03-28 15:00:00 - 15:59:59
1999-03-27 15:00:00 - 15:59:59
2000-03-25 15:00:00 - 15:59:59
2001-03-24 15:00:00 - 15:59:59
2002-03-30 15:00:00 - 15:59:59
2003-03-29 15:00:00 - 15:59:59
2004-03-27 15:00:00 - 15:59:59
2005-03-26 15:00:00 - 15:59:59
2006-04-01 15:00:00 - 15:59:59
2007-03-24 15:00:00 - 15:59:59
2008-04-05 15:00:00 - 15:59:59
2009-04-04 15:00:00 - 15:59:59
2010-04-03 15:00:00 - 15:59:59
2011-04-02 15:00:00 - 15:59:59
2012-03-31 15:00:00 - 15:59:59
2013-04-06 15:00:00 - 15:59:59
2014-04-05 15:00:00 - 15:59:59
2015-04-04 15:00:00 - 15:59:59
2016-04-02 15:00:00 - 15:59:59
2017-04-01 15:00:00 - 15:59:59
2018-03-31 15:00:00 - 15:59:59
Created pull request https://github.com/aodn/harvesters/pull/714 to fix incorrect time calculation. Problem occurred when rounding to nearest second using Calendar instance. Specifying the time zone does not fix the problem as the routine was assuming UTC dates were being passed.
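For illustration only (this is not the code from the pull request): a minimal sketch of doing both the epoch offset and the rounding as plain UTC arithmetic, so that no server time zone or calendar field gets involved. It assumes TIME is stored as fractional days since 1950-01-01 00:00:00 UTC, as in the addDays calls above; the function name is made up for this example.

```python
from datetime import datetime, timedelta, timezone

# Reference epoch used by the harvesters' TIME variable (assumed to be UTC).
EPOCH = datetime(1950, 1, 1, tzinfo=timezone.utc)


def days_since_1950_to_utc(days: float) -> datetime:
    """Convert fractional days since the epoch to an aware UTC datetime.

    Rounding to the nearest second is done on the raw number of seconds,
    before any datetime object exists, so daylight saving rules of the
    machine running the code cannot affect the result.
    """
    seconds = round(days * 86400)
    return EPOCH + timedelta(seconds=seconds)


if __name__ == "__main__":
    # One and a half days after the epoch is midday on 2 January 1950.
    print(days_since_1950_to_utc(1.5).isoformat())
```

The point of the design is simply that the value never passes through a local-time representation on its way to the database.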
It would be useful to know, for each impacted harvester, which data files were affected so that they can be harvested again once the harvesters are fixed by the above PR. @jonescc do you think this list could be programmatically generated?
Would it not be stricter to add a test/schema for time monotonicity for the respective datasets? Other, non-monotonic datasets could also be handled by restricting the number of non-monotonic intervals allowed.
@ggalibert, yes it should be possible to query the db to identify files which should be reprocessed where reprocessing all is not practical. Just need to identify potentially problematic timestamps in the db and relate them back to the files from which they came. I'll have a look at an example query.
@ocehugo I believe monotonicity checks are included in the compliance checker. It may be worthwhile adding some checks to the database as well.
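As a rough idea of what such a check could look like on values read back from the database: a small sketch, not taken from the compliance checker, that reports where a time series steps backwards and lets a dataset tolerate a limited number of such steps. The function names and the max_backward_steps parameter are hypothetical.

```python
from datetime import datetime


def backward_steps(times):
    """Return the indices i where times[i] is not later than times[i - 1]."""
    return [i for i in range(1, len(times)) if times[i] <= times[i - 1]]


def is_acceptable(times, max_backward_steps=0):
    """True if the series has at most max_backward_steps backward steps.

    The default of 0 demands strict monotonicity; datasets that are
    legitimately non-monotonic could be given a higher limit.
    """
    return len(backward_steps(times)) <= max_backward_steps


if __name__ == "__main__":
    # Same pattern as the database rows above: 16:59:01 followed by 16:00:01.
    sample = [
        datetime(2018, 3, 31, 16, 58, 1),
        datetime(2018, 3, 31, 16, 59, 1),
        datetime(2018, 3, 31, 16, 0, 1),
        datetime(2018, 3, 31, 16, 1, 1),
    ]
    print(backward_steps(sample), is_acceptable(sample))
```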
@jonescc,
I don't know much about the DB schemas or even how to implement this, but I think it's safe to say the DB schemas are probably not as complete as what can be coded under normal conditions.
Thus, wouldn't it be better to create a round-trip workflow to check for input/output mismatches?
This could be as simple as converting the results back from the DB, writing them to a file copy, and running the checker.
A full round trip could also be proposed, but my knowledge of the db scope and fields is limited. Something like read_original->compliance_checker->post_to_db->read_from_db->create_file_from_db_fields->compliance_checker->write entry in db as "verified/exportable to IMOS netCDF file".
Yes, that makes sense, but it requires a substantial amount of work to do the create_file_from_db_fields step. We recently decommissioned a service which did this because of the amount of work required to support it, and we only ever had it working for one collection.
Sounds like a backlog item is required to look at how we can validate the db against the source data.
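A lighter-weight variant than rebuilding a file from the db would be to compare just the time values. A sketch under the assumption that the timestamps of one file and the matching db rows have already been read into two lists in the same order (how they are read is left out here); the function name and the one-second tolerance are illustrative choices, not anything that exists in the harvesters.

```python
from datetime import datetime, timedelta


def timestamp_mismatches(file_times, db_times, tolerance=timedelta(seconds=1)):
    """Return (index, file_time, db_time) for records differing by more than tolerance.

    Both sequences must hold the same records in the same order.
    """
    if len(file_times) != len(db_times):
        raise ValueError("file and database have different numbers of records")
    return [
        (i, f, d)
        for i, (f, d) in enumerate(zip(file_times, db_times))
        if abs(d - f) > tolerance
    ]


if __name__ == "__main__":
    # First record is off by an hour in the db, second one agrees.
    from_file = [datetime(2018, 3, 31, 15, 59, 1), datetime(2018, 3, 31, 16, 0, 1)]
    from_db = [datetime(2018, 3, 31, 16, 59, 1), datetime(2018, 3, 31, 16, 0, 1)]
    for index, f, d in timestamp_mismatches(from_file, from_db):
        print(index, f, d)
```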
@jonescc
I ran the following SQL as an example on the SOOP TMV NRT collection. Do you feel this is the most efficient way of doing this?
SELECT i.url FROM soop_tmv.indexed_file i WHERE i.id IN (
SELECT distinct(file_id) FROM soop_tmv.soop_tmv_nrt_trajectory_data
WHERE
(date("TIME") = '1972-02-26' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1973-03-03' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1974-03-02' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1975-03-01' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1976-03-06' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1977-03-05' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1978-03-04' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1979-03-03' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1980-03-01' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1981-02-28' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1982-04-03' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1983-03-05' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1984-03-03' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1985-03-02' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1986-03-15' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1987-03-14' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1988-03-19' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1989-03-18' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1990-03-03' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1991-03-02' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1992-02-29' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1993-03-06' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1994-03-05' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1995-03-04' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1996-03-30' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1997-03-29' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1998-03-28' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1999-03-27' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2000-03-25' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2001-03-24' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2002-03-30' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2003-03-29' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2004-03-27' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2005-03-26' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2006-04-01' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2007-03-24' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2008-04-05' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2009-04-04' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2010-04-03' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2011-04-02' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2012-03-31' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2013-04-06' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2014-04-05' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2015-04-04' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2016-04-02' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2017-04-01' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2018-03-31' AND date_part('hour', "TIME") = 16 )
);
┌───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ url │
├───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┤
│ IMOS/SOOP/SOOP-TMV/VLST_Spirit-of-Tasmania-1/realtime/transect/10secs/2015/IMOS_SOOP-TMV_TSUB_20150404T084620Z_VLST_FV00_transect-D2M_END-20150404T191630Z.nc │
│ IMOS/SOOP/SOOP-TMV/VLST_Spirit-of-Tasmania-1/realtime/transect/10secs/2014/IMOS_SOOP-TMV_TSUB_20140405T083200Z_VLST_FV00_transect-M2D_END-20140405T194410Z.nc │
│ IMOS/SOOP/SOOP-TMV/VLST_Spirit-of-Tasmania-1/realtime/transect/10secs/2017/IMOS_SOOP-TMV_TSUB_20170401T103640Z_VLST_FV00_transect-D2M_END-20170401T201650Z.nc │
│ IMOS/SOOP/SOOP-TMV/VLST_Spirit-of-Tasmania-1/realtime/transect/10secs/2018/IMOS_SOOP-TMV_TSUB_20180331T081450Z_VLST_FV00_transect-M2D_END-20180331T191930Z.nc │
│ IMOS/SOOP/SOOP-TMV/VLST_Spirit-of-Tasmania-1/realtime/transect/1sec/2017/IMOS_SOOP-TMV_TSUB_20170401T103638Z_VLST_FV00_transect-D2M_END-20170401T201650Z.nc │
│ IMOS/SOOP/SOOP-TMV/VLST_Spirit-of-Tasmania-1/realtime/transect/1sec/2018/IMOS_SOOP-TMV_TSUB_20180331T081456Z_VLST_FV00_transect-M2D_END-20180331T191930Z.nc │
│ IMOS/SOOP/SOOP-TMV/VLST_Spirit-of-Tasmania-1/realtime/transect/1sec/2015/IMOS_SOOP-TMV_TSUB_20150404T084647Z_VLST_FV00_transect-D2M_END-20150404T191630Z.nc │
│ IMOS/SOOP/SOOP-TMV/VLST_Spirit-of-Tasmania-1/realtime/transect/1sec/2014/IMOS_SOOP-TMV_TSUB_20140405T083151Z_VLST_FV00_transect-M2D_END-20140405T194410Z.nc │
└───────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘
(8 rows)
Time: 149158.097 ms
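Rather than typing out one condition per year, the filter could be generated. A sketch, assuming the time column is a timestamp without time zone holding UTC values: it builds one half-open range per date (16:00:00 up to, but not including, 17:00:00), which selects the same rows as the date()/date_part() pair above while comparing the bare column. Whether postgres then picks a different plan is not something this sketch can promise. The function name and the first_year parameter are made up for illustration.

```python
from datetime import date, datetime, timedelta

FMT = "%Y-%m-%d %H:%M:%S"


def suspect_time_predicate(dst_end_dates, column='"TIME"', first_year=None):
    """Build the body of a WHERE clause for the suspect hour on each date.

    dst_end_dates: iterable of datetime.date values (UTC dates).
    first_year: if given, dates before that year are skipped, e.g. when a
    collection is known not to hold older data.
    """
    clauses = []
    for d in sorted(dst_end_dates):
        if first_year is not None and d.year < first_year:
            continue
        start = datetime(d.year, d.month, d.day, 16)
        end = start + timedelta(hours=1)
        clauses.append(
            f"({column} >= '{start.strftime(FMT)}' AND {column} < '{end.strftime(FMT)}')"
        )
    return "\n   OR ".join(clauses)


if __name__ == "__main__":
    dates = [date(2017, 4, 1), date(2018, 3, 31)]
    print(suspect_time_predicate(dates))
```

The output is meant to be pasted after WHERE in the inner SELECT; pass column='"time"' for schemas that use a lower-case column name.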
Below is a list of affected collections, the SQL query run to find the URLs to re-push to the incoming directory, and the list of URLs to push back. @aodn/emii-ops Please tick the box once a collection is cleaned
list of affected files:
list of affected files:
list of affected files:
list of affected files:
list of affected files:
list of affected files:
list of affected files:
list of affected files:
list of affected files:
list of affected files:
list of affected files:
list of affected files:
list of affected files: time_issue_anmn_nrs_dar_yon_url_ls.txt
bash command to reprocess
sudo_project_officer
cd $WIP_DIR/ANMN/NRS_AIMS_Darwin_Yongala_data_rss_download_temporary
mkdir manifest_dir_timezone_reprocess && cd manifest_dir_timezone_reprocess/
umask 002
for f in `cat time_issue_anmn_nrs_dar_yon_url_ls.txt`; do
wget https://s3-ap-southeast-2.amazonaws.com/imos-data/$f;
filename=`basename $f`;
md5_val=`md5sum $filename | awk '{ print $1 }'`;
mv $filename ${filename%.*}.$md5_val.nc;
done
pwd > /mnt/ebs/incoming/AODN/ANMN_NRS_DAR_YON/timezone_reprocess.dir_manifest
tail -f /mnt/ebs/log/pipeline/process/tasks.ANMN_NRS_DAR_YON.log
list of affected files:
list of affected files:
list of affected files:
list of affected files:
list of affected files:
list of affected files:
list of affected files:
list of affected files: time_issue_aodn_dsto.txt
list of affected files: time_issue_aodn_imas_fluorometry.txt
list of affected files: time_issue_aodn_mhl_sst.txt
\o /tmp/time_issue_aodn_ran_ctd.txt;
SELECT url FROM aodn_ran_ctd.indexed_file where id IN (
SELECT DISTINCT(file_id) FROM aodn_ran_ctd.measurements
WHERE
(date("time") = '1972-02-26' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1973-03-03' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1974-03-02' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1975-03-01' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1976-03-06' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1977-03-05' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1978-03-04' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1979-03-03' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1980-03-01' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1981-02-28' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1982-04-03' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1983-03-05' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1984-03-03' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1985-03-02' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1986-03-15' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1987-03-14' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1988-03-19' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1989-03-18' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1990-03-03' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1991-03-02' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1992-02-29' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1993-03-06' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1994-03-05' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1995-03-04' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1996-03-30' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1997-03-29' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1998-03-28' AND date_part('hour', "time") = 16 ) OR
(date("time") = '1999-03-27' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2000-03-25' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2001-03-24' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2002-03-30' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2003-03-29' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2004-03-27' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2005-03-26' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2006-04-01' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2007-03-24' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2008-04-05' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2009-04-04' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2010-04-03' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2011-04-02' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2012-03-31' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2013-04-06' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2014-04-05' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2015-04-04' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2016-04-02' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2017-04-01' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2018-03-31' AND date_part('hour', "time") = 16 ) OR
(date("time") = '2019-04-06' AND date_part('hour', "time") = 16 )
OR
(date("time") = '1972-02-26' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1973-03-03' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1974-03-02' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1975-03-01' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1976-03-06' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1977-03-05' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1978-03-04' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1979-03-03' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1980-03-01' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1981-02-28' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1982-04-03' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1983-03-05' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1984-03-03' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1985-03-02' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1986-03-15' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1987-03-14' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1988-03-19' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1989-03-18' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1990-03-03' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1991-03-02' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1992-02-29' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1993-03-06' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1994-03-05' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1995-03-04' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1996-03-30' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1997-03-29' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1998-03-28' AND date_part('hour', "time") = 17 ) OR
(date("time") = '1999-03-27' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2000-03-25' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2001-03-24' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2002-03-30' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2003-03-29' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2004-03-27' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2005-03-26' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2006-04-01' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2007-03-24' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2008-04-05' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2009-04-04' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2010-04-03' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2011-04-02' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2012-03-31' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2013-04-06' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2014-04-05' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2015-04-04' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2016-04-02' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2017-04-01' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2018-03-31' AND date_part('hour', "time") = 17 ) OR
(date("time") = '2019-04-06' AND date_part('hour', "time") = 17 )
);
list of affected files: time_issue_aodn_ran_ctd.txt
list of affected files:
list of affected files: time_issue_aodn_wave_dm.txt
list of affected files: time_issue_argo_url_ls.txt
list of affected files: NONE since there aren't any data collected during night time
list of affected files: time_issue_faimms_url_ls.txt
bash command to reprocess
## reprocess FAIMMS
sudo_project_officer
cd $WIP_DIR/FAIMMS/REALTIME/
mkdir manifest_dir_timezone_reprocess && cd manifest_dir_timezone_reprocess/
umask 002
for f in `cat time_issue_faimms_url_ls.txt`; do
wget https://s3-ap-southeast-2.amazonaws.com/imos-data/$f;
filename=`basename $f`;
md5_val=`md5sum $filename | awk '{ print $1 }'`;
mv $filename ${filename%.*}.$md5_val.nc;
done
pwd > /mnt/ebs/incoming/FAIMMS/timezone_reprocess.dir_manifest
tail -f /mnt/ebs/log/pipeline/process/tasks.FAIMMS.log
list of affected files: time_issue_future_reef_map.txt
list of affected files: time_issue_gsla_dm00.txt
list of affected files: time_issue_gsla_nrt00.txt
list of affected files: time_issue_noaa_drifters_ls.txt
list of affected files: time_issue_soop_asf_fmt_url_ls.txt
list of affected files: time_issue_soop_asf_mt_url_ls.txt
list of affected files: time_issue_soop_ba_url_ls.txt
list of affected files: NONE time_issue_soop_c02_rt_url_ls.txt
list of affected files: time_issue_soop_c02_url_ls.txt
list of affected files: time_issue_soop_sst.txt
\o /tmp/time_issue_soop_tmv.txt;
SELECT url FROM soop_tmv.indexed_file where id IN (
SELECT DISTINCT(file_id) FROM soop_tmv.measurements
WHERE
(date("TIME") = '1972-02-26' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1973-03-03' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1974-03-02' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1975-03-01' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1976-03-06' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1977-03-05' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1978-03-04' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1979-03-03' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1980-03-01' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1981-02-28' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1982-04-03' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1983-03-05' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1984-03-03' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1985-03-02' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1986-03-15' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1987-03-14' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1988-03-19' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1989-03-18' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1990-03-03' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1991-03-02' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1992-02-29' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1993-03-06' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1994-03-05' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1995-03-04' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1996-03-30' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1997-03-29' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1998-03-28' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '1999-03-27' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2000-03-25' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2001-03-24' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2002-03-30' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2003-03-29' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2004-03-27' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2005-03-26' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2006-04-01' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2007-03-24' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2008-04-05' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2009-04-04' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2010-04-03' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2011-04-02' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2012-03-31' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2013-04-06' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2014-04-05' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2015-04-04' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2016-04-02' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2017-04-01' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2018-03-31' AND date_part('hour', "TIME") = 16 ) OR
(date("TIME") = '2019-04-06' AND date_part('hour', "TIME") = 16 )
OR
(date("TIME") = '1972-02-26' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1973-03-03' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1974-03-02' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1975-03-01' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1976-03-06' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1977-03-05' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1978-03-04' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1979-03-03' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1980-03-01' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1981-02-28' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1982-04-03' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1983-03-05' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1984-03-03' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1985-03-02' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1986-03-15' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1987-03-14' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1988-03-19' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1989-03-18' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1990-03-03' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1991-03-02' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1992-02-29' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1993-03-06' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1994-03-05' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1995-03-04' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1996-03-30' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1997-03-29' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1998-03-28' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '1999-03-27' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2000-03-25' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2001-03-24' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2002-03-30' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2003-03-29' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2004-03-27' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2005-03-26' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2006-04-01' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2007-03-24' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2008-04-05' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2009-04-04' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2010-04-03' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2011-04-02' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2012-03-31' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2013-04-06' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2014-04-05' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2015-04-04' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2016-04-02' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2017-04-01' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2018-03-31' AND date_part('hour', "TIME") = 17 ) OR
(date("TIME") = '2019-04-06' AND date_part('hour', "TIME") = 17 )
);
list of affected files: time_issue_soop_tmv.txt
list of affected files: time_issue_soop_trv.txt
list of affected files:
list of affected files: time_issue_srs_oc_soop_rad.txt
list of affected files:
I was initially worried about the use of date and date_trunc after our recent issues with them, but in this case, where you're operating on a timestamp without time zone, it seems to work OK. It will probably also work with timestamp with time zone columns, as the time zone of the db is set to UTC (but check!!)
I couldn't find a faster way of getting the results. With this many filter conditions, postgres decides to do a sequential scan no matter what.
PS. We should probably add in the recent daylight saving change. There may be some datasets with those times now.
So date/times between 2019-04-06 16:00:00 and 2019-04-06 17:00:00 are also suspect.
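For what it's worth, the suspect windows need not be maintained by hand. A rough sketch, assuming the pipeline server runs in the Australia/Sydney zone (the thread only says "Australian", so the zone name is a guess) and that Python 3.9+ with a system tz database is available. It walks through a year hour by hour in UTC and returns the instant at which the UTC offset drops, i.e. the end of daylight saving; the suspect hour in the db starts at that instant.

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo


def dst_end_utc(year, zone="Australia/Sydney"):
    """Return the UTC instant in `year` at which daylight saving ends in `zone`.

    The zone default is an assumption about the pipeline server.
    Returns None if the UTC offset never decreases during that year.
    """
    tz = ZoneInfo(zone)
    t = datetime(year, 1, 1, tzinfo=timezone.utc)
    end = datetime(year + 1, 1, 1, tzinfo=timezone.utc)
    previous = t.astimezone(tz).utcoffset()
    while t < end:
        t += timedelta(hours=1)
        current = t.astimezone(tz).utcoffset()
        if current < previous:
            # Offset dropped (e.g. +11:00 to +10:00): clocks went back here.
            return t
        previous = current
    return None


if __name__ == "__main__":
    start = dst_end_utc(2019)
    print("suspect db values:", start, "to", start + timedelta(hours=1))
```

For 2019 this gives 2019-04-06 16:00:00 UTC, which lines up with the window quoted above.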
@mhidas could you have a look at this https://github.com/aodn/content/issues/416#issuecomment-501581043 and check which of your files need to be reprocessed? I'd rather leave that to you as ANMN is a complicated one
# wget from the imos bucket by giving only the relative path of an object, without the full bucket address
# $1 relative path of the object in the imos bucket
# $2 output_folder (optional)
wget_imos_bucket() {
    local url_suffix=$1; shift;
    local url_prefix="https://s3-ap-southeast-2.amazonaws.com/imos-data/"
    local output_folder=$1;
    # download into output_folder when one is given, otherwise into the current directory
    if [ -n "$output_folder" ]; then
        wget -P "$output_folder" "${url_prefix}${url_suffix}"
    else
        wget "${url_prefix}${url_suffix}"
    fi
}
Obsolete schema
cd /mnt/imos-data
grep -oE 'ANMN/.*\.nc' /tmp/time_issue_anmn_burst_avg_timseries.txt | uniq | $HARVESTER_TRIGGER --stdin -b IMOS
FYI @mhidas !!!! what is the incoming folder?
cd /mnt/imos-data
grep -oE 'Manly_Hydraulics_Laboratory/.*\.nc' /tmp/time_issue_anmn_mhlwave.txt | uniq | $HARVESTER_TRIGGER --stdin -b IMOS
for f in `cat /tmp/time_issue_anmn_mhlwave.txt | grep -o -E "NSW-OEH/.*nc"`;do
wget_imos_bucket $f $INCOMING_DIR/
done
cd /mnt/imos-data
grep -oE 'ANMN/.*\.nc' /tmp/time_issue_anmn_nrs_ctd_profiles.txt | $HARVESTER_TRIGGER --stdin -b IMOS
cd /mnt/imos-data
grep -oE 'ANMN/.*\.nc' /tmp/time_issue_anmn_nrs_long_ts.txt | uniq | $HARVESTER_TRIGGER --stdin -b IMOS
-> no files to be ingested according to SQL query
cd /mnt/imos-data
grep -oE 'ANMN/.*\.nc' /tmp/time_issue_anmn_nrs_rt_bio.txt | sort | uniq | $HARVESTER_TRIGGER --stdin -b IMOS
cd /mnt/imos-data
grep -oE 'ANMN/.*\.nc' /tmp/time_issue_anmn_nrs_rt_meteo.txt | sort | uniq | $HARVESTER_TRIGGER --stdin -b IMOS
cd /mnt/imos-data
grep -oE 'ANMN/.*\.nc' /tmp/time_issue_anmn_nrs_rt_wave_timeseries.txt | sort | uniq | $HARVESTER_TRIGGER --stdin -b IMOS
obsolete
cd /mnt/imos-data
grep -oE 'ANMN/.*\.nc' /tmp/time_issue_anmn_ts_timeseries.txt | sort | uniq | $HARVESTER_TRIGGER --stdin -b IMOS
FYI @mhidas also see https://github.com/aodn/content/issues/487
cd /mnt/imos-data
grep -oE 'ANMN/.*\.nc' /tmp/time_issue_anmn_wave.txt | grep -v REAL_TIME | sort | uniq | $HARVESTER_TRIGGER --stdin -b IMOS
@bpasquer could you please review the following
cd /mnt/imos-data
grep -oE 'ANFOG/.*\.nc' /tmp/time_issue_anfog_dm.txt | sort | uniq | $HARVESTER_TRIGGER --stdin -b IMOS
NO RT files matching the dates
Thanks @lbesnard
Firstly, I would suggest you don't download files from S3 and re-push them through the whole pipeline. This won't work for some of the older files as they will fail the compliance checks.
You can simply re-index them in place from the fuse mount. E.g. for your target list in /tmp/time_issue_x.txt:
> cd /mnt/imos-data
> grep -oE 'ANMN/.*\.nc' /tmp/time_issue_x.txt | $HARVESTER_TRIGGER --stdin -b IMOS
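A minimal sketch of the same idea wrapped in a small helper, in case it is useful for the remaining lists. The function name `unique_paths` is hypothetical (it is not part of any AODN tooling); it only wraps the `grep` shown above and uses `sort -u`, because `uniq` on its own removes only adjacent duplicates.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the harvester tooling):
# print each unique relative path under PREFIX found in a target list.
# Usage: unique_paths PREFIX LIST_FILE
unique_paths() {
    local prefix="$1" list_file="$2"
    # -o prints only the matching part of each line;
    # sort -u drops duplicates even when they are not adjacent.
    grep -oE "${prefix}/.*\.nc" "$list_file" | sort -u
}

# Intended use, assuming $HARVESTER_TRIGGER is set as in the commands above:
#   cd /mnt/imos-data
#   unique_paths ANMN /tmp/time_issue_x.txt | $HARVESTER_TRIGGER --stdin -b IMOS
```

The paths stay relative to the fuse mount, so nothing is downloaded or pushed back through the pipeline.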
ANMN_ACIDIFICATION_NRT
Please ignore this schema as it's obsolete (see https://github.com/aodn/PO-Backlog/issues/480) and should in fact be deleted. These NRT data are now being harvested into and served from the anmn_am_dm schema (sorry the name is now inaccurate, but not worth the effort to rename).
ANMN_BURST_AVG_TIMESERIES
This only includes data collected by the ANMN facility, so no data before 2007 (see select min(time_coverage_start) from anmn_burst_avg.timeseries;). I expect you will speed up your query considerably by not searching for dates going back to 1972. Same applies to most of the other schemas below.
ANMN_MHLWAVE
I don't know much about this one, but it does seem to have data going back to 1974.
ANMN_NRS_CTD_PROFILES
Earliest data is in 2007.
ANMN_NRS_LONG_TS
There are only 2 files in the entire collection, so just reindex them both.
ANMN_NRS_RT_BIO ANMN_NRS_RT_METEO ANMN_NRS_RT_WAVE
These also only have a handful of files each (see the _map view in each schema), so might as well just reindex them all rather than bother with the query.
(And btw the schemas are called anmn_nrs_rt_bio, anmn_nrs_rt_meteo, anmn_nrs_rt_wave, and they do all have an indexed_file table.)
ANMN_T_REGRIDDED
This is obsolete and has been removed.
ANMN_TS_TIMESERIES ANMN_WAVE
Data starts in 2007, so no need to query before then.
thanks @mhidas for your feedback, just waiting for a query to run on the anmn_ts schema. Everything else is done.
cd /mnt/imos-data
grep -oE 'AATAMS/.*\.nc' /tmp/time_issue_aatams_sattag_qc_ctd.txt | sort | uniq | $HARVESTER_TRIGGER --stdin -b IMOS
ISSUE
Exception in component tPostgresqlRow_1 (DeleteProfile)
org.postgresql.util.PSQLException: ERROR: relation "profile_history" does not exist
Position: 13
at org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2096)
at org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:1829)
at org.postgresql.core.v3.QueryExecutorImpl.execute(QueryExecutorImpl.java:257)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:510)
at org.postgresql.jdbc2.AbstractJdbc2Statement.executeWithFlags(AbstractJdbc2Statement.java:372)
at org.postgresql.jdbc2.AbstractJdbc2Statement.execute(AbstractJdbc2Statement.java:364)
at aatams_sattag_qc_ctd.deleteprofile_0_1.DeleteProfile.tPostgresqlRow_1Process(DeleteProfile.java:743)
at aatams_sattag_qc_ctd.deleteprofile_0_1.DeleteProfile.tPostgresqlConnection_1Process(DeleteProfile.java:606)
at aatams_sattag_qc_ctd.deleteprofile_0_1.DeleteProfile.runJobInTOS(DeleteProfile.java:1738)
at aatams_sattag_qc_ctd.deleteprofile_0_1.DeleteProfile.runJob(DeleteProfile.java:1547)
at aatams_sattag_qc_ctd.harvestprofiles_0_1.HarvestProfiles.iDeletedFileList_1Process(HarvestProfiles.java:690)
at aatams_sattag_qc_ctd.harvestprofiles_0_1.HarvestProfiles.runJobInTOS(HarvestProfiles.java:1770)
at aatams_sattag_qc_ctd.harvestprofiles_0_1.HarvestProfiles.runJob(HarvestProfiles.java:1584)
at aatams_sattag_qc_ctd.aatams_sattag_qc_ctd_harvester_0_1.AATAMS_SATTAG_QC_CTD_harvester.tRunJob_3Process(AATAMS_SATTAG_QC_CTD_harvester.java:2193)
at aatams_sattag_qc_ctd.aatams_sattag_qc_ctd_harvester_0_1.AATAMS_SATTAG_QC_CTD_harvester.tRunJob_4Process(AATAMS_SATTAG_QC_CTD_harvester.java:1952)
at aatams_sattag_qc_ctd.aatams_sattag_qc_ctd_harvester_0_1.AATAMS_SATTAG_QC_CTD_harvester.iUpdateIndex_1Process(AATAMS_SATTAG_QC_CTD_harvester.java:1606)
at aatams_sattag_qc_ctd.aatams_sattag_qc_ctd_harvester_0_1.AATAMS_SATTAG_QC_CTD_harvester.runJobInTOS(AATAMS_SATTAG_QC_CTD_harvester.java:4833)
at aatams_sattag_qc_ctd.aatams_sattag_qc_ctd_harvester_0_1.AATAMS_SATTAG_QC_CTD_harvester.main(AATAMS_SATTAG_QC_CTD_harvester.java:4559)
Exception in component tRunJob_2 (HarvestProfiles)
java.lang.RuntimeException: Child job running failed.
org.postgresql.util.PSQLException: ERROR: relation "profile_history" does not exist
Position: 13
at aatams_sattag_qc_ctd.harvestprofiles_0_1.HarvestProfiles.iDeletedFileList_1Process(HarvestProfiles.java:707)
at aatams_sattag_qc_ctd.harvestprofiles_0_1.HarvestProfiles.runJobInTOS(HarvestProfiles.java:1770)
at aatams_sattag_qc_ctd.harvestprofiles_0_1.HarvestProfiles.runJob(HarvestProfiles.java:1584)
at aatams_sattag_qc_ctd.aatams_sattag_qc_ctd_harvester_0_1.AATAMS_SATTAG_QC_CTD_harvester.tRunJob_3Process(AATAMS_SATTAG_QC_CTD_harvester.java:2193)
at aatams_sattag_qc_ctd.aatams_sattag_qc_ctd_harvester_0_1.AATAMS_SATTAG_QC_CTD_harvester.tRunJob_4Process(AATAMS_SATTAG_QC_CTD_harvester.java:1952)
at aatams_sattag_qc_ctd.aatams_sattag_qc_ctd_harvester_0_1.AATAMS_SATTAG_QC_CTD_harvester.iUpdateIndex_1Process(AATAMS_SATTAG_QC_CTD_harvester.java:1606)
at aatams_sattag_qc_ctd.aatams_sattag_qc_ctd_harvester_0_1.AATAMS_SATTAG_QC_CTD_harvester.runJobInTOS(AATAMS_SATTAG_QC_CTD_harvester.java:4833)
at aatams_sattag_qc_ctd.aatams_sattag_qc_ctd_harvester_0_1.AATAMS_SATTAG_QC_CTD_harvester.main(AATAMS_SATTAG_QC_CTD_harvester.java:4559)
Exception in component tRunJob_3 (AATAMS_SATTAG_QC_CTD_harvester)
java.lang.RuntimeException: Child job running failed.
java.lang.RuntimeException: Child job running failed.
org.postgresql.util.PSQLException: ERROR: relation "profile_history" does not exist
Position: 13
at aatams_sattag_qc_ctd.aatams_sattag_qc_ctd_harvester_0_1.AATAMS_SATTAG_QC_CTD_harvester.tRunJob_3Process(AATAMS_SATTAG_QC_CTD_harvester.java:2210)
at aatams_sattag_qc_ctd.aatams_sattag_qc_ctd_harvester_0_1.AATAMS_SATTAG_QC_CTD_harvester.tRunJob_4Process(AATAMS_SATTAG_QC_CTD_harvester.java:1952)
at aatams_sattag_qc_ctd.aatams_sattag_qc_ctd_harvester_0_1.AATAMS_SATTAG_QC_CTD_harvester.iUpdateIndex_1Process(AATAMS_SATTAG_QC_CTD_harvester.java:1606)
at aatams_sattag_qc_ctd.aatams_sattag_qc_ctd_harvester_0_1.AATAMS_SATTAG_QC_CTD_harvester.runJobInTOS(AATAMS_SATTAG_QC_CTD_harvester.java:4833)
at aatams_sattag_qc_ctd.aatams_sattag_qc_ctd_harvester_0_1.AATAMS_SATTAG_QC_CTD_harvester.main(AATAMS_SATTAG_QC_CTD_harvester.java:4559)
2021-07-28 17:22:00;vI7VVs;vI7VVs;vI7VVs;17135;AATAMS_SATTAG_QC_CTD;AATAMS_SATTAG_QC_CTD_harvester;_yAc4sDzlEeKIJ9HE2ojUMg;0.1;Default;;begin;;
2021-07-28 17:22:06;vI7VVs;vI7VVs;vI7VVs;17135;AATAMS_SATTAG_QC_CTD;AATAMS_SATTAG_QC_CTD_harvester;_yAc4sDzlEeKIJ9HE2ojUMg;0.1;Default;;end;failure;5624
Talend harvester aatams_sattag_qc_ctd-aatams_sattag_qc_ctd: 62 file(s) FAILED
Aborting operation
I'm not sure about this harvester/schema. @ggalibert do you know if the aatams_sattag_qc_ctd schema/harvester should be revoked?
@ggalibert @DelphiWard
The only issue I have is with the AATAMS_SATTAG_QC_CTD harvester, see comments above.
The associated data is on S3 http://imos-data.s3-website-ap-southeast-2.amazonaws.com/?prefix=IMOS/AATAMS/satellite_tagging/MEOP_QC_CTD/
and on the portal : Satellite Relay Tagging Program - Southern Ocean - MEOP Quality Controlled CTD Profiles https://portal.aodn.org.au/search?uuid=95d6314c-cfc7-40ae-b439-85f14541db71
I don't know much about this dataset. The last date of data available is 2017-08-21 23:00:00+00
I'm actually wondering if this collection should exist or not. Is it replaced by something else?
Currently, the harvester is not capable of re-harvesting some files. Some debugging is required. I don't want to investigate and fix this harvester if the collection shouldn't exist.
The collection is valid. It's a dataset produced sporadically by Fabien Roquet (he's based in Sweden). Last time he generated the files was in 2017. He hasn't been allocated funds to process the dataset since then.
See the PR fixing the harvester in order to re-process the data.
Modifying the AATAMS SATTAG CTD harvester made it possible to re-harvest the files.
All data is now good across all affected schemas.
It seems there are timestamps in the harvest db that are off by an hour due to something in the harvest process.
E.g. Craig just discovered that the anmn_ts.measurements table contains many duplicate timestamps within a given timeseries (i.e. one file). This should not be possible as the IMOS-1.4 checker requires the TIME coordinate to be strictly monotonic. A closer look at one example revealed that the timestamps are indeed correct (unique) in the file, but the equivalent timestamps in the db contain an hour's worth of duplicate values, pretty much at the time daylight savings ended. This suggests that even though the timestamps in the netCDF are UTC, somewhere along the way a daylight savings adjustment gets applied, incorrectly.
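To illustrate how a daylight savings adjustment could produce an hour of duplicates, here is a small sketch. It is not the harvester's code: it only shows that two UTC instants an hour apart read the same on a local wall clock when daylight saving ends. The zone Australia/Hobart is an assumption made for the example (the thread does not say which zone the harvest process runs in), and the function name `local_wall_clock` is made up. It needs GNU date and the tz database.

```shell
#!/usr/bin/env bash
# Assumed zone, for illustration only; the thread does not confirm it.
ZONE="Australia/Hobart"

# Hypothetical helper: print the local wall-clock reading (without its
# UTC offset) of a UTC timestamp. Requires GNU date.
local_wall_clock() {
    local epoch
    epoch=$(date -u -d "$1" +%s)                         # parse the input as UTC
    TZ="$ZONE" date -d "@${epoch}" '+%Y-%m-%d %H:%M:%S'  # render in the local zone
}

# Two instants one hour apart, either side of the end of daylight saving
# in this zone (16:00 UTC on 2018-03-31).
for utc in '2018-03-31 15:30:01' '2018-03-31 16:30:01'; do
    printf '%s UTC -> %s local\n' "$utc" "$(local_wall_clock "$utc")"
done
# Both lines show 2018-04-01 02:30:01 as the local reading.
```

Once the offset is dropped the two instants cannot be told apart, so any step that passes UTC values through local time and back would map one of the two hours onto the other. That would be consistent with the duplicate hour seen in the db.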