aodn / content

Tracks AODN Portal content and configuration issues

ANMN WAVE - schema has references to non-existent files #487

Closed lbesnard closed 2 years ago

lbesnard commented 2 years ago

By the look of it, the ANMN_WAVE harvester, which writes to the anmn_wave schema, has references to the ANMN NRS wave data files available at http://imos-data.s3-website-ap-southeast-2.amazonaws.com/?prefix=IMOS/ANMN/NRS/REAL_TIME/NRSDAR.

These files belong to the anmn_nrs_dar_yon schema and to the harvester of a similar name, so I don't understand why they are part of the anmn_wave schema as well.

Anyway, the anmn_wave schema seems to have references to files which don't exist such as:

IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2014/QAQC/IMOS_ANMN_W_20141201T015928Z_FV01_END-20141231T225808Z_C-20150325T103021Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2010/QAQC/IMOS_ANMN_W_20100703T095512Z_FV01_END-20100731T235544Z_C-20150325T102955Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2010/QAQC/IMOS_ANMN_W_20100801T015512Z_FV01_END-20100831T235544Z_C-20150325T102955Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2013/QAQC/IMOS_ANMN_W_20130501T005944Z_FV01_END-20130531T225808Z_C-20150325T103009Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2013/QAQC/IMOS_ANMN_W_20130701T003824Z_FV01_END-20130731T233840Z_C-20150325T103010Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2013/QAQC/IMOS_ANMN_W_20130801T013808Z_FV01_END-20130831T233840Z_C-20150325T103011Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2015/QAQC/IMOS_ANMN_W_20150101T005944Z_FV01_END-20150131T225808Z_C-20150325T103022Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2015/QAQC/IMOS_ANMN_W_20150801T012936Z_FV01_END-20150901T000000Z_C-20150901T020228Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2016/QAQC/IMOS_ANMN_W_20160101T015928Z_FV01_END-20160201T000000Z_C-20160201T020554Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2016/QAQC/IMOS_ANMN_W_20160201T015928Z_NRSDAR_FV01_END-20160201T175928Z_C-20160201T200315Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2012/QAQC/IMOS_ANMN_W_20121209T160000Z_FV01_END-20121231T225808Z_C-20150325T103005Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2013/QAQC/IMOS_ANMN_W_20130101T005944Z_FV01_END-20130201T000000Z_C-20150325T103006Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2013/QAQC/IMOS_ANMN_W_20130201T015928Z_FV01_END-20130301T000000Z_C-20150325T103007Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2010/QAQC/IMOS_ANMN_W_20100901T015512Z_FV01_END-20100914T185456Z_C-20150325T102956Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2011/QAQC/IMOS_ANMN_W_20110102T105912Z_FV01_END-20110129T105912Z_C-20150325T102958Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2015/QAQC/IMOS_ANMN_W_20150501T002952Z_FV01_END-20150531T222816Z_C-20150601T180221Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2015/QAQC/IMOS_ANMN_W_20151101T005944Z_FV01_END-20151130T225808Z_C-20151201T170316Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2015/QAQC/IMOS_ANMN_W_20151201T005944Z_FV01_END-20160101T000000Z_C-20160101T020532Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2013/QAQC/IMOS_ANMN_W_20130301T015928Z_FV01_END-20130331T225808Z_C-20150325T103008Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2013/QAQC/IMOS_ANMN_W_20130901T013808Z_FV01_END-20130930T223856Z_C-20150325T103012Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2013/QAQC/IMOS_ANMN_W_20131001T003824Z_FV01_END-20131031T233840Z_C-20150325T103012Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2013/QAQC/IMOS_ANMN_W_20131101T013808Z_FV01_END-20131130T173808Z_C-20150325T103013Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2014/QAQC/IMOS_ANMN_W_20140124T035856Z_FV01_END-20140201T000000Z_C-20150325T103014Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2014/QAQC/IMOS_ANMN_W_20140201T015928Z_FV01_END-20140301T000000Z_C-20150325T103014Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2014/QAQC/IMOS_ANMN_W_20140301T015928Z_FV01_END-20140331T225808Z_C-20150325T103015Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2014/QAQC/IMOS_ANMN_W_20140401T005944Z_FV01_END-20140430T225808Z_C-20150325T103016Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2014/QAQC/IMOS_ANMN_W_20140501T005944Z_FV01_END-20140601T000000Z_C-20150325T103017Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2014/QAQC/IMOS_ANMN_W_20140601T015928Z_FV01_END-20140701T000000Z_C-20150325T103017Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2014/QAQC/IMOS_ANMN_W_20140701T015928Z_FV01_END-20140731T225808Z_C-20150325T103018Z.nc
IMOS/ANMN/NRS/REAL_TIME/NRSDAR/Peak_Wave_Period/peak_wave_period_channel_3001/2014/QAQC/IMOS_ANMN_W_20140801T005944Z_FV01_END-20140901T000000Z_C-20150325T103019Z.nc
....

It seems like this harvester doesn't clean up its data properly.
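For anyone wanting to reproduce the check, here is a minimal sketch of how a list of indexed URLs could be tested against the bucket. Everything in it is an assumption rather than part of the harvester: `find_missing` and `http_exists` are hypothetical helpers, and the sketch assumes that objects can be requested over HTTP by appending the key to the base URL quoted above, which may not hold for this bucket.

```python
from typing import Callable, Iterable, List
import urllib.error
import urllib.request

# Assumption: an object is reachable at BASE_URL + key. Adjust if the bucket
# is exposed under a different endpoint.
BASE_URL = "http://imos-data.s3-website-ap-southeast-2.amazonaws.com/"


def http_exists(key: str, base_url: str = BASE_URL, timeout: float = 10.0) -> bool:
    """Hypothetical helper: True if a HEAD request for the object returns 200."""
    request = urllib.request.Request(base_url + key, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError:
        # 403, 404 and similar responses are treated as "object not there".
        # Network failures (URLError) are deliberately left to propagate.
        return False


def find_missing(
    keys: Iterable[str],
    exists: Callable[[str], bool] = http_exists,
) -> List[str]:
    """Return the keys for which `exists` reports no object, in input order."""
    return [key for key in keys if not exists(key)]
```

The `exists` argument is pluggable so the same loop can be run against a local listing or a stub instead of the network, for example `find_missing(urls, exists=lambda key: key in known_keys)`.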

lbesnard commented 2 years ago

So those files were indexed in the anmn_wave schema. However, they were not harvested because their content didn't match what the harvester expected.

Removing the entries from the harvester means that the chef-private data bags have to be modified, since these files currently don't match the regex (which was changed to be more specific). As things stand, the po_s3_del command wouldn't work. Alternatively, the harvester could be changed to remove these entries.

However, the references to those files exist only in the indexed_file table. As proof, there is no data associated with these files in either the WMS or the WFS:

select * from anmn_wave.anmn_wave_map where file_id in (select id from anmn_wave.indexed_file where url like 'IMOS/ANMN/NRS/REAL_TIME/%' and not deleted);
SELECT 0

select * from anmn_wave.anmn_wave_data where file_id in (select id from anmn_wave.indexed_file where url like 'IMOS/ANMN/NRS/REAL_TIME/%' and not deleted);
SELECT 0
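The two checks above can also be folded into a single "orphan" query that lists indexed files with no rows in either table. The sketch below only illustrates the idea: it uses an in-memory SQLite database as a stand-in for the real PostgreSQL schema, the table layouts are reduced to the columns used in the queries above, and the sample rows and the `find_orphans` helper are made up for the example.

```python
import sqlite3

# Stand-in for the anmn_wave schema: only the columns referenced in the
# queries above are modelled, and the rows are invented sample data.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE indexed_file (id INTEGER PRIMARY KEY, url TEXT, deleted INTEGER);
    CREATE TABLE anmn_wave_map (file_id INTEGER);
    CREATE TABLE anmn_wave_data (file_id INTEGER);
    INSERT INTO indexed_file VALUES
        (1, 'IMOS/ANMN/NRS/REAL_TIME/NRSDAR/example_a.nc', 0),
        (2, 'IMOS/ANMN/WAVE/example_b.nc', 0);
    INSERT INTO anmn_wave_map VALUES (2);
    INSERT INTO anmn_wave_data VALUES (2);
""")

# Indexed, non-deleted files under a prefix that have no row in either table.
ORPHANS_SQL = """
    SELECT f.id, f.url
    FROM indexed_file f
    WHERE f.url LIKE ?
      AND NOT f.deleted
      AND NOT EXISTS (SELECT 1 FROM anmn_wave_map m WHERE m.file_id = f.id)
      AND NOT EXISTS (SELECT 1 FROM anmn_wave_data d WHERE d.file_id = f.id)
    ORDER BY f.id
"""


def find_orphans(connection, prefix):
    """Hypothetical helper: return (id, url) rows indexed but never harvested."""
    return connection.execute(ORPHANS_SQL, (prefix + "%",)).fetchall()


orphans = find_orphans(conn, "IMOS/ANMN/NRS/REAL_TIME/")
print(orphans)
```

With the sample rows, only the file under the REAL_TIME prefix is reported, since the other file has rows in both tables. Against the real database the same SELECT would need the anmn_wave schema prefix on each table name.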

I'd suggest not to do anything and just close this issue.

FYI @ggalibert