aodn / content

Tracks AODN Portal content and configuration issues
0 stars 0 forks source link

SOOP XBT NRT - data in year 200 A.D. #413

Closed lbesnard closed 5 years ago

lbesnard commented 5 years ago

I discovered by chance 3+ xbt nrt voyages with times around the year 200 A.D. https://portal.aodn.org.au/search?uuid=35234913-aa3c-48ec-b9a4-77f822f66ef8

currently investigating the reasons why

3125 [147.5438, -43.529] Point geom soop_xbt_nrt_profiles_data.fid-7cb7f59a169dbea239e-42a FHZI 29 -43.5289655172 147.543793103 17.485 0200-02-20T12:09:00Z 698561 5471 Astrolabe Feature
lbesnard commented 5 years ago
select distinct(url) from  soop_xbt_nrt.soop_xbt_nrt_profiles_map where "TIME" < '1900-01-01T00:00:00Z' order by url

"IMOS/SOOP/SOOP-XBT/REALTIME/9V3581_Maersk-Jalan/200/IMOS_SOOP-XBT_T_2000107T230600Z_9V3581_002415_FV00.csv" "IMOS/SOOP/SOOP-XBT/REALTIME/9V3581_Maersk-Jalan/200/IMOS_SOOP-XBT_T_2000210T215200Z_9V3581_002423_FV00.csv" "IMOS/SOOP/SOOP-XBT/REALTIME/9V3581_Maersk-Jalan/200/IMOS_SOOP-XBT_T_2000210T215200Z_9V3581_002430_FV00.csv" "IMOS/SOOP/SOOP-XBT/REALTIME/9V3581_Maersk-Jalan/200/IMOS_SOOP-XBT_T_2000210T220200Z_9V3581_002424_FV00.csv" "IMOS/SOOP/SOOP-XBT/REALTIME/9V3581_Maersk-Jalan/200/IMOS_SOOP-XBT_T_2000210T220200Z_9V3581_002431_FV00.csv" "IMOS/SOOP/SOOP-XBT/REALTIME/9V3581_Maersk-Jalan/200/IMOS_SOOP-XBT_T_2000210T230300Z_9V3581_002425_FV00.csv" "IMOS/SOOP/SOOP-XBT/REALTIME/9V3581_Maersk-Jalan/200/IMOS_SOOP-XBT_T_2000210T230300Z_9V3581_002432_FV00.csv" "IMOS/SOOP/SOOP-XBT/REALTIME/9V3581_Maersk-Jalan/200/IMOS_SOOP-XBT_T_2000210T232000Z_9V3581_002426_FV00.csv" "IMOS/SOOP/SOOP-XBT/REALTIME/9V3581_Maersk-Jalan/200/IMOS_SOOP-XBT_T_2000210T232000Z_9V3581_002433_FV00.csv" "IMOS/SOOP/SOOP-XBT/REALTIME/9V3581_Maersk-Jalan/200/IMOS_SOOP-XBT_T_2000210T235400Z_9V3581_002427_FV00.csv" "IMOS/SOOP/SOOP-XBT/REALTIME/9V3581_Maersk-Jalan/200/IMOS_SOOP-XBT_T_2000210T235400Z_9V3581_002434_FV00.csv" "IMOS/SOOP/SOOP-XBT/REALTIME/9V3581_Maersk-Jalan/200/IMOS_SOOP-XBT_T_2000211T004300Z_9V3581_002435_FV00.csv" ...

lbesnard commented 5 years ago

the reason seems to be in the filename not containing enough characters in the time value around the year/month part: 2000210T230300Z

lbesnard commented 5 years ago

This query shows it affected files uploaded in 2016-03

select max(last_indexed) from soop_xbt_nrt.indexed_file where url IN (select distinct(url) from  soop_xbt_nrt.soop_xbt_nrt_profiles_map where "TIME" < '1900-01-01T00:00:00Z' order by url)

@smancini do you know you dealt with this collection back then ? maybe we should contact the facility leader about this bug.

Also, the physical files aren't on S3. However I can always un-indexed them

FYI @ggalibert

lbesnard commented 5 years ago

I tried to delete all those files with the po_s3_del command. However the data is still in the database.

For example:

select * from soop_xbt_nrt.indexed_file where url like 'IMOS/SOOP/SOOP-XBT/REALTIME/FHZI_Astrolabe/200/IMOS_SOOP-XBT_T_2000221T041300Z_FHZI_003570_FV00.csv';

correctly shows it was deleted and last_indexed today "2019-05-02 02:42:23.828+00".

however the data is still there

select * from soop_xbt_nrt.soop_xbt_nrt_profiles_map where url like 'IMOS/SOOP/SOOP-XBT/REALTIME/FHZI_Astrolabe/200/IMOS_SOOP-XBT_T_2000221T041300Z_FHZI_003570_FV00.csv';

The SOOP_XBT_NRT harvester seems to be set up correctly to delete files:

"DELETE FROM attribute WHERE file_id = " + context.file_id + " ;
DELETE FROM measurements WHERE profile_id = " + context.file_id + " ;
DELETE FROM soop_xbt_nrt_profiles_map WHERE profile_id = " + context.file_id + " ;
DELETE FROM indexed_file where id = " + context.file_id + ";"

with a proper commit and tpostjob component. however, files stay in the _map table.

Would you have an idea @jonescc

jonescc commented 5 years ago

I'll have a look.

jonescc commented 5 years ago

The file was deleted a long time ago

select * from soop_xbt_nrt.file_harvest where file_id = 5478;

harvest=> select * from soop_xbt_nrt.file_harvest where file_id = 5478;
  id  | harvest_type | file_id | last_moddate_harvested | last_change_type | last_change_run 
------+--------------+---------+------------------------+------------------+-----------------
 5478 | harvest      |    5478 | 2016-03-01 04:00:54+00 | deleted          |             453
(1 row)

We are currently up to run 1771

harvest=> select * from soop_xbt_nrt.index_job;
 id | name  | last_run_no |       last_run_date        
----+-------+-------------+----------------------------
  1 | index |        1771 | 2019-05-02 02:44:45.811+00
(1 row)

Not sure what happened back then and why the records weren't cleaned up at the time. Regardless, you would still expect that the del command should trigger the delete action and remove the records.

Unfortunately, up until recently, the delete action would only be triggered if a delete hadn't already been performed (checked the deleted flag on indexed_file). This was fixed in https://github.com/aodn/talend-components/pull/60, but only harvesters with the problem being fixed by this PR were rebuilt with the updated components (the GENERIC_TIMESTEP harvester).

If you force a rebuild of the harvester to pick up these changes, then the s3 del command should work for this file.

jonescc commented 5 years ago

I kicked off a build for the harvester and have promoted it to RC for testing. Should be deployed shortly.

lbesnard commented 5 years ago

@jonescc

The new version of the harvester is on RC

talend_version soop_xbt_nrt-soop_xbt_nrt
GIT_COMMIT=a54b492319e94dec3504d2919fcf5f0b3c90c276
BUILD_URL=https://jenkins.aodn.org.au/job/talend_build/1372/
BUILD_TAG=jenkins-talend_build-1372
ROOT_BUILD_CAUSE=MANUALTRIGGER
BUILD_JOB_NAME=talend_build
BUILD_CAUSE_MANUALTRIGGER=true
BUILD_DISPLAY_NAME=#1372
BUILD_ID=1372
BUILD_CAUSE=MANUALTRIGGER
BUILD_NUMBER=1372
ROOT_BUILD_CAUSE_MANUALTRIGGER=true

I performed a po_s3_del

po_s3_del IMOS/SOOP/SOOP-XBT/REALTIME/FASB_Astrolabe/2017/IMOS_SOOP-XBT_T_20171227T200300Z_FASB_000450_FV00.csv
Deleting 'IMOS/SOOP/SOOP-XBT/REALTIME/FASB_Astrolabe/2017/IMOS_SOOP-XBT_T_20171227T200300Z_FASB_000450_FV00.csv' with index deletion
delete: 's3://imos-test-data/IMOS/SOOP/SOOP-XBT/REALTIME/FASB_Astrolabe/2017/IMOS_SOOP-XBT_T_20171227T200300Z_FASB_000450_FV00.csv'

however the file is still in the database (and set a deleted TRUE)

select * from soop_xbt_nrt.soop_xbt_nrt_profiles_map where url like 'IMOS/SOOP/SOOP-XBT/REALTIME/FASB_Astrolabe/2017/IMOS_SOOP-XBT_T_20171227T200300Z_FASB_000450_FV00.csv';                                                                
┌─[ RECORD 1 ]────┬───────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ profile_id      │ 211                                                                                                   │
│ url             │ IMOS/SOOP/SOOP-XBT/REALTIME/FASB_Astrolabe/2017/IMOS_SOOP-XBT_T_20171227T200300Z_FASB_000450_FV00.csv │
│ Callsign        │ FASB                                                                                                  │
│ vessel_name     │ Astrolabe                                                                                             │
│ TIME            │ 2017-12-27 09:03:00+00                                                                                │
│ nb_measurements │ 30                                                                                                    │
│ max_depth       │ 48                                                                                                    │
│ colour          │ #155B90                                                                                               │
│ geom            │ 0101000020E61000002FBBED63D1536240EA2532AAA6C847C0                                                    │
│ size            │ 1154                                                                                                  │
└─────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────┘

select * from soop_xbt_nrt.indexed_file where url like 'IMOS/SOOP/SOOP-XBT/REALTIME/FASB_Astrolabe/2017/IMOS_SOOP-XBT_T_20171227T200300Z_FASB_000450_FV00.csv';                                                                             
┌─[ RECORD 1 ]─────┬───────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ id               │ 211                                                                                                   │
│ job_id           │ 1                                                                                                     │
│ url              │ IMOS/SOOP/SOOP-XBT/REALTIME/FASB_Astrolabe/2017/IMOS_SOOP-XBT_T_20171227T200300Z_FASB_000450_FV00.csv │
│ created          │ ∅                                                                                                     │
│ modified         │ 2019-01-31 04:32:02+00                                                                                │
│ first_indexed    │ 2019-01-31 04:33:56.91022+00                                                                          │
│ last_indexed     │ 2019-05-06 02:22:19.441+00                                                                            │
│ last_indexed_run │ 6                                                                                                     │
│ size             │ 1154                                                                                                  │
│ deleted          │ t                                                                                                     │
└──────────────────┴───────────────────────────────────────────────────────────────────────────────────────────────────────┘

So it doesn't look like it worked

jonescc commented 5 years ago

Hmm. Talend craziness. In the source code the connection to use for the commit is not set (see https://github.com/aodn/harvesters/blob/master/workspace/SOOP_XBT_NRT/process/SOOP_XBT_NRT_subjobs/deleteData_0.1.item#L90). But when you go into the harvester and look at it it shows tPostgresqlConnection_1. I think its just showing the options available but it it hasn't actually been set. I selected it and needed to turn close connection off and on to get the save button to come on so I could save the changes and then it worked.

jonescc commented 5 years ago

Also note that the test scripts we are currently writing for the talend upgrade would have picked up this problem as we try to exercise each harvester path (delete/update/add)

lbesnard commented 5 years ago

cleaned