aodn / content

Tracks AODN Portal content and configuration issues

pipeline regexp issue #208

Closed: lbesnard closed this issue 8 years ago

lbesnard commented 8 years ago

Some ANMN NRS files for the Beagle site had to be modified a month ago. It seems the new files should have landed in the error directory. Instead, they were pushed to S3 successfully without being harvested.

See https://github.com/aodn/chef-private/blob/master/data_bags/talend/anmn_nrs_dar_yon.json, where the regexp is defined as:

  "regex": [
            "^IMOS/ANMN/.*/IMOS_ANMN_.*_NRSDAR_FV0.*",
            "^IMOS/ANMN/.*/IMOS_ANMN_.*_NRSYON_FV0.*",
            "^IMOS/ANMN/.*/IMOS_ANMN_.*_NRSBEA_FV0.*"
  ]

The new Beagle files had their site code changed from NRSBEA to DARBGF. The chef-private file above should have been updated to handle the new site code, but wasn't. Adding a Beagle file to $INCOMING_DIR should therefore have failed, but it didn't.
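A quick way to confirm the mismatch is to test the new file's S3 key against the three patterns. This is a minimal sketch, assuming the regexes are applied to the object key relative to the bucket (e.g. `IMOS/ANMN/...`), as the `^IMOS/` anchors suggest. The old-style NRSBEA key and the extra DARBGF pattern are hypothetical, for illustration only.

```python
import re

# Patterns copied from anmn_nrs_dar_yon.json in chef-private.
ANMN_NRS_DAR_YON = [
    r"^IMOS/ANMN/.*/IMOS_ANMN_.*_NRSDAR_FV0.*",
    r"^IMOS/ANMN/.*/IMOS_ANMN_.*_NRSYON_FV0.*",
    r"^IMOS/ANMN/.*/IMOS_ANMN_.*_NRSBEA_FV0.*",
]


def matches_any(key, patterns):
    """Return True if the key matches at least one of the patterns."""
    return any(re.match(p, key) for p in patterns)


# New Beagle file (site code DARBGF), same key as in the reproduction steps.
new_key = ("IMOS/ANMN/NRS/REAL_TIME/DARBGF/actual_depth@1.0m_channel_84899/"
           "2016/NO_QAQC/IMOS_ANMN_Z_20160201T000000Z_DARBGF_FV00.nc")
# Hypothetical old-style Beagle key, for comparison.
old_key = ("IMOS/ANMN/NRS/REAL_TIME/NRSBEA/"
           "IMOS_ANMN_Z_20150101T000000Z_NRSBEA_FV00.nc")

print(matches_any(old_key, ANMN_NRS_DAR_YON))  # True
print(matches_any(new_key, ANMN_NRS_DAR_YON))  # False

# Hypothetical extra pattern the data bag would need for the new site code.
patched = ANMN_NRS_DAR_YON + [r"^IMOS/ANMN/.*/IMOS_ANMN_.*_DARBGF_FV0.*"]
print(matches_any(new_key, patched))  # True
```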

The file was still pushed to S3, but no data made it into the database.

How to reproduce:

wget -P $INCOMING_DIR/ANMN/AIMS_NRS https://s3-ap-southeast-2.amazonaws.com/imos-data/IMOS/ANMN/NRS/REAL_TIME/DARBGF/actual_depth@1.0m_channel_84899/2016/NO_QAQC/IMOS_ANMN_Z_20160201T000000Z_DARBGF_FV00.nc

input_logf ANMN_NRS_AIMS
# this will show that the file was processed and pushed to S3 successfully

Then, on dbprod, run:

select * from anmn_nrs_dar_yon.indexed_file where url ~ 'DARBGF';
> (no rows)

@anguss00 @pblain @smancini

mhidas commented 8 years ago

@lbesnard Those files have been successfully harvested by the anmn_realtime harvest job:

select count(*) from anmn_metadata.indexed_file where url ~ 'DARBGF';
 count 
-------
   717

lbesnard commented 8 years ago

After talking with @mhidas: the metadata harvester did harvest these files. The anmn_nrs_dar_yon harvester didn't harvest them because its regexp didn't match. But the files were still pushed to S3, because at least one harvester harvested them.
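The behaviour described above can be modelled as a simple rule: a file is published to S3 if at least one harvester claims it, and only goes to the error directory when none do. The sketch below assumes that rule. The `anmn_realtime` pattern is a hypothetical catch-all, since its real regex isn't shown in this issue, and `route` is an illustrative helper, not part of the actual pipeline.

```python
import re

# Harvester name -> list of key regexes.
# anmn_nrs_dar_yon patterns come from chef-private;
# the anmn_realtime pattern is a hypothetical catch-all.
HARVESTERS = {
    "anmn_nrs_dar_yon": [
        r"^IMOS/ANMN/.*/IMOS_ANMN_.*_NRSDAR_FV0.*",
        r"^IMOS/ANMN/.*/IMOS_ANMN_.*_NRSYON_FV0.*",
        r"^IMOS/ANMN/.*/IMOS_ANMN_.*_NRSBEA_FV0.*",
    ],
    "anmn_realtime": [r"^IMOS/ANMN/.*/IMOS_ANMN_.*_FV0.*"],
}


def route(key, harvesters):
    """Return (destination, matching harvesters) under the 'at least one harvester' rule."""
    matched = sorted(
        name for name, patterns in harvesters.items()
        if any(re.match(p, key) for p in patterns)
    )
    # A single match is enough to publish; harvesters that did not match are not reported.
    destination = "s3" if matched else "error"
    return destination, matched


key = ("IMOS/ANMN/NRS/REAL_TIME/DARBGF/actual_depth@1.0m_channel_84899/"
       "2016/NO_QAQC/IMOS_ANMN_Z_20160201T000000Z_DARBGF_FV00.nc")
print(route(key, HARVESTERS))                 # ('s3', ['anmn_realtime'])
print(route("IMOS/SOOP/x.nc", HARVESTERS))    # ('error', [])
```

Under this rule the DARBGF file is published because `anmn_realtime` claims it, while the absence of `anmn_nrs_dar_yon` from the match list goes unnoticed.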

This is a tricky scenario.

All Beagle files need to be re-downloaded and pushed back to $INCOMING_DIR.
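Re-queuing the files amounts to selecting every Beagle key in the bucket, building its public URL (the same S3 endpoint used in the reproduction steps) and choosing a destination in the incoming directory. A minimal sketch of that selection step, assuming a list of keys has already been obtained (e.g. from a bucket listing) and that the target is `$INCOMING_DIR/ANMN/AIMS_NRS`. The helper name, the `/tmp/incoming` path and the NRSYON key are hypothetical.

```python
import posixpath

# Public endpoint taken from the wget command in the reproduction steps.
BASE_URL = "https://s3-ap-southeast-2.amazonaws.com/imos-data/"


def redownload_plan(keys, target_dir, site_code="DARBGF"):
    """Return (url, local_path) pairs for every key whose file name carries the site code."""
    plan = []
    for key in keys:
        filename = posixpath.basename(key)
        if f"_{site_code}_" in filename:
            plan.append((BASE_URL + key, posixpath.join(target_dir, filename)))
    return plan


keys = [
    "IMOS/ANMN/NRS/REAL_TIME/DARBGF/actual_depth@1.0m_channel_84899/"
    "2016/NO_QAQC/IMOS_ANMN_Z_20160201T000000Z_DARBGF_FV00.nc",
    # Hypothetical non-Beagle key, which should be skipped.
    "IMOS/ANMN/NRS/REAL_TIME/NRSYON/IMOS_ANMN_Z_20160201T000000Z_NRSYON_FV00.nc",
]
plan = redownload_plan(keys, "/tmp/incoming/ANMN/AIMS_NRS")
for url, dest in plan:
    print(url, "->", dest)
```

Each pair can then be fetched with `wget -P`, as in the reproduction steps.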

smancini commented 8 years ago

All Beagle files have been uploaded again to Incoming. The issue of multiple harvesters being triggered by a single incoming handler is documented here: https://github.com/aodn/backlog/issues/371