Reason for this pull request
Analysing the datasets processed during orchestration would help us use NCI resources efficiently. We therefore need to monitor dataset processing information (number of datasets found, indexed, and failed, service units used, etc.) from the automated datacube ingest for the current Landsat NBAR/NBART/PQ/WOfS/Fractional Cover products, which would add value to the dea-orchestration process. Dataset information is read from the PBS logs and then pushed to AWS Elasticsearch for further analysis.
Proposed solution
When a new S3 object is created, an event notification is sent to the AWS SQS service, which delays execution of the Lambda function. The delay gives the PBS logs time to become available in the NCI directory before further processing. The SQS message delivery delay in this configuration is set to 10 minutes.
The read_nci_email.py handler is updated to extract dataset information from the logs using regular expression searches and to push the updated metadata document into Amazon ES.
A new raijin_scripts/execute_fetch_dataset_info/run script is created to fetch dataset processing information from the PBS logs in the NCI directory.
serverless.yml is updated to provide an AWS SQS IAM role and to allow AWS SQS to trigger the AWS Lambda function.
Minor documentation updates are made to lambda_functions/nci_monitoring/package.json and lambda_functions/nci_monitoring/raijin_ssh.py.
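For reviewers, the flow described above can be sketched roughly as follows. This is an illustration only, not the code in read_nci_email.py: the function names (`s3_keys_from_sqs_event`, `parse_dataset_info`), the field names, and the log line wording matched by the regular expressions are all assumptions, and the real PBS/ingest log format may differ. The only structure taken as given is the standard shape of an S3 event notification delivered through SQS.

```python
import json
import re
from urllib.parse import unquote_plus

# The PR sets a 10 minute delivery delay; SQS allows at most 900 seconds.
SQS_DELAY_SECONDS = 10 * 60

# Hypothetical log line formats, assumed for illustration only.
PATTERNS = {
    "datasets_found": re.compile(r"(\d+)\s+datasets? found", re.IGNORECASE),
    "datasets_indexed": re.compile(r"(\d+)\s+datasets? indexed", re.IGNORECASE),
    "datasets_failed": re.compile(r"(\d+)\s+datasets? failed", re.IGNORECASE),
    "service_units": re.compile(r"Service Units:\s*(\d+(?:\.\d+)?)", re.IGNORECASE),
}


def s3_keys_from_sqs_event(event):
    """Return (bucket, key) pairs from S3 notifications wrapped in SQS records."""
    pairs = []
    for record in event.get("Records", []):
        # Each SQS record body is a JSON string holding the S3 notification.
        body = json.loads(record["body"])
        for s3_record in body.get("Records", []):
            bucket = s3_record["s3"]["bucket"]["name"]
            # Object keys arrive URL-encoded in S3 notifications.
            key = unquote_plus(s3_record["s3"]["object"]["key"])
            pairs.append((bucket, key))
    return pairs


def parse_dataset_info(log_text):
    """Build a metadata document from whichever patterns match the log text."""
    doc = {}
    for field, pattern in PATTERNS.items():
        match = pattern.search(log_text)
        if match:
            value = match.group(1)
            doc[field] = float(value) if field == "service_units" else int(value)
    return doc
```

Fields whose pattern does not match are simply left out of the document, so a log with a missing summary line still yields a partial record instead of an error. Pushing the resulting dictionary to Amazon ES is omitted here.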