GeoscienceAustralia / dea-orchestration


Improve nci jobs workflow #117

Closed santoshamohan closed 4 years ago

santoshamohan commented 5 years ago

Reason for this pull request

The existing NCI workflow depended on PBS normal-queue jobs completing within a day. When PBS jobs sat in the queue for longer (sometimes a couple of days), the NCI jobs would simply fail or do nothing. Also, the ingest, fractional cover, and wofs jobs are each scheduled differently using qsub, which makes it difficult to use the PBS job "depend after" feature.

A better workflow was required to manage job dependencies, so that the next job is submitted only after the current PBS job has completed successfully. To achieve this, AWS Step Functions, AWS DynamoDB, AWS Lambda functions, and AWS serverless features were used.
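As a rough illustration of the dependency rule described above (submit the next job only once the current one has succeeded), here is a minimal pure-Python sketch. It is not the code in this pull request: `run_chain`, `JobState`, and `FakeScheduler` are hypothetical names, the plain dict stands in for the DynamoDB status table, and the fake scheduler stands in for PBS so the example runs without AWS or NCI access.

```python
from enum import Enum
from typing import Callable, Dict, Iterator, List


class JobState(Enum):
    QUEUED = "queued"
    RUNNING = "running"
    SUCCEEDED = "succeeded"
    FAILED = "failed"


def run_chain(
    job_names: List[str],
    submit: Callable[[str], str],
    check: Callable[[str], JobState],
    status_table: Dict[str, str],
    max_polls: int = 100,
) -> bool:
    """Submit jobs in order; each job is submitted only after the previous one succeeds.

    Returns True if every job succeeded, False as soon as one does not.
    """
    for name in job_names:
        job_id = submit(name)  # hypothetical stand-in for a job-submission handler
        status_table[name] = JobState.QUEUED.value
        for _ in range(max_polls):
            state = check(job_id)  # hypothetical stand-in for a status-check handler
            status_table[name] = state.value  # record the latest status
            if state in (JobState.SUCCEEDED, JobState.FAILED):
                break
            # A real state machine would wait between polls instead of looping immediately.
        if status_table[name] != JobState.SUCCEEDED.value:
            return False  # stop the chain: later jobs are never submitted
    return True


class FakeScheduler:
    """In-memory stand-in for PBS, replaying a scripted list of states per job (assumption)."""

    def __init__(self, outcomes: Dict[str, List[JobState]]) -> None:
        self._outcomes = outcomes
        self._jobs: Dict[str, Iterator[JobState]] = {}
        self.submitted: List[str] = []

    def submit(self, name: str) -> str:
        self.submitted.append(name)
        job_id = f"{len(self.submitted)}.fake"  # made-up job ID format
        self._jobs[job_id] = iter(self._outcomes[name])
        return job_id

    def check(self, job_id: str) -> JobState:
        return next(self._jobs[job_id])
```

Because a job may sit in the queue for days, the polling here has no deadline tied to a single day; in the sketch the only bound is `max_polls`, which is an illustrative parameter rather than anything taken from this pull request. The job order used when calling `run_chain` is likewise up to the caller.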

Proposed solution

1) Orchestrate with AWS Step Functions and Lambdas, and update the job status in a DynamoDB database.
2) Develop four separate Lambda function handlers to manage job submission, fetch job IDs, check job status, and update the failed-job status.
3) Use the serverless step function plugin to invoke AWS Step Functions.
4) The following new raijin scripts are added:

santoshamohan commented 4 years ago

The latest Gadi updates to the orchestration project supersede this pull request. Closing this pull request.