aws-solutions-library-samples / osml-model-runner

MIT No Attribution

feat: implement region status tracking #89

Closed drduhe closed 1 month ago

drduhe commented 2 months ago

Issue #, if available: n/a

Notes

This PR introduces the RegionStatusMonitor, which handles region request statuses and publishes them to an SNS topic. It follows the pattern previously established by the ImageStatusMonitor, so status updates are handled consistently within the Model Runner. The change fixes the current issue where the Model Runner does not write status updates to the region status topic as expected. We are also migrating existing messaging to a new consolidated class, StatusMessage, which renames the output property from image_status to status. The old image_status property remains available until model-runner v3.0, at which point it will be deprecated.
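To illustrate the property migration described above, here is a minimal sketch of how a consolidated message class could expose `status` while keeping `image_status` readable for older consumers. It is not the code in this PR: the field names (`job_id`, `image_id`, `region_id`), the `to_json` helper, and the choice to emit both keys in the payload are assumptions made for illustration only.

```python
import json
import warnings
from dataclasses import asdict, dataclass
from typing import Optional


@dataclass
class StatusMessage:
    """Hypothetical consolidated status payload; all field names are assumptions."""

    status: str
    job_id: str
    image_id: Optional[str] = None
    region_id: Optional[str] = None

    @property
    def image_status(self) -> str:
        # Legacy alias kept for backward compatibility; it warns callers to migrate.
        warnings.warn(
            "image_status is a legacy alias; use status instead",
            DeprecationWarning,
            stacklevel=2,
        )
        return self.status

    def to_json(self) -> str:
        # Serialize the dataclass fields, then mirror status under the old key
        # so existing subscribers keep working during the transition (assumed behavior).
        payload = asdict(self)
        payload["image_status"] = self.status
        return json.dumps(payload)


message = StatusMessage(status="SUCCESS", job_id="job-1", region_id="region-1")
body = message.to_json()
```

The resulting `body` string is what a monitor would hand to its publisher, for example as the `Message` argument of a boto3 SNS client's `publish` call. The topic wiring is left out here because it depends on deployment configuration.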

This update also allows ModelRunner to re-drive partially completed image/region requests. If any tiles in a region fail to process and those regions are re-driven from the DLQ, only the tiles that failed will be run again. Likewise, if a user manually submits an image request with the same job ID as a job in which some tiles failed, only the failed tiles will be reprocessed.
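The re-drive behavior can be sketched as a filter over a region's tiles. The example below assumes that completed tiles are recorded per job in some persistent store, represented here by an in-memory set. The names `tiles_to_process` and `run_region`, and the tile representation, are hypothetical and do not reflect how the Model Runner actually tracks tile state.

```python
from typing import Callable, Iterable, List, Set, Tuple

Tile = Tuple[int, int]  # hypothetical tile identifier, e.g. (row, column)


def tiles_to_process(all_tiles: Iterable[Tile], succeeded: Set[Tile]) -> List[Tile]:
    """Return only the tiles that have not already completed for this job."""
    return [tile for tile in all_tiles if tile not in succeeded]


def run_region(
    all_tiles: List[Tile],
    succeeded: Set[Tile],
    process: Callable[[Tile], None],
) -> List[Tile]:
    """Process the outstanding tiles, record successes, and return the failures."""
    failed: List[Tile] = []
    for tile in tiles_to_process(all_tiles, succeeded):
        try:
            process(tile)
        except Exception:
            # A failed tile stays out of `succeeded`, so a later re-drive retries it.
            failed.append(tile)
        else:
            succeeded.add(tile)
    return failed
```

Calling `run_region` a second time with the same `succeeded` set models a re-drive from the DLQ or a resubmission with the same job ID: tiles that already completed are skipped, and only the earlier failures are handed to `process`.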

Key Changes:

Bug Fix:

Testing:

This PR ensures that region request status updates are handled and communicated consistently, resolving the existing issue and aligning with the existing pattern for image status monitoring.

Checklist

Before you submit a pull request, please make sure you have the following:

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of your choice.

drduhe commented 1 month ago

Please do not merge. I need time to dig into this and review a lot of assumptions made in this PR.

I believe this is OOA now =)