coralnet / pyspacer

Python-based tools for spatial image analysis
MIT License

AWS Batch bindings #29

Closed: beijbom closed this 3 years ago

beijbom commented 3 years ago

This PR adds bindings for running jobs through AWS Batch.

There are no changes at all to the core spacer code. The only conceptual change is that jobs are no longer submitted as a serialized JobMsg; instead, they are submitted as a serialized DataLocation, which in turn points to a JobMsg (see run_job_verbose in spacer/mailman.py).
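A minimal sketch of this indirection, under loud assumptions: `DataLocation` below is a hypothetical simplified dataclass (not pyspacer's actual class), storage is a local temp directory rather than S3, and `write_job_msg`, `run_job_from_location`, and the `task_name`/`rowcols` fields are illustrative names only. The submitter persists the full message and hands the queue only a small pointer; the worker resolves that pointer before running.

```python
import json
import os
import tempfile
from dataclasses import asdict, dataclass


@dataclass
class DataLocation:
    # Hypothetical stand-in: which storage backend holds the blob, and its key/path.
    storage_type: str
    key: str

    def serialize(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def deserialize(cls, text: str) -> "DataLocation":
        return cls(**json.loads(text))


def write_job_msg(job_msg: dict, directory: str) -> DataLocation:
    """Submitter side: persist the full job message and return a pointer to it."""
    path = os.path.join(directory, "job_msg.json")
    with open(path, "w") as f:
        json.dump(job_msg, f)
    return DataLocation(storage_type="filesystem", key=path)


def run_job_from_location(serialized_loc: str) -> dict:
    """Worker side: decode the pointer, then load the job message it names."""
    loc = DataLocation.deserialize(serialized_loc)
    if loc.storage_type != "filesystem":
        raise ValueError(f"unsupported storage type: {loc.storage_type}")
    with open(loc.key) as f:
        job_msg = json.load(f)
    # A real worker would dispatch on the task here; this sketch just returns it.
    return job_msg


with tempfile.TemporaryDirectory() as tmp:
    msg = {"task_name": "classify_image", "rowcols": [[i, i] for i in range(1000)]}
    # Only this short string would be handed to the job queue.
    submitted = write_job_msg(msg, tmp).serialize()
    loaded = run_job_from_location(submitted)
```

A side effect of this design is the one mentioned below: the stored JobMsg outlives the job, so it doubles as a record of what was run.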

This change was done since serialized JobMsg got too large for AWS Batch when there are 1000 points for an image. This is an added level of indirection, but I think it's ok -- it forces us to write the JobMsg itself to s3, creating a permanent record. This also allows me to change TrainClassifierMsg to contain the TrainData instead of having a traindata_loc. I'll make that change in the next PR for clarity.
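To illustrate why the pointer helps, the sketch below compares the size of a hypothetical inline job message with a serialized location as the number of points grows. `MAX_SUBMIT_BYTES` is an arbitrary illustrative budget, not AWS Batch's documented limit, and the field names, bucket, and keys are made up.

```python
import json

# Arbitrary illustrative budget; NOT AWS Batch's documented limit.
MAX_SUBMIT_BYTES = 8 * 1024


def inline_job_payload(n_points: int) -> bytes:
    """Hypothetical JobMsg serialized inline: one (row, col) pair per point."""
    msg = {
        "task_name": "classify_image",
        "image_loc": {"storage_type": "s3", "bucket_name": "example-bucket", "key": "images/1.jpg"},
        "rowcols": [[r, r + 1] for r in range(n_points)],
    }
    return json.dumps(msg).encode("utf-8")


def pointer_payload() -> bytes:
    """Hypothetical serialized DataLocation pointing at a JobMsg stored in S3."""
    loc = {"storage_type": "s3", "bucket_name": "example-bucket", "key": "jobs/job_msg.json"}
    return json.dumps(loc).encode("utf-8")


for n in (10, 100, 1000):
    inline = len(inline_job_payload(n))
    print(f"{n:>5} points: inline {inline} bytes "
          f"(fits: {inline <= MAX_SUBMIT_BYTES}), pointer {len(pointer_payload())} bytes")
```

The inline payload grows linearly with the point count while the pointer stays the same size, which is the property the indirection relies on.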

The changes in this PR are exclusively:

Next PR:

If you have account credentials, you can check the logs directly in CloudWatch: https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logsV2:log-groups/log-group/$252Faws$252Fbatch$252Fjob/log-events/spacer-job$252Fdefault$252Fef3ce414-aa33-43a2-ac60-823bf3dc56df
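The console URL encodes the log group and log stream names, with `/` written as `$252F`. A small sketch for reading the same log events from a script, assuming boto3 is installed and your credentials can read CloudWatch Logs; the helper names are hypothetical, while `get_log_events` is the standard CloudWatch Logs call.

```python
from __future__ import annotations

from urllib.parse import unquote


def log_ids_from_console_url(console_url: str) -> tuple[str, str]:
    """Recover (log group, log stream) from a CloudWatch console URL.

    The console writes "/" as "$252F": "$" stands in for "%", and the result
    is percent-encoded twice, so swap the sign and unquote two times.
    """
    decoded = unquote(unquote(console_url.replace("$", "%")))
    # Expected shape: ...log-group/<group>/log-events/<stream>
    _, rest = decoded.split("log-group/", 1)
    group, stream = rest.split("/log-events/", 1)
    return group, stream


def fetch_log_messages(group: str, stream: str, region: str = "us-west-2") -> list[str]:
    """Return the first page of log messages (needs boto3 + CloudWatch Logs read access)."""
    import boto3  # imported lazily so the URL parser above works without boto3

    client = boto3.client("logs", region_name=region)
    resp = client.get_log_events(
        logGroupName=group, logStreamName=stream, startFromHead=True
    )
    return [event["message"] for event in resp["events"]]
```

`get_log_events` returns a single page; for long logs, keep passing `nextForwardToken` back as `nextToken` until it stops changing.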

Also try the Batch console: https://us-west-2.console.aws.amazon.com/batch/home?region=us-west-2#/jobs. I'm using the shakeout queue for all tests.
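The same job list can also be pulled with boto3's Batch client. This sketch assumes the queue is literally named `shakeout` and that jobs run in us-west-2; `summarize_jobs` and `list_queue_jobs` are hypothetical helpers, while `list_jobs` and its paginator are part of boto3's Batch client.

```python
from __future__ import annotations

from typing import Iterable


def summarize_jobs(pages: Iterable[dict]) -> list[tuple[str, str, str]]:
    """Flatten ListJobs response pages into (jobId, jobName, status) tuples."""
    return [
        (job["jobId"], job["jobName"], job.get("status", "UNKNOWN"))
        for page in pages
        for job in page["jobSummaryList"]
    ]


def list_queue_jobs(queue: str = "shakeout", status: str = "RUNNING",
                    region: str = "us-west-2") -> list[tuple[str, str, str]]:
    """List jobs in one status on a Batch queue (needs boto3 + Batch read access)."""
    import boto3  # imported lazily so summarize_jobs works without boto3

    client = boto3.client("batch", region_name=region)
    pages = client.get_paginator("list_jobs").paginate(jobQueue=queue, jobStatus=status)
    return summarize_jobs(pages)
```

For example, `list_queue_jobs(status="FAILED")` would list failed jobs on the queue, whose IDs can then be looked up in the console or in CloudWatch.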