This PR adds bindings for running jobs through AWS Batch.
There are no changes to the core spacer code. The only conceptual change is that jobs are no longer submitted as a serialized `JobMsg`. Instead, they are submitted as a serialized `DataLocation`, which in turn points to a `JobMsg` (see `run_job_verbose` in `spacer/mailman.py`).
I made this change because a serialized `JobMsg` got too large for AWS Batch when an image has 1000 points. It adds a level of indirection, but I think that's OK: it forces us to write the `JobMsg` itself to S3, creating a permanent record. It also allows me to change `TrainClassifierMsg` to contain the `TrainData` directly instead of a `traindata_loc`. I'll make that change in the next PR for clarity.
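A minimal sketch of the indirection, assuming hypothetical `JobMsg` and `DataLocation` dataclasses and a plain dict standing in for S3 (the real spacer classes and storage code differ). The submitter writes the full message to storage and submits only the small location, so the submitted payload no longer grows with the number of points in the image.

```python
import json
from dataclasses import asdict, dataclass, field

# Hypothetical stand-ins: the real spacer JobMsg / DataLocation look different.
# A dict plays the role of S3 so the sketch runs offline.
FAKE_S3: dict = {}


@dataclass
class DataLocation:
    storage_type: str  # e.g. "s3"
    key: str
    bucket_name: str = ""


@dataclass
class JobMsg:
    task_name: str
    tasks: list = field(default_factory=list)


def store_job(job: JobMsg, loc: DataLocation) -> str:
    """Write the full JobMsg to storage and return the small payload to submit."""
    FAKE_S3[(loc.bucket_name, loc.key)] = json.dumps(asdict(job))
    return json.dumps(asdict(loc))


def load_job(payload: str) -> JobMsg:
    """Worker side: decode the DataLocation, then fetch the JobMsg it points to."""
    loc = DataLocation(**json.loads(payload))
    return JobMsg(**json.loads(FAKE_S3[(loc.bucket_name, loc.key)]))
```

The stored `JobMsg` doubles as the permanent record mentioned above, since it stays in storage after the job finishes.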
The changes in this PR are exclusively:
- Updated test scripts in `scripts/aws`.
- Added logging messages throughout.
- Added an `env_job` mailman to allow job submission through environment variables.
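A minimal sketch of what an environment-variable entry point could look like. It assumes a hypothetical variable name `JOB_MSG_LOC` holding the serialized `DataLocation` as JSON, and a `run_job` callback standing in for something like `run_job_verbose` (its real signature is not shown here).

```python
import json
import os
from typing import Callable, Mapping

# Hypothetical variable name; the real env_job mailman may use a different one.
JOB_LOC_VAR = "JOB_MSG_LOC"


def env_job(run_job: Callable[[dict], str],
            environ: Mapping[str, str] = os.environ) -> str:
    """Read a serialized DataLocation from the environment and run its job.

    run_job receives the decoded DataLocation and is expected to fetch
    and execute the JobMsg it points to.
    """
    try:
        raw = environ[JOB_LOC_VAR]
    except KeyError:
        raise RuntimeError(f"{JOB_LOC_VAR} is not set; nothing to run") from None
    loc = json.loads(raw)
    return run_job(loc)
```

Passing `environ` explicitly keeps the function testable without touching the real process environment.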
Next PR:
- Switch to boto3 throughout (boto2 doesn't support Batch).
- Change `TrainClassifierMsg` as discussed above.

If you have account credentials, you can check logs directly through CloudWatch: https://us-west-2.console.aws.amazon.com/cloudwatch/home?region=us-west-2#logsV2:log-groups/log-group/$252Faws$252Fbatch$252Fjob/log-events/spacer-job$252Fdefault$252Fef3ce414-aa33-43a2-ac60-823bf3dc56df
Also try the Batch console: https://us-west-2.console.aws.amazon.com/batch/home?region=us-west-2#/jobs. I'm using the `shakeout` queue for all tests.
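A sketch of submitting a job to the `shakeout` queue and reading its CloudWatch logs with boto3, the library the next PR moves to. Assumptions: the job definition name `spacer-job` is inferred from the log stream in the CloudWatch link above, `/aws/batch/job` is AWS Batch's default log group, and `JOB_MSG_LOC` is a hypothetical variable name. The clients are passed in so the functions can be exercised with fakes.

```python
import json
from typing import Any, List

QUEUE = "shakeout"              # queue used for the tests in this PR
JOB_DEFINITION = "spacer-job"   # assumption: inferred from the log stream name
LOG_GROUP = "/aws/batch/job"    # default log group for AWS Batch jobs
JOB_LOC_VAR = "JOB_MSG_LOC"     # hypothetical env variable name


def submit(batch: Any, job_name: str, job_loc: dict) -> str:
    """Submit a job whose only payload is a serialized DataLocation."""
    resp = batch.submit_job(
        jobName=job_name,
        jobQueue=QUEUE,
        jobDefinition=JOB_DEFINITION,
        containerOverrides={
            "environment": [{"name": JOB_LOC_VAR, "value": json.dumps(job_loc)}]
        },
    )
    return resp["jobId"]


def fetch_logs(batch: Any, logs: Any, job_id: str) -> List[str]:
    """Return the first page of CloudWatch log lines for a job."""
    job = batch.describe_jobs(jobs=[job_id])["jobs"][0]
    stream = job.get("container", {}).get("logStreamName")
    if stream is None:  # the job has not started, so no log stream exists yet
        return []
    events = logs.get_log_events(
        logGroupName=LOG_GROUP, logStreamName=stream, startFromHead=True
    )["events"]
    return [e["message"] for e in events]
```

With credentials, pass `boto3.client("batch", region_name="us-west-2")` and `boto3.client("logs", region_name="us-west-2")`. `get_log_events` returns one page at a time, so long logs would need a loop over `nextForwardToken`.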