mrocklin closed this 4 years ago
Based on https://forums.aws.amazon.com/thread.jspa?messageID=927200 and an experiment I ran with necaris/sizetest, which is 4.17GB in size (compared with mrocklin/pytorch-optuna's 5.17GB), my guess is that we're hitting the 10GB Docker layer limit for Fargate 1.3.0 tasks.
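If those sizes are compressed registry figures, the unpacked layers on the task can be considerably larger, so it may help to compare the uncompressed size against the limit. A rough sketch, assuming the limits quoted in this thread (10GB on 1.3.0, 20GB on 1.4.0, treated here as decimal GB) and using the Docker SDK for Python to read a locally pulled image's size; the helper names are hypothetical:

```python
# Storage limits as quoted in this thread, in GB. Assumed values: check the
# current AWS Fargate documentation before relying on them.
FARGATE_STORAGE_LIMIT_GB = {"1.3.0": 10, "1.4.0": 20}


def fits_on_fargate(uncompressed_size_bytes: int, platform_version: str) -> bool:
    """Return True if an image of this uncompressed size fits under the assumed limit."""
    limit_bytes = FARGATE_STORAGE_LIMIT_GB[platform_version] * 10**9
    return uncompressed_size_bytes <= limit_bytes


def local_image_size_bytes(image_name: str) -> int:
    """Uncompressed on-disk size of a locally pulled image, via the Docker SDK."""
    # Imported lazily: needs the `docker` package and a running Docker daemon.
    import docker

    return docker.from_env().images.get(image_name).attrs["Size"]
```

For example, `fits_on_fargate(local_image_size_bytes("mrocklin/pytorch-optuna"), "1.3.0")` after a local `docker pull` would give a quick yes/no under these assumptions.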
Our fix for this (using Fargate 1.4.0, which allows 20GB of space) is already on sandbox and will be on beta shortly, so we can retest. I think this does make running schedulers on EC2 a higher priority, though.
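One way to make sure a task actually lands on 1.4.0 is to pin `platformVersion` when launching it, since ECS otherwise uses `LATEST`. A minimal sketch that builds keyword arguments for boto3's ECS `run_task`; this is not necessarily how our deployment launches tasks, and the helper name, subnets, and security groups are hypothetical:

```python
def fargate_run_task_kwargs(
    cluster: str,
    task_definition: str,
    subnets: list,
    security_groups: list,
    platform_version: str = "1.4.0",
) -> dict:
    """Build kwargs for boto3 ECS run_task, pinned to a Fargate platform version."""
    return {
        "cluster": cluster,
        "taskDefinition": task_definition,
        "launchType": "FARGATE",
        # Pin explicitly; if omitted, ECS uses the LATEST platform version.
        "platformVersion": platform_version,
        "count": 1,
        "networkConfiguration": {
            "awsvpcConfiguration": {
                "subnets": list(subnets),
                "securityGroups": list(security_groups),
                "assignPublicIp": "ENABLED",
            }
        },
    }
```

Usage would look like `boto3.client("ecs", region_name="us-east-2").run_task(**fargate_run_task_kwargs("_dask_dev", "my-scheduler-task", ["subnet-..."], ["sg-..."]))`, where the task definition, subnet, and security group values are placeholders.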
@mrocklin FWIW I've just tested this on beta after deploying and :crossed_fingers: it seems to be launching fine.
https://us-east-2.console.aws.amazon.com/ecs/home?region=us-east-2#/clusters/_dask_dev/tasks/5147575aff4e42cfaef46c2203311c68/details
cc @necaris
I can trigger this with