Netflix / metaflow-tools

:rocket: Deployment tools/scripts for Metaflow!
http://www.metaflow.org
Apache License 2.0

Lack of disk space on default AWS Batch ComputeEnvironment #25

Open mattmcclean opened 3 years ago

mattmcclean commented 3 years ago

Running training jobs on AWS Batch often runs out of disk space because the default root EBS volume is quite small (8 GB). Would it be possible to add an input parameter to increase the root EBS volume size of the EC2 instances? I have tested this, and it requires adding an EC2 Launch Template to the ComputeEnvironment setup. Happy to create a PR if required.
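For illustration, here is a minimal Python sketch of the launch template data that such a parameter could feed into. The helper names, the `/dev/xvda` device name, and the `gp2` volume type are assumptions (the root device name depends on the AMI in use), and this is not the code that was tested for this issue. The boto3 call is kept in its own function because it needs AWS credentials.

```python
import json


def build_launch_template_data(volume_size_gb=100, device_name="/dev/xvda",
                               volume_type="gp2"):
    """Return EC2 LaunchTemplateData that resizes one EBS volume.

    device_name is an assumption: it must match the root device of the AMI
    used by the compute environment.
    """
    if volume_size_gb <= 0:
        raise ValueError("volume_size_gb must be positive")
    return {
        "BlockDeviceMappings": [
            {
                "DeviceName": device_name,
                "Ebs": {
                    "VolumeSize": volume_size_gb,  # size in GiB
                    "VolumeType": volume_type,
                    "DeleteOnTermination": True,
                },
            }
        ]
    }


def create_launch_template(name, data):
    """Create the launch template in EC2 (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builder above works without it

    ec2 = boto3.client("ec2")
    return ec2.create_launch_template(
        LaunchTemplateName=name,
        LaunchTemplateData=data,
    )


if __name__ == "__main__":
    # Print the payload only; nothing is created in AWS here.
    print(json.dumps(build_launch_template_data(), indent=2))
```

The resulting template would then be referenced from the `launchTemplate` field of the compute environment's compute resources, so that instances started by Batch pick up the larger volume.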

GregHilstonHop commented 3 years ago

Hey @mattmcclean,

I think this is a really good idea! We ran into this issue pretty early on as well and solved it in our infrastructure by creating a new launch template that specifies a 100 GB disk. Our solution is in Terraform, but you might find it useful as a guide if you want to do the same in CloudFormation. link to the specific code
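As a rough idea of what the CloudFormation side might look like, the Python sketch below prints a JSON resource fragment that wires a launch template with a 100 GB volume into a Batch compute environment. It is not a translation of the Terraform code mentioned above. The logical IDs (`BatchLaunchTemplate`, `BatchComputeEnvironment`) and the `/dev/xvda` device name are hypothetical, and the compute environment's other required properties are deliberately left out.

```python
import json


def launch_template_resources(volume_size_gb=100, device_name="/dev/xvda"):
    """Build a partial CloudFormation Resources section as a dict.

    Only the launch-template wiring is shown; a real compute environment
    also needs its type, subnets, instance role, and so on.
    """
    return {
        "BatchLaunchTemplate": {  # hypothetical logical ID
            "Type": "AWS::EC2::LaunchTemplate",
            "Properties": {
                "LaunchTemplateData": {
                    "BlockDeviceMappings": [
                        {
                            "DeviceName": device_name,  # assumption: AMI root device
                            "Ebs": {"VolumeSize": volume_size_gb},
                        }
                    ]
                }
            },
        },
        "BatchComputeEnvironment": {  # hypothetical logical ID
            "Type": "AWS::Batch::ComputeEnvironment",
            "Properties": {
                "ComputeResources": {
                    "LaunchTemplate": {
                        # Ref on a launch template resolves to its ID.
                        "LaunchTemplateId": {"Ref": "BatchLaunchTemplate"},
                        "Version": {
                            "Fn::GetAtt": [
                                "BatchLaunchTemplate",
                                "LatestVersionNumber",
                            ]
                        },
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    print(json.dumps(launch_template_resources(), indent=2))
```

Exposing `volume_size_gb` as a template parameter would give the input knob requested in the original post.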