cornhundred closed this issue 6 years ago
Hi @cornhundred,
In the YAML, you can see it here in Isaac's job definition:

IsaacJobDefinition:
  Type: AWS::Batch::JobDefinition
  Properties:
    JobDefinitionName: !Join ["-", ["isaac", !Ref Env]]
    Type: container
    RetryStrategy:
      Attempts: !Ref RetryNumber
    ContainerProperties:
      Image: !Ref IsaacDockerImage
      Vcpus: !Ref IsaacVcpus
      Memory: !Ref IsaacMemory
      JobRoleArn: !Ref JobRoleArn
      MountPoints:
        - ContainerPath: "/scratch"
          ReadOnly: false
          SourceVolume: docker_scratch
      Volumes:
        - Name: docker_scratch
          Host:
            SourcePath: "/docker_scratch"
Effectively, you define the source path on your instance under Volumes, and then you define the container mount point under MountPoints.
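If it helps to think of it in plain Docker terms, that pairing behaves much like a bind mount passed to docker run; the image name below is only a placeholder:

# Roughly what Batch sets up for each container from this job definition:
# the host path /docker_scratch is bind-mounted into the container at /scratch.
docker run -v /docker_scratch:/scratch isaac-image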
mount_volume.sh is the code we used to help create the Golden AMI. You can find this in part 3 of the blog series.
We do need to make this more clear though in the GH repo...will see what I can do.
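In the meantime, here is a minimal sketch of the kind of work a script like mount_volume.sh does when baking the Golden AMI; it assumes the extra EBS volume shows up as /dev/xvdb, and the real script is the one in part 3 of the blog:

#!/bin/bash
# Sketch only: format the extra EBS volume and mount it where the job
# definition expects it. The device name /dev/xvdb is an assumption.
sudo mkfs -t ext4 /dev/xvdb
sudo mkdir -p /docker_scratch
sudo mount /dev/xvdb /docker_scratch
# Persist the mount so instances launched from the AMI come up with it.
echo "/dev/xvdb /docker_scratch ext4 defaults,nofail 0 2" | sudo tee -a /etc/fstab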
Hi @ajfriedman18,
Thank you for the help. I made an AMI using the instructions on the blog and I think it mounted the 1TB EBS volume correctly. I can ssh into a running EC2 instance of the AMI and see the 1TB docker_scratch volume with df -h:
[ec2-user@ip-######### docker_scratch]$ df -h
Filesystem      Size  Used  Avail  Use%  Mounted on
/dev/xvda1      7.8G  703M  7.0G     9%  /
devtmpfs        489M   84K  489M     1%  /dev
tmpfs           497M     0  497M     0%  /dev/shm
/dev/xvdb       985G   72M  935G     1%  /docker_scratch
From what I understand I now need to use this custom AMI to run the batch jobs.
Do the MountPoints and Volumes container properties have a similar function to running a Docker container and passing in an external volume? Also, will all submitted jobs share this common docker_scratch directory (e.g. multi-tenancy)?
Also, do we need to specify the memory available for the AMI? I see that we can specify the memory available for a job definition, but does the same need to be done when making the AMI (e.g. when selecting the t2.micro instance type to launch)? Or does the 'managed compute environment' take care of this?
@cornhundred, yes all jobs will share an external volume in the scenario we built. However, the individual Docker containers have a python wrapper that creates a unique subdirectory in the volume so you won't have any file clashes.
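The wrapper itself is Python, but the pattern is simple enough to show as a shell sketch: key the working directory off the job ID that Batch injects into every container (AWS_BATCH_JOB_ID), so concurrent jobs never write to the same path.

# Sketch of the per-job scratch isolation idea (the real wrapper is Python).
# AWS_BATCH_JOB_ID is set by AWS Batch inside each running container.
WORKDIR="/scratch/${AWS_BATCH_JOB_ID}"
mkdir -p "$WORKDIR"
cd "$WORKDIR"
# ...run the tool here, writing outputs only under $WORKDIR...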
You shouldn't need to specify the memory available for the AMI or instance. The closest you get to this is in defining your compute environment instance types. Beyond that, Batch handles the rest. It'll look at your Job Definition and 1) see if any instances already in the CE have resources available to run the job, and if not 2) spin up a new instance that can meet the resource requirements you've specified.
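For example, when you create the managed compute environment you hand it the custom AMI and a set of allowed instance types, and Batch sizes instances against the job definitions on its own. A sketch with the AWS CLI, where every ID and ARN is a placeholder:

# Sketch: managed compute environment using the custom AMI (all IDs/ARNs are placeholders).
aws batch create-compute-environment \
  --compute-environment-name genomics-ce \
  --type MANAGED \
  --service-role arn:aws:iam::123456789012:role/AWSBatchServiceRole \
  --compute-resources type=EC2,minvCpus=0,maxvCpus=256,desiredvCpus=0,instanceTypes=optimal,imageId=ami-0123456789abcdef0,subnets=subnet-aaaaaaaa,securityGroupIds=sg-bbbbbbbb,instanceRole=ecsInstanceRole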
Hi @ajfriedman18
Thanks for the clarification. We were able to get jobs running on our end using your shared scratch directory set-up, and the Batch-managed compute environment took care of selecting instances with sufficient memory.
Best, Nick
Hello, @cornhundred
The blog only shows how to add the EBS volume via the web console. As I understand from the posts above, you have to launch the EC2 instance first and then configure it over SSH. Is there no way to do this automatically via a CloudFormation template?
Hi @nikita-sheremet-java-developer. As of now, AWS Batch does not allow for attaching EBS volumes at instance launch, which is why creating the Custom AMI is a required step. Though if you'd prefer to script it, you could likely write a simple Python or shell script with the CLI to create the custom AMI.
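For example, a rough outline with the AWS CLI; the instance ID, volume ID, and device name are placeholders, and the format/mount steps still have to run on the instance itself:

# Sketch: script the Custom AMI creation instead of clicking through the console.
# All IDs below are placeholders.
aws ec2 attach-volume --volume-id vol-0123456789abcdef0 \
  --instance-id i-0123456789abcdef0 --device /dev/sdb
# ...ssh in (or use user data) to run the mkfs/mount steps from mount_volume.sh...
aws ec2 create-image --instance-id i-0123456789abcdef0 \
  --name "batch-genomics-golden-ami"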
A feasible solution is to use EFS as the shared storage for a genomics pipeline, especially for reference data such as genome sequences and database indexes (e.g. BLAST and Bowtie indexes), and to build your own AMI on top of an EC2 instance with the EFS mount. Of course, you need to create the EFS file system first and mount it on the EC2 instance used for your custom AMI, then pass the AMI to AWS Batch.
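If you go that route, the mount on the instance used to bake the AMI looks roughly like the following; the file system ID and region are placeholders, and amazon-efs-utils is an alternative to the plain NFS mount shown here:

# Sketch: mount EFS on the EC2 instance used to create the custom AMI.
# File system ID and region are placeholders.
sudo yum install -y nfs-utils
sudo mkdir -p /mnt/efs
sudo mount -t nfs4 -o nfsvers=4.1 fs-12345678.efs.us-east-1.amazonaws.com:/ /mnt/efs
# Add a matching /etc/fstab entry so instances launched from the AMI mount it at boot.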
Here is a CloudFormation template that adds the data volume at EC2 launch: https://github.com/vfrank66/awsbatchlaunchtemplate
It looks like you are using mounted volumes for storing large files (e.g. reference genomes) on the batch job containers. I see that you are using docker_scratch volumes in the CloudFormation YAML, but it is unclear how that volume is being set up from the YAML. Also, where is mount_volumes.sh being run?