azavea / pfb-network-connectivity

PFB Bicycle Network Connectivity

Manually create EC2-based managed compute environment on staging #857

Closed KlaasH closed 2 years ago

KlaasH commented 2 years ago

The big comment on issue #839 describes the resource requirements of the analysis and says the next step is to determine whether we can allocate enough storage space to run the tasks on Fargate. The documentation about Fargate task storage talks only about "ECS tasks", but I hoped that might just be the focus of the page rather than an exclusive list. Unfortunately, this issue on the "Containers roadmap" repo makes it clear that the ephemeralStorage parameter is not available for Batch tasks.
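For reference, the knob in question is the ephemeralStorage setting that plain ECS accepts on Fargate task definitions. A minimal boto3 sketch of what that looks like on the ECS side (the family, image, and sizes here are placeholders, not our actual task config):

import boto3

ecs = boto3.client("ecs")

# On plain ECS/Fargate, extra scratch space is requested per task definition
# via "ephemeralStorage" (20 GiB by default, expandable up to 200 GiB).
ecs.register_task_definition(
    family="pfb-analysis-example",            # placeholder name
    requiresCompatibilities=["FARGATE"],
    networkMode="awsvpc",
    cpu="4096",
    memory="16384",
    ephemeralStorage={"sizeInGiB": 150},      # the parameter at issue
    containerDefinitions=[
        {"name": "analysis", "image": "example/analysis:latest", "essential": True},
    ],
)

# Per the Containers-roadmap issue linked above, Batch job definitions had no
# equivalent of ephemeralStorage at the time, which is why we can't just move
# the analysis onto Fargate.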

So we need to stick with an EC2-based compute environment, but we should still be able to transition to a managed one. The next step is to manually create a managed Batch compute environment using an EC2 instance class that meets the requirements (4 CPUs, 12 GiB of RAM, and 150 GB of storage) without overshooting too much, then configure the staging site to use it and run a few test jobs.

KlaasH commented 2 years ago

Manually-provisioned managed compute environment

I created a managed compute environment, changed the staging tfvars to use it, and successfully launched analysis jobs from the staging admin site that ran and showed up on the staging public site.

Compute environment options

Here are the parameters I used when creating the compute environment (parameters not listed were left blank or at the defaults provided by the AWS console form):

Type: Managed
Service role: AWSBatchServiceRole
Instance role: StagingBatchInstanceProfile
EC2 keypair: azavea-pfb

Instance configuration: On-demand
Allowed instance types: m5d family
Allocation strategy: BEST_FIT
Launch template: BatchStorageInit
Launch template version: $Latest

VPC id: pfbStaging | vpc-0e225168
Subnets:
  subnet-96db84cd | 10.0.1.0/24 | us-east-1a | PrivateSubnet
  subnet-b1a1819c | 10.0.3.0/24 | us-east-1c | PrivateSubnet
Security group: sg-2ffc0150 (sgBatchContainerInstance, terraform-00ce2250770ffe3c3459ec2294)

EC2 tags:
Name: BatchInstance
Environment: Staging

Many of these values (Service role, Instance role, VPC id, and Security group) are the existing ones we've been using for the unmanaged Batch environment. I'm not sure the Service role is actually different from the one that would be provided by default, but the Instance role is: it has S3 access. The security group matters because it allows the analysis to make HTTP and HTTPS requests (to download Census and OSM data) and to talk to the database to update job statuses, and it also allows inbound SSH for debugging.
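For the record, here is roughly what the same setup looks like through the API rather than the console form. This is only a sketch with boto3; the compute environment name, service role ARN/account ID, and min/max vCPU numbers are illustrative, while the instance family, key pair, launch template, subnets, security group, and tags mirror the values listed above:

import boto3

batch = boto3.client("batch")

batch.create_compute_environment(
    computeEnvironmentName="staging-managed-test",   # illustrative name
    type="MANAGED",
    state="ENABLED",
    serviceRole="arn:aws:iam::123456789012:role/AWSBatchServiceRole",  # placeholder account ID
    computeResources={
        "type": "EC2",                                # on-demand instances
        "allocationStrategy": "BEST_FIT",
        "instanceTypes": ["m5d"],                     # whole family, not a single size
        "minvCpus": 0,                                # illustrative limits
        "maxvCpus": 16,
        "instanceRole": "StagingBatchInstanceProfile",
        "ec2KeyPair": "azavea-pfb",
        "launchTemplate": {
            "launchTemplateName": "BatchStorageInit",
            "version": "$Latest",
        },
        "subnets": ["subnet-96db84cd", "subnet-b1a1819c"],
        "securityGroupIds": ["sg-2ffc0150"],
        "tags": {"Name": "BatchInstance", "Environment": "Staging"},
    },
)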

Launch template

To get the ephemeral NVMe storage attached to the m5d instance type mounted as a usable drive, I created a launch template. That has to be done before creating the compute environment that uses it, and it turns out that specifying $Latest as the template version doesn't mean the compute environment pulls the latest version every time it runs; it only pulls the latest version when the compute environment is created. So if you change the launch template, you have to recreate the compute environment to get it to pick up the changes.

Most of the parameters that could be set by the launch template are handled by the compute environment or the job definition, and there's no reason to specify them in the template (it would make them somewhat harder to find, and in any case values specified in those other ways generally override values in the template). The one thing the launch template contains is a "User data" field with some cloud-init code:

MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="==MYBOUNDARY=="

--==MYBOUNDARY==
Content-Type: text/cloud-config; charset="us-ascii"

fs_setup:
  - label: nvme1n1
    filesystem: ext4
    extra_opts: ["-E", "nodiscard"]
    device: /dev/nvme1n1
    partition: auto

mounts:
  - [ /dev/nvme1n1, "/mnt/ephemeral", "ext4", "defaults,discard", "0", "2" ]

--==MYBOUNDARY==--
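If we end up scripting this rather than using the console, the launch template itself is a small API call. A hedged boto3 sketch, assuming the MIME/cloud-config document above has been saved to a local file (the filename here is hypothetical):

import base64
import boto3

ec2 = boto3.client("ec2")

# Read the cloud-config user data shown above; launch template user data
# must be base64-encoded.
with open("batch-storage-init-userdata.txt") as f:    # hypothetical filename
    user_data = base64.b64encode(f.read().encode()).decode()

ec2.create_launch_template(
    LaunchTemplateName="BatchStorageInit",
    LaunchTemplateData={"UserData": user_data},
)

# Later edits would go in via create_launch_template_version(), but note the
# caveat above: the compute environment resolves "$Latest" when it is created,
# so it has to be recreated to pick up a new template version.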

Instance type

As I wrote on issue #839, I think it makes sense to allocate 4 CPUs, at least 12 GiB of RAM, and at least 150 GB of storage to the analysis. That will be more storage than we need in most cases, but since we're not changing the job parameters by neighborhood size, we need to provision enough for the biggest jobs to run. And it's not that wasteful to overprovision for smaller jobs, since the 4 CPUs and ample memory should allow them to run quickly.

I chose "m5d family" in the compute environment setup rather than explicitly picking an instance size. I chose vCPU and memory requirements (*see note below) in the job definition to match the m5d.xlarge class, but if we wanted to transition in the future to adapting the size of the task to the size of the neighborhood, specifying the instance family would facilitate that. And I don't think there's any harm in it in the meantime.

*note re memory requirements: The official memory allocations of the various instance types are given as multiples of GiB, but some memory is reserved for the operating system and the ECS container agent, so if you submit a job with e.g. 16000 MB as its memory requirement, it won't fit on an m5d.xlarge instance. Following the instructions under "Viewing Compute Resource Memory" on this documentation page, I settled on 15500 MB as a value that uses nearly all the memory of an m5d.xlarge instance without over-allocating and causing Batch to spin up a bigger instance.
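In job-definition terms, that works out to something like the boto3 sketch below. The job definition name, image, and the volume/mount wiring to /mnt/ephemeral are assumptions on my part (the drive has to be mounted into the container somehow, but the exact container path is hypothetical); the resource requirements are the values discussed above:

import boto3

batch = boto3.client("batch")

batch.register_job_definition(
    jobDefinitionName="pfb-analysis-managed-test",     # placeholder name
    type="container",
    containerProperties={
        "image": "example/pfb-analysis:latest",        # placeholder image
        "resourceRequirements": [
            {"type": "VCPU", "value": "4"},
            {"type": "MEMORY", "value": "15500"},      # MB; leaves headroom for the OS/agent
        ],
        # Assumed wiring of the NVMe drive the launch template mounts at
        # /mnt/ephemeral; the container path is hypothetical.
        "volumes": [{"name": "scratch", "host": {"sourcePath": "/mnt/ephemeral"}}],
        "mountPoints": [{"sourceVolume": "scratch", "containerPath": "/data"}],
    },
)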

tfvars

A lot of the Batch setup is not currently managed in Terraform. I wasn't trying to change that at this stage, or to adjust the provisioning to create parts of the managed compute environment, but I wanted to avoid destroying/breaking the current resources or having the provisioning scripts clobber any of the manual changes. It turned out there wasn't much that needed to change to make the Terraform provisioning of the app work with the manually-provisioned compute environment: I just changed batch_analysis_job_queue_name from "staging-pfb-analysis-job-queue" to "staging-managed-test-queue". I also removed the batch_analysis_compute_environment_arn variable, because as far as I can tell it wasn't in use anywhere.

Next steps

So now that we know we can use a managed compute environment and have an example of what it should look like, the next thing to do is get Terraform to build it for us. This might be moderately complex, for two reasons:

1) A lot of the Batch-related provisioning wasn't in Terraform at all before, so we need to identify the other tools we're using and figure out whether to keep using a modified version of them or whether Terraform can do the job now and we should migrate entirely to it.

2) I believe a big part of the reason we used manual tooling to begin with was that Terraform didn't have much support for Batch at the time. I imagine that has changed in the years since, but we might well need to upgrade to a newer version of Terraform to benefit from it. We've been overdue to upgrade Terraform anyway, but it will definitely add a chunk of work to this task if it turns out to be necessary.

On the other hand, given that we currently have a working Terraform setup that's handling most of our provisioning, fitting the pieces of this into it shouldn't be that troublesome.