department-of-veterans-affairs / va.gov-team

Public resources for building on and in support of VA.gov. Visit the complete Knowledge Hub:
https://depo-platform-documentation.scrollhelp.site/index.html

Discovery: Explore how to create an EFS Hosted Persistent Volume for Preview Environments #56720

Closed JoeTice closed 1 year ago

JoeTice commented 1 year ago

Description

The objective of this discovery phase is to research and analyze the requirements and technical aspects needed to create an EFS hosted persistent volume for preview environments. This will involve understanding the process of implementing an EFS volume in EKS, identifying the changes required in the Docker image and vets-website configuration, and evaluating the benefits of utilizing a persistent volume, such as reduced startup time and supporting tailored static content in preview environments.

Tasks

Acceptance Criteria

holdenhinkle commented 1 year ago

An EFS (Elastic File System) Hosted Persistent Volume refers to a network-attached storage (NAS) solution provided by Amazon Web Services (AWS) for use with containerized workloads running on their Kubernetes-based platform, Amazon Elastic Kubernetes Service (EKS). A persistent volume is a storage resource that can outlive the life of a container or pod, providing a way to retain and share data between different containers, even if they are running on separate instances.

EFS is a managed file storage service that can automatically scale up and down according to the needs of the applications using it, providing high performance and durability. In the context of Kubernetes, an EFS Hosted Persistent Volume can be utilized as a backing store for Kubernetes Persistent Volumes (PV) and Persistent Volume Claims (PVC), enabling the sharing of data between multiple pods or containers across different nodes in a Kubernetes cluster.

To set up an EFS Hosted Persistent Volume, you would typically follow these steps:

  1. Create an Amazon EFS file system in your AWS account.
  2. Configure the necessary IAM roles and security groups to allow the EKS worker nodes to access the EFS file system.
  3. Create a Kubernetes StorageClass that uses the EFS CSI (Container Storage Interface) driver, which enables EFS integration with Kubernetes.
  4. Create a Kubernetes Persistent Volume (PV) that references the EFS file system and the StorageClass you created in step 3.
  5. Create a Kubernetes Persistent Volume Claim (PVC) that binds to the Persistent Volume created in step 4.
  6. Finally, deploy your application with a pod specification that includes the PVC as a mounted volume.

EFS Hosted Persistent Volumes provide a flexible and scalable storage solution for containerized workloads on AWS, making it an attractive choice for various use cases, such as content management systems, data analytics, and machine learning applications.

https://console.amazonaws-us-gov.com/efs/home?region=us-gov-west-1#/get-started

holdenhinkle commented 1 year ago

Task:

Investigate the process of creating and implementing an EFS hosted persistent volume in EKS in the vagov-staging VPC where preview environments live.

I tried creating an EFS but got the following error:

User: arn:aws-us-gov:iam::008577686731:user/Holden.Hinkle is not authorized to perform: elasticfilesystem:TagResource on the specified resource.

I filed a support request - https://dsva.slack.com/archives/CBU0KDSB1/p1682708370672949


  1. Create an EFS file system in the AWS Management Console:
     a. Navigate to the EFS section.
     b. Click "Create file system".
     c. Choose the "vagov-staging" VPC, and configure the desired security group and mount target settings.
     d. Set the file system's performance mode and throughput mode as needed.
  2. Install the EFS CSI driver in your EKS cluster:
    kubectl apply -k "github.com/kubernetes-sigs/aws-efs-csi-driver/deploy/kubernetes/overlays/stable/?ref=release-1.0"

URL: https://github.com/kubernetes-sigs/aws-efs-csi-driver/tree/master/deploy/kubernetes/overlays/stable

  3. Create a Kubernetes StorageClass for EFS: Create a YAML file called efs-sc.yaml with the following content:
    apiVersion: storage.k8s.io/v1
    kind: StorageClass
    metadata:
      name: efs-sc
    provisioner: efs.csi.aws.com

Related example: https://github.com/kubernetes-sigs/aws-efs-csi-driver/blob/master/examples/kubernetes/volume_path/specs/example.yaml

Apply the StorageClass:

kubectl apply -f efs-sc.yaml

apply command docs - https://jamesdefabia.github.io/docs/user-guide/kubectl/kubectl_apply/

  4. Create a PersistentVolumeClaim (PVC) for your application: Create a YAML file called efs-pvc.yaml with the following content:
    apiVersion: v1
    kind: PersistentVolumeClaim
    metadata:
      name: efs-pvc
    spec:
      accessModes:
        - ReadWriteMany
      storageClassName: efs-sc
      resources:
        requests:
          storage: 5Gi

Apply the PVC: kubectl apply -f efs-pvc.yaml
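Note: the stable (release-1.0) overlay of the EFS CSI driver installed above does static provisioning, so the PVC binds to a pre-created PersistentVolume that carries the EFS file system ID (step 4 in the earlier overview). A minimal sketch, replacing <your_efs_file_system_id> with the ID of the file system created in step 1:

    apiVersion: v1
    kind: PersistentVolume
    metadata:
      name: efs-pv
    spec:
      capacity:
        storage: 5Gi
      volumeMode: Filesystem
      accessModes:
        - ReadWriteMany
      persistentVolumeReclaimPolicy: Retain
      storageClassName: efs-sc
      csi:
        driver: efs.csi.aws.com
        volumeHandle: <your_efs_file_system_id>

Apply it the same way: kubectl apply -f efs-pv.yaml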

holdenhinkle commented 1 year ago

Task:

Research best practices for configuring the Docker image to mount the EFS volume and setting up vets-website to use the mounted volume.

  1. Update the Dockerfile:
     a. Make sure the Docker image has the necessary dependencies to mount an NFS volume (e.g., install the nfs-common package for Debian-based images).
     b. Create a directory in the Docker image to mount the EFS volume (e.g., /efs).
  2. Modify the Kubernetes deployment YAML file for your application:
     a. Add a volume entry in the spec.template.spec.volumes section, referencing the previously created PVC:

    volumes:
      - name: efs-volume
        persistentVolumeClaim:
          claimName: efs-pvc

b. Mount the volume to your application container by adding a volumeMounts entry in the spec.template.spec.containers section:

volumeMounts:
  - name: efs-volume
    mountPath: /efs
  3. Update the vets-website configuration to use the mounted EFS volume:
     a. Modify the configuration files or environment variables in your application to reference the /efs directory as the desired location for file storage (a sketch follows below).
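For example, the deployment could hand the mount path to the application through an environment variable (the variable name here is illustrative, not an existing vets-website setting):

    env:
      - name: STATIC_CONTENT_DIR
        value: /efs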

Here's the updated Dockerfile - https://github.com/department-of-veterans-affairs/vets-website/blob/main/src/platform/utilities/preview-environment/Dockerfile

FROM public.ecr.aws/bitnami/node:14.15.5

# Install NFS utilities for mounting EFS volumes
RUN apt-get update && apt-get install -y nfs-common

RUN mkdir -p /website
WORKDIR /website

# Create a directory for EFS volume mount
RUN mkdir -p /efs

# Clone vagov-content
RUN git clone --depth 1 https://github.com/department-of-veterans-affairs/vagov-content

# Clone vets-json-schema
RUN git clone --depth 1 https://github.com/department-of-veterans-affairs/vets-json-schema.git

# Clone veteran-facing-services-tools
RUN git clone --depth 1 https://github.com/department-of-veterans-affairs/veteran-facing-services-tools

# Clone content-build
RUN git clone --depth 1 https://github.com/department-of-veterans-affairs/content-build.git

# Setup a working directory for vets-website
RUN mkdir -p /website/vets-website
WORKDIR /website/vets-website

# Copy vets-website files into Docker image
COPY . .

# Copy startup script into place
COPY src/platform/utilities/preview-environment/start.sh .
RUN chmod +x start.sh

# Expose ports
EXPOSE 3001
EXPOSE 3002

ARG AWS_URL
ENV AWS_URL $AWS_URL

# Configure image to execute a script on startup. Shell-form CMD is used
# so that $AWS_URL is expanded at runtime; the exec form CMD ["$AWS_URL"]
# would pass the literal string "$AWS_URL" instead of its value.
CMD ./start.sh "$AWS_URL"

holdenhinkle commented 1 year ago

Task:

Identify the necessary steps to synchronize the static content build external to the preview environment creation process.

If I'm understanding this question correctly, we can create a GitHub Actions workflow that runs every time the main branch of the content-build repo is updated, runs the build process, and then stores the build output on the EFS file system using whatever directory or filename naming convention we decide on (see the sketch below).
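A minimal workflow sketch, assuming a self-hosted runner with network access to the EFS mount targets and the file system mounted at /mnt/efs (the mount path, directory convention, and build commands are all placeholders):

    name: sync-content-build
    on:
      push:
        branches: [main]
    jobs:
      build-and-sync:
        runs-on: self-hosted
        steps:
          - uses: actions/checkout@v3
          - name: Build static content
            run: yarn install && yarn build
          - name: Copy build output to EFS
            run: |
              # Using the short commit SHA as the directory name is one possible convention
              DEST="/mnt/efs/content-build/${GITHUB_SHA:0:7}"
              mkdir -p "$DEST"
              cp -r build/. "$DEST"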

holdenhinkle commented 1 year ago

Task:

Explore the requirements and potential methods for implementing a startup script to mount a specific subdirectory containing the desired version of the content build.

To mount a specific subdirectory from an EFS volume containing different versions of the content-build, you can modify the Kubernetes deployment YAML file and use the subPath field in the volumeMounts section. Here's a step-by-step process:

  1. Ensure that your EFS volume contains the different versions of the content-build in separate subdirectories. For example, /efs/v1, /efs/v2, etc.

  2. Update the Kubernetes deployment YAML file for your application:
     a. Add a volume entry in the spec.template.spec.volumes section, referencing the previously created PVC:

    volumes:
      - name: efs-volume
        persistentVolumeClaim:
          claimName: efs-pvc

b. Mount the desired subdirectory from the volume to your application container by adding a volumeMounts entry in the spec.template.spec.containers section. Use the subPath field to specify the subdirectory that contains the desired version of the content-build:

volumeMounts:
  - name: efs-volume
    mountPath: /content-build
    subPath: v1

Replace v1 in subPath: v1 with the desired version's subdirectory name.

  3. Modify your application code, configuration files, or environment variables to reference the /content-build directory instead of the previous content-build directory. This way, your application will use the desired version of the content-build from the mounted EFS subdirectory.

  4. If you need to switch between different versions of the content-build at runtime, you can use environment variables or ConfigMaps in your Kubernetes deployment to dynamically set the subPath value. This would involve using a template engine or pre-processing the deployment YAML file before applying it.

Please note that this approach assumes you have a separate Kubernetes deployment for each desired version of the content-build. Because all replicas of a single Deployment share one pod template, they cannot each mount a different version; serving several versions side by side would mean running separate deployments and adding logic to route traffic between them.

By using the subPath field in your Kubernetes deployment, you can mount specific subdirectories from your EFS volume and effectively switch between different versions of the content-build.


Elaborating on Step 4:

In step 4, I mentioned using environment variables or ConfigMaps to dynamically set the subPath value in your Kubernetes deployment. This would allow you to easily switch between different versions of the content-build at runtime. I'll provide two examples: one using environment variables and the other using ConfigMaps.

Using environment variables

  1. In your Kubernetes deployment YAML file, define an environment variable for your application container that holds the desired content-build version:

    env:
      - name: CONTENT_BUILD_VERSION
        value: "v1"
  2. Use a template engine like Kustomize or Helm to process your deployment YAML file before applying it. These tools allow you to replace variables in the YAML file with actual values during deployment.

  3. In your deployment YAML file, replace the subPath field value with a placeholder that represents the environment variable:

    subPath: ${CONTENT_BUILD_VERSION}
  4. When deploying your application, use the template engine to replace the placeholder with the actual value of the environment variable. This way, you can easily switch between content-build versions by changing the environment variable value and re-deploying your application.
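Worth noting: Kubernetes can also do this natively with subPathExpr, which expands environment variables in the mount path at pod startup, so no template engine is required. A sketch using the same names as above:

    env:
      - name: CONTENT_BUILD_VERSION
        value: "v1"
    volumeMounts:
      - name: efs-volume
        mountPath: /content-build
        subPathExpr: $(CONTENT_BUILD_VERSION)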

Using ConfigMaps

  1. Create a ConfigMap that contains the desired content-build version as a key-value pair:

    apiVersion: v1
    kind: ConfigMap
    metadata:
      name: content-build-version
    data:
      version: "v1"
  2. In your Kubernetes deployment YAML file, reference the ConfigMap value as an environment variable for your application container:

    env:
      - name: CONTENT_BUILD_VERSION
        valueFrom:
          configMapKeyRef:
            name: content-build-version
            key: version
  3. Similar to the environment variable approach, use a template engine to process your deployment YAML file and replace the subPath field value with a placeholder that represents the environment variable:

    subPath: ${CONTENT_BUILD_VERSION}
  4. When deploying your application, use the template engine to replace the placeholder with the actual value from the ConfigMap. To switch between content-build versions, simply update the ConfigMap value and re-deploy your application.

Both of these methods allow you to dynamically set the subPath value in your Kubernetes deployment to mount specific subdirectories from your EFS volume containing different versions of the content-build. By changing the environment variable or ConfigMap value, you can switch between versions without modifying the deployment YAML file directly.

holdenhinkle commented 1 year ago

Task:

Explore the requirements and potential methods for removing older versions of content-build when they become too stale

We can create a GitHub Actions workflow that runs once a day and deletes directories/files that are older than, say, 30 days (sketched below).
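A minimal scheduled-workflow sketch, again assuming a self-hosted runner with the EFS file system mounted at /mnt/efs (the path and retention window are placeholders):

    name: cleanup-stale-content-builds
    on:
      schedule:
        - cron: '0 0 * * *' # daily at midnight UTC
    jobs:
      cleanup:
        runs-on: self-hosted
        steps:
          - name: Delete version directories older than 30 days
            run: |
              find /mnt/efs/content-build -mindepth 1 -maxdepth 1 \
                -type d -mtime +30 -exec rm -rf {} +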


A couple of other ways to do this:

Amazon EFS does not provide built-in functionality to automatically delete files or directories based on their age. To achieve this, you'll need to implement a custom solution, such as running a cron job on an EC2 instance or using a scheduled AWS Lambda function to periodically clean up old files and directories.

Using a cron job on an EC2 instance

  1. Launch an EC2 instance in the same VPC as your EFS file system.
  2. Mount the EFS file system on the EC2 instance using the NFS protocol.
  3. Create a script (e.g., cleanup.sh) to delete files and directories older than 30 days:
    #!/bin/bash
    # Remove files that have not been modified in the last 30 days
    find /path/to/efs-mount-point -type f -mtime +30 -exec rm -f {} \;
    # Then remove any directories left empty by the cleanup
    find /path/to/efs-mount-point -type d -empty -delete

Replace /path/to/efs-mount-point with the actual mount point of your EFS volume on the EC2 instance.

  4. Make the script executable: chmod +x cleanup.sh

  5. Schedule the script to run periodically (e.g., daily) using cron:

    crontab -e

Add the following line to the cron table:

0 0 * * * /path/to/cleanup.sh

This will execute the script every day at midnight.

Using a scheduled AWS Lambda function

  1. Create a Lambda function using a runtime like Python or Node.js.

  2. Configure your Lambda function to access the EFS file system by creating an EFS access point and mounting it to the function. Once mounted, the function can scan and delete files using standard file I/O; the AWS SDK is only needed to manage the file system itself, not its contents.

  3. Implement a script in your Lambda function that scans the EFS file system, identifies files and directories older than 30 days, and deletes them.

  4. Create an Amazon EventBridge (formerly CloudWatch Events) rule to trigger your Lambda function periodically (e.g., daily), as in the sketch below.
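A hypothetical AWS SAM template fragment tying these pieces together; every identifier in angle brackets, plus the code path, is a placeholder:

    Resources:
      ContentBuildCleanupFunction:
        Type: AWS::Serverless::Function
        Properties:
          Runtime: nodejs14.x
          Handler: index.handler
          CodeUri: ./cleanup/        # placeholder path to the cleanup code
          VpcConfig:                 # the function must run in the EFS VPC
            SubnetIds: [<subnet_id>]
            SecurityGroupIds: [<security_group_id>]
          FileSystemConfigs:
            - Arn: <efs_access_point_arn>
              LocalMountPath: /mnt/efs
          Events:
            DailyCleanup:
              Type: Schedule
              Properties:
                Schedule: rate(1 day)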

Either of these methods can help you automatically delete files and directories older than 30 days in your EFS file system. The choice depends on your preference, operational requirements, and the environment in which your EFS is being used.

holdenhinkle commented 1 year ago

MISC: How to store JSON blobs (or any kind of file) in EFS

You can create your own directories and store JSON blobs or other files within those directories when using Amazon EFS. EFS behaves like a network file system (NFS) that can be mounted to multiple instances or containers simultaneously, allowing you to read and write files as you would on any other file system.

To create directories and store JSON blobs in each directory, follow these steps:

  1. Mount the EFS volume to your instance or container, as described in the previous comments.

  2. Once the EFS volume is mounted, you can use standard file system commands and programming libraries to create directories, read, write, and delete files. For example, if you mounted the EFS volume at /efs, you could create a directory called my_directory and store a JSON blob as follows:

    mkdir /efs/my_directory
    echo '{ "key": "value" }' > /efs/my_directory/my_blob.json
  3. You can also interact with the EFS volume programmatically using your preferred programming language's file I/O libraries, such as Node.js's fs module.

The EFS volume is shared across all instances or containers that have it mounted, so changes made to the files and directories will be visible to all of them. This can be beneficial for sharing configurations or data across your application, but you should also be aware of potential concurrency issues when multiple processes are accessing and modifying the same files simultaneously.

pjhill commented 1 year ago

Thanks for looking into all these aspects. I haven't digested all the analysis yet, but I think this goes a long way to helping us understand the capabilities/limitations and in being able to describe the implementation details for each aspect of what we need. Thanks!