department-of-veterans-affairs / va.gov-cms

Editor-centered management for Veteran-centered content.
https://prod.cms.va.gov
GNU General Public License v2.0

Set up DataSync between Staging Test EFS and Staging Test Public S3 #16925

Closed: timcosgrove closed this 9 months ago

timcosgrove commented 9 months ago

### Requirements

We need files to sync from Staging Test's EFS to the Staging Test public S3 bucket, so that files added via the Drupal file system are publicly available.

### Acceptance criteria
- [x] A DataSync process is set up between Staging Test EFS and Staging Test Public S3
- [x] Once all files are synced, file sync tests are run to learn how long syncing new files should take

### Background & implementation details

- EFS: https://us-gov-west-1.console.amazonaws-us-gov.com/efs/home?region=us-gov-west-1#/file-systems/fs-3a9aa63b
- S3: https://us-gov-west-1.console.amazonaws-us-gov.com/s3/buckets/dsva-vagov-staging-cms-test-files?region=us-gov-west-1&tab=objects

- EFS source path: all files under `docroot/sites/default/files/`
- S3 destination path: `img/`
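For orientation, the source and destination above map onto three DataSync resources in the Terraform AWS provider. This is an illustrative sketch only, not the terraform-aws-vsp-cms module's actual code. The resource labels, the security group `aws_security_group.datasync`, the IAM role `aws_iam_role.datasync_s3`, and the `verify_mode` value are assumptions. It also assumes the EFS subdirectory is `/docroot/sites/default/files/` relative to the file system root; if the file system is mounted directly at that directory, `/` would be the right value.

```hcl
# Look up the existing Staging Test EFS file system and public bucket by ID/name.
data "aws_efs_file_system" "cms_test" {
  file_system_id = "fs-3a9aa63b"
}

data "aws_s3_bucket" "cms_test_public" {
  bucket = "dsva-vagov-staging-cms-test-files"
}

# Source: Drupal's public files directory on EFS (subdirectory value is an assumption).
resource "aws_datasync_location_efs" "cms_test_files" {
  efs_file_system_arn = data.aws_efs_file_system.cms_test.arn
  subdirectory        = "/docroot/sites/default/files/"

  ec2_config {
    # DataSync creates network interfaces in this subnet to reach the EFS mount target.
    subnet_arn = aws_subnet.subnet_1a.arn
    # Hypothetical security group; it must allow NFS (TCP 2049) to the EFS mount target.
    security_group_arns = [aws_security_group.datasync.arn]
  }
}

# Destination: the img/ prefix in the Staging Test public bucket.
resource "aws_datasync_location_s3" "cms_test_public" {
  s3_bucket_arn = data.aws_s3_bucket.cms_test_public.arn
  subdirectory  = "/img/"

  s3_config {
    # Hypothetical IAM role that DataSync assumes to write to the bucket.
    bucket_access_role_arn = aws_iam_role.datasync_s3.arn
  }
}

resource "aws_datasync_task" "cms_test_efs_to_s3" {
  name                     = "cms-test-efs-to-s3"
  source_location_arn      = aws_datasync_location_efs.cms_test_files.arn
  destination_location_arn = aws_datasync_location_s3.cms_test_public.arn

  options {
    # Assumed setting: verify only the files copied in each run,
    # instead of comparing the full source and destination at the end.
    verify_mode = "ONLY_FILES_TRANSFERRED"
  }
}
```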

Once the sync is set up and the initial sync is complete, we should add files to the Staging Test CMS and track how long they take to appear on S3. Drupal folks on the team can help identify where in the Drupal file system new files will land, so we can watch for them on the S3 side.

jschmidt-civicactions commented 9 months ago

Should be useful: https://repost.aws/knowledge-center/datasync-transfer-efs-s3

olivereri commented 9 months ago

For posterity, in case I am no longer here if and when DataSync is rolled out to other systems and environments, here is the process to get DataSync configured:

For each Terraform environment (dsva-vagov-dev, dsva-vagov-staging, dsva-vagov-prod), add the following inside the module block's curly braces in both cms.tf and cms-test.tf:

```hcl
# Subnet ARNs passed to the CMS module for the DataSync setup
subnets_arn = [
  aws_subnet.subnet_1a.arn,
  aws_subnet.subnet_1b.arn,
  aws_subnet.subnet_1c.arn,
]
```

Then bump the module source version to v1.13.10, i.e.: `github.com/department-of-veterans-affairs/terraform-aws-vsp-cms?ref=v1.13.10`
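Put together, a module block with the subnet list added and the version bumped might look like the sketch below. The module label `cms_test` and the elided arguments are hypothetical placeholders; only the `source` ref and the `subnets_arn` list come from this comment.

```hcl
# Hypothetical module label; the real block in cms.tf / cms-test.tf may be named differently.
module "cms_test" {
  # Module version bumped to v1.13.10.
  source = "github.com/department-of-veterans-affairs/terraform-aws-vsp-cms?ref=v1.13.10"

  # ... existing module arguments stay as they are ...

  # New argument: subnet ARNs the module uses for its DataSync configuration.
  subnets_arn = [
    aws_subnet.subnet_1a.arn,
    aws_subnet.subnet_1b.arn,
    aws_subnet.subnet_1c.arn,
  ]
}
```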

olivereri commented 9 months ago

Documentation: https://github.com/department-of-veterans-affairs/va.gov-cms/blob/main/READMES/datasync.md

olivereri commented 9 months ago

The initial sync from EFS to S3 was blazingly fast: it took 4 minutes to transfer roughly 64 GB of data. At a rate of 864.35 files per second, that works out to roughly one file per millisecond. After tweaking the file verification so that it happens on the fly, it takes less than 2 minutes to prepare and launch the sync task. The expectation is that DataSync task runs should not take more than 3 minutes.
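A quick check of the figures above, assuming decimal gigabytes and assuming the 864.35 files/s rate held for the whole 4-minute run (the comment does not say whether it did):

```math
\frac{64\ \text{GB}}{4 \times 60\ \text{s}} = \frac{64\ \text{GB}}{240\ \text{s}} \approx 0.267\ \text{GB/s} \approx 267\ \text{MB/s}
\qquad
\frac{1}{864.35\ \text{files/s}} \approx 1.16\ \text{ms per file}
\qquad
864.35\ \text{files/s} \times 240\ \text{s} \approx 207{,}444\ \text{files}
```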
