CDLUC3 / mrt-doc

Documentation and Information regarding the Merritt repository

Shared file system between ingest and storage - what is the correct AWS approach? #1140

Closed: terrywbrady closed this issue 2 years ago

terrywbrady commented 2 years ago

The amount of content deposited into Merritt varies significantly from week to week. Deposits from each campus are sporadic and project-driven. One week in May 2022, we had 19TB of content deposited.

When a campus does initiate a large ingest, our current provisioned I/O is overwhelmed by the content.

What accesses this shared content?

Past implementations

Options

What would AWS recommend?

Martin will add this to the agenda for an IAS conversation with Kevin


Workflow

File System Needs

Options

terrywbrady commented 2 years ago

https://pilotcoresystems.com/insights/ebs-efs-fsx-s3-how-these-storage-options-differ/

From the article: "AWS EFS has you covered for all file-system storage requirements. Or does it? EFS works with EC2 instances as a managed NAS filer. FSx, on the other hand, offers a managed Windows Server environment that runs Windows Server Message Block services."



9:45 This page sounds closer to what we were discussing, so my prior link may not be so helpful: https://aws.amazon.com/fsx/lustre/faqs/

Ashley Gould 9:51 AM from the same FAQ page: "Amazon FSx also integrates with Amazon S3, making it easy for you to process cloud data sets with the Lustre high-performance file system. When linked to an S3 bucket, an FSx for Lustre file system transparently presents S3 objects as files and automatically updates the contents of the linked S3 bucket as files are added to, changed in, or deleted from the file system."

sfisher 10:20 AM FSx sounds neat for working with data processing and representing S3 files as local ones would be nice for operating on them. I'm guessing it's fairly pricey?

Colin Thompson 10:48 AM roughly double the price of EFS, if I'm reading this correctly: https://aws.amazon.com/fsx/lustre/pricing/ https://aws.amazon.com/efs/pricing/
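
For context on what "linked to an S3 bucket" means operationally, a rough CLI sketch (not from the thread; bucket, subnet, and capacity are placeholders, and parameter names should be checked against the current FSx docs):

$ aws fsx create-file-system \
    --file-system-type LUSTRE \
    --storage-capacity 1200 \
    --subnet-ids subnet-0123456789abcdef0 \
    --lustre-configuration DeploymentType=SCRATCH_2,ImportPath=s3://example-merritt-bucket,ExportPath=s3://example-merritt-bucket
# the file system then presents the bucket's objects as files under its mount point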

terrywbrady commented 2 years ago

7/27 - Meeting with IAS and AWS Reps

Refactoring consideration

terrywbrady commented 2 years ago

What about allocating specific file systems for priority collections?

ashleygould commented 2 years ago

Estimates for ZFS (Amazon FSx for OpenZFS) changes

SSD storage capacity:   $0.090 per GB-month
Throughput capacity:    $0.260 per MBps-month
SSD IOPS:               $0.0060 per IOPS-month

Assume you want to store 5 TB of general-purpose file data using SSD storage in the US West Region. Provision a 5 TB Single-AZ file system with 256 MB/s of throughput capacity.

Storage:        5 TB x $0.09 per GB-month       = $461/mo
IOPs:           15360 (3 per GB storage)        =   $0/mo
Throughput:     256 MB/s x $0.26 per MB/s-month =  $67/mo

Total monthly charge:                             $528/mo

10TB SSD storage

Storage:        10 TB x $0.09 per GB-month      = $922/mo
IOPs:           30720 (3 per GB storage)        =   $0/mo
Throughput:     256 MB/s x $0.26 per MB/s-month =  $67/mo

Total monthly charge:                             $989/mo
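
Not from the thread, just a quick sanity check of the arithmetic above (assumes bc is available; the results round to the $922 and $67 shown):

$ echo "10 * 1024 * 0.090" | bc   # 10 TB of SSD storage, USD per month
921.600
$ echo "256 * 0.26" | bc          # 256 MB/s of throughput, USD per month
66.56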
ashleygould commented 2 years ago

Testing zfs on stage

I am working with IAS to create a 2 TB ZFS volume, mounted on each of the uc3-mrt-ingest-stg and uc3-mrt-store-stg hosts at /apps/ingest-stg-zfs. Once ready, Mark will reconfigure uc3-mrt-ingest-stg to write data to this new mount point and restart ingest. The Merritt team will then load some large ingests into stage to see how the new volume performs.

Martin and I are done adding the ZFS volume to all 4 stage ingest and store hosts. Please configure ingest to write to the new mount point /apps/ingest-stg-zfs.

agould@uc3-ingest01x2-stg:~> df -h /apps/ingest-stg-zfs
Filesystem                                             Size  Used Avail Use% Mounted on
fs-0b9822a1af853be7a.fsx.us-west-2.amazonaws.com:/fsx  2.0T     0  2.0T   0% /apps/ingest-stg-zfs
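
For the record, FSx for OpenZFS is consumed as an ordinary NFS export, so the mount on each host looks roughly like this (mount options are an assumption, not copied from the actual fstab):

$ sudo mount -t nfs -o nfsvers=4.1 \
    fs-0b9822a1af853be7a.fsx.us-west-2.amazonaws.com:/fsx /apps/ingest-stg-zfs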

Removing Stage ingest worker 2 from the new ALB to configure the new ZFS disk.

Changing the mount point is a bit more involved than a single symlink, but it should be no problem. The places that reference the queue path:

SSM
        /uc3/mrt/stg/ingest/config/ingestQueuePath      /apps/ingest-stg-shared/ingest_home/queue
Tomcat
        webapps/ingestqueue -> /apps/ingest-stg-shared/ingest_home/queue
ingest_home
        /dpr2/ingest_home/queue -> /apps/ingest-stg-shared/ingest_home/queue
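
A sketch of what repointing those three references could look like (the queue path under /apps/ingest-stg-zfs is an assumed layout, not the team's actual runbook):

# overwrite the SSM parameter with the new queue path
$ aws ssm put-parameter --name /uc3/mrt/stg/ingest/config/ingestQueuePath \
    --value /apps/ingest-stg-zfs/ingest_home/queue --type String --overwrite
# repoint the Tomcat and ingest_home symlinks
$ ln -sfn /apps/ingest-stg-zfs/ingest_home/queue webapps/ingestqueue
$ ln -sfn /apps/ingest-stg-zfs/ingest_home/queue /dpr2/ingest_home/queue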
mreyescdl commented 2 years ago

Stage Ingest and Storage are now using shared ZFS disk.

elopatin-uc3 commented 2 years ago

@terrywbrady we talked about the status of ZFS testing on stage in today's (8/15) team meeting. Ashley brought up an aspect of ZFS that we were not yet aware of, called thin provisioning. This type of provisioning lets us set a top end for the ZFS allocation while still allowing one-way growth beyond it in case a large submission breaches the initially specified allocation limit.

@mreyescdl is going to let the current test ingest complete on stage (Nuxeo UCM San Joaquin collection content). Then, using a new volume with a small 100GB ZFS allocation and thin provisioning active, we'll start another test to observe this type of provisioning in action and collect test data. @ashleygould is also going to discuss with IAS temporarily setting the stage ingest hosts to m5.large for use during this test, as we've observed significant CPU I/O wait on the smaller existing hosts and are unsure whether this is due to lack of network bandwidth or to disk access.
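
For illustration only: in plain ZFS terms, thin provisioning amounts to a quota with no up-front reservation. FSx exposes the equivalent through its volume API rather than a shell, so this is just the concept, not what would actually run on stage:

# space is consumed only as data lands; the quota caps growth at 100G
$ zfs create tank/ingest-test
$ zfs set quota=100G tank/ingest-test
$ zfs set reservation=none tank/ingest-test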

ashleygould commented 2 years ago

Mark tested with a large submission and results were better than EFS, but ingest instances had high CPU IOWait. Stage instances are t3.small for ingest and m5.large for store.
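
One way to separate disk wait from network saturation during the next run (iostat and sar come from the sysstat package, which is assumed to be installed on the stage hosts):

$ iostat -xz 5   # %iowait plus per-device utilization
$ sar -n DEV 5   # per-interface throughput, to spot a saturated NIC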

next steps:

Links:

ashleygould commented 2 years ago

Update:

Thin provisioning is a bust. It does not do what we were hoping for, so we did not create a new ZFS filesystem. We will continue to test on the existing one.

Librato does not support the FSx service, so no dashboards there.

But Martin Haye set up a CloudWatch dashboard for us: https://cloudwatch.amazonaws.com/dashboard.html?dashboard=UC3_ZFS_fs-0b9822a1af853be7a_Da[…]NDMxLTRiYWEtOWYwMC0xZTZlMjU3NjkwODciLCJNIjoiUHVibGljIn0= You have to click "options" and select the sample period; otherwise it selects one for you but does not tell you what it is.
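
The CLI equivalent makes the sample period explicit. A hedged sketch pulling write throughput for this file system (metric and dimension names as I understand the AWS/FSx namespace; the time range is a placeholder):

$ aws cloudwatch get-metric-statistics --namespace AWS/FSx \
    --metric-name DataWriteBytes --statistics Sum --period 300 \
    --dimensions Name=FileSystemId,Value=fs-0b9822a1af853be7a \
    --start-time 2022-08-22T00:00:00Z --end-time 2022-08-23T00:00:00Z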

Martin changed the EC2 instance type on one of the uc3-mrt-ingest-stg hosts to c5n.large. This should give us much better network bandwidth, which I think may have been the bottleneck in the first trial.

mreyescdl commented 2 years ago

@ashleygould ZFS testing started on Stage; expect TBs of data to be submitted to the Ingest workers over the next few days. Workers now: 01 - c5n.large, 02 - t2.small.

mreyescdl commented 2 years ago

Ingest is having problems removing the payload directory:

[error] HandlerCleanup: Failure in removing: /dpr2/ingest_home/queue/bid-bd2f4790-bc24-412f-94d6-d7c37ce341e0/jid-1aa018ba-86d0-4fb3-9b91-64025ecd4993/producer   Continuing

This is due to hidden .nfs* files being created (the NFS client renames a file that is deleted while still open, which blocks removal of its parent directory):

$ ls -la producer/
total 14
-rw-r--r-- 1 dpr2 dpr2 1058816 Aug 23 14:53 .nfs00000000000000100000005b
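
The .nfs* name is the NFS client hiding a file that was deleted while a process still held it open; the parent directory cannot be removed until that handle closes. A quick way to find the holder (the path is a placeholder, and lsof may need to be installed):

$ lsof +D /dpr2/ingest_home/queue/<bid>/<jid>/producer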
mreyescdl commented 2 years ago

The problem was in the Manifest processor in the core library. A valid manifest payload did not trigger the error, but a regular file payload would. Here is the fix:
https://github.com/CDLUC3/mrt-core2/pull/16

elopatin-uc3 commented 2 years ago

Thanks for finding the root cause for this @mreyescdl

mreyescdl commented 2 years ago

Legacy EFS Stage disk contents (fs-6fe432c4.efs)

$ du -sh *
3.2G    dataone
211M    frontera
219G    ingest_home
31G     palestinian_museum
716K    terry

I'll clean up the ingest_home disk and move the rest to the new ZFS disk.
Please remove any data that is not needed @terrywbrady @elopatin-uc3
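
A sketch of the move itself, assuming the legacy EFS content is still mounted at /apps/ingest-stg-shared (an assumption based on the old queue path above; directory names come from the du listing):

$ rsync -aH --info=progress2 /apps/ingest-stg-shared/dataone/ /apps/ingest-stg-zfs/dataone/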

terrywbrady commented 2 years ago

@mreyescdl , I deleted the terry directory. I think @elopatin-uc3 will need to weigh in on the others.

mreyescdl commented 2 years ago

IAS request made to decommission the Stage EFS disk and to reduce I/O throughput on prod:

- Decommission EFS disk fs-6fe432c4.efs.us-west-2.amazonaws.com which is mounted
on Stage Ingest and Storage workers

- Reduce IO throughput on EFS production disk
fs-3b22fd91.efs.us-west-2.amazonaws.com from 50MB/s to 10MB/s
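
For reference, the equivalent AWS CLI calls would look roughly like this (IAS performs the actual change, and EFS mount targets must be removed before the file system can be deleted):

# reduce provisioned throughput on the prod EFS file system
$ aws efs update-file-system --file-system-id fs-3b22fd91 \
    --throughput-mode provisioned --provisioned-throughput-in-mibps 10
# decommission the stage EFS file system once unmounted everywhere
$ aws efs delete-file-system --file-system-id fs-6fe432c4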