CDLUC3 / mrt-doc

Documentation and Information regarding the Merritt repository

SDSC data migration from Minio to native S3 #1701

Open dloy opened 9 months ago

dloy commented 9 months ago

Minio

Minio is an S3 API-based service that stores content on a standard file system. SDSC is planning to move from Minio to a "native S3 service". The native service writes content directly to the underlying block storage, which is not file based.

Minio is able to mimic S3 by storing the metadata properties that describe each file in a sidecar file.

Because Minio acts as an interface layer, it has had a number of problems supporting the AWS S3 API, especially with some of the lesser-used functions:

Getting off Minio is a good thing!

From Gavin:


From: David Loy <David.Loy@ucop.edu>
Date: Wednesday, November 29, 2023 at 1:10 PM

...

Hi Gavin
...
Will our existing content be available on the new service or will there be some form of migration required?

David

---------
Hi David, 
...
Unfortunately, I think we may need to migrate the data. MinIO stores the object metadata on the file system in 
a sidecar file within a shadow directory structure. We can expose the existing buckets as the keys and object 
content would be the same, but they’d be missing the metadata. We may be able to re-ingest the data locally 
to speed up the process. It’s a challenge that’s worth thinking about early, but we have some time. 

Thanks,
Gavin
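
One way to check the concern Gavin describes (same keys and content, but missing metadata) would be to compare the user metadata of an object as returned by each service. The sketch below is only an illustration: `metadata_diff` is a hypothetical helper, and it assumes each input is a dict shaped like an S3 HeadObject response, with user metadata under a `Metadata` key. The `sha256` key and its value are made-up sample data, not Merritt's actual metadata.

```python
def metadata_diff(src_head: dict, dst_head: dict) -> dict:
    """Return {key: (source_value, destination_value)} for every user
    metadata key whose value differs between the two responses.

    Assumption: both arguments look like S3 HeadObject responses, i.e.
    user metadata lives in a dict under the "Metadata" key.
    """
    src_meta = src_head.get("Metadata", {})
    dst_meta = dst_head.get("Metadata", {})
    diff = {}
    for key in sorted(set(src_meta) | set(dst_meta)):
        if src_meta.get(key) != dst_meta.get(key):
            diff[key] = (src_meta.get(key), dst_meta.get(key))
    return diff


# Hypothetical sample responses: the object exposed on the new service
# has the same size but has lost its user metadata.
minio_head = {"ContentLength": 1024, "Metadata": {"sha256": "abc123"}}
native_head = {"ContentLength": 1024, "Metadata": {}}

missing = metadata_diff(minio_head, native_head)
print(missing)
```

An empty result would mean the metadata survived. Any entry with `None` on the destination side marks a property that would need to be restored by a migration.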

Amount of content

Currently we have ~572 TB of content saved at SDSC.

Problem

If SDSC relies on us to transfer this content from Minio to a native S3 service at SDSC, we are looking at excessive AWS data movement costs for more than half a petabyte of data. Content uploaded into AWS is not charged (according to one source). Content sent from within AWS to an outside location is charged.

Getting the specific cost for DataTransfer-Out-Bytes or REGIONNAME1-DataTransfer-Out-Bytes is difficult. One source lists this as "Next 100 TB / Month | $0.084 per GB". Applying that rate to all of the content gives 572,000 GB x $0.084 = $48,048. This is strictly the outbound transfer cost and does not include:
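
The transfer-out arithmetic above can be reproduced with a short sketch. It assumes a single flat rate applied to every GB and decimal units (1 TB = 1000 GB), as in the estimate above. Actual AWS pricing is tiered, so the real figure would differ. `transfer_out_cost` is a hypothetical helper name.

```python
GB_PER_TB = 1000  # decimal units, matching the 572000 GB figure above


def transfer_out_cost(terabytes: float, usd_per_gb: float) -> float:
    """Flat-rate egress estimate: every GB is billed at the same rate.

    Assumption: no pricing tiers, no request or compute charges.
    """
    return terabytes * GB_PER_TB * usd_per_gb


estimate = transfer_out_cost(572, 0.084)
print(f"${estimate:,.0f}")
```

Changing the two arguments makes it easy to rerun the estimate as the amount of content grows or as a better rate becomes known.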

Suggestion

Work with SDSC to find an onsite way to transfer content. "We may be able to re-ingest the data locally to speed up the process." This would be excellent and then leave us with the issues of dealing with:

If SDSC cannot handle the "ingest" cost mentioned above, another possible solution is a temporary cloud service on their system where our code could run the transfer. It would be better if the transfer were handled by SDSC.

Important that we start discussions with them about this migration.

elopatin-uc3 commented 9 months ago

@elopatin-uc3 Will set up a meeting with SDSC to discuss.

elopatin-uc3 commented 8 months ago

Meeting with SDSC on 1/11.

terrywbrady commented 8 months ago

Jan 11 meeting with SDSC

elopatin-uc3 commented 3 months ago

Related: #1927