filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Baikal Seal Storage Technology #325

Closed scharfstein closed 1 year ago

scharfstein commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

Our customer is a Dark Matter Group within UC Berkeley, and Seal is working with them on a project to store the outputs of their scientific experiments. They would like to upload data to a distributed platform so that researchers around the world can access it. We kicked off the project in early March 2022, with ingestion estimated to begin in mid-April 2022 via portable disk unit. The customer's data will not be encrypted, but access controls will be implemented. They are looking for storage for at least the next three years.

Seal is a carbon-neutral, decentralized cloud storage provider. Seal's technical leadership brings decades of experience from traditional enterprise storage companies including Seagate and Oracle, as well as world-class experience on the Filecoin Network. Today, Seal operates data centers across the US and Canada with enterprise-grade infrastructure and data policies.

What is the primary source of funding for this project?

Seal is funding the project.

What other projects/ecosystem stakeholders is this project associated with?

None at this time.

Use-case details

Describe the data being stored onto Filecoin

The data sets are the original outputs of scientific experiments.

Where was the data in this dataset sourced from?

The data sets have been created by dark matter-related experiments and instrumentation.

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

Yes. A link will be added shortly.

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

The current data set requires permission-based access.

A goal of the pilot project is for Seal to work with our customer to provide a permission-based model for accessing the data. Data staged for access will be served using open-source tools (IPFS, SeaweedFS).

What is the expected retrieval frequency for this data?

Archival is the primary use. The data will be accessed by external collaborators and researchers.

For how long do you plan to keep this dataset stored on Filecoin?

Three years, at least.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

We plan to store five copies of the 400 TiB data set [total of 2 PiB] in five different cities, in three different countries and across two continents.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

Seal Storage has dual 100 Gbps internet connections. SPs will download data from Seal. Offline data transfer may be possible.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We are currently discussing capabilities and performing due diligence with several SPs and have chosen three SPs for this project. We chose these based on their current storage capacity, compute capabilities, enterprise-grade DCs and bandwidth.

How will you be distributing deals across storage providers?

Holon, 400 TiB
ElioVP, 400 TiB
PikNik, 400 TiB
Seal, 800 TiB

Seal will also keep a hot copy (400 TB) available for the customer to access.

The data ingestion will follow this approximate schedule:

55 TB right away
by the end of year 1: 5 TB
by end of year 2: 50 TB
by end of year 3: 290 TB
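
The figures in the allocation plan can be cross-checked with a few lines of arithmetic. The sketch below is not part of the application; it uses only the numbers quoted above, and the variable names are illustrative. The application mixes TiB (copies, SP allocations) and TB (hot copy, ingestion schedule), so the sketch keeps each figure in the unit in which it was written instead of converting.

```python
# Cross-check of the allocation plan, using only figures quoted in the application.
# All names here are illustrative and do not come from any Filecoin tooling.

TIB_PER_PIB = 1024

dataset_tib = 400
copies = 5

# Planned distribution across storage providers (TiB), as listed above.
sp_allocation_tib = {"Holon": 400, "ElioVP": 400, "PikNik": 400, "Seal": 800}

# Ingestion schedule (TB), as listed above.
ingestion_tb = {
    "right away": 55,
    "end of year 1": 5,
    "end of year 2": 50,
    "end of year 3": 290,
}

total_replicated_tib = dataset_tib * copies
total_allocated_tib = sum(sp_allocation_tib.values())
total_ingested_tb = sum(ingestion_tb.values())

print(f"{copies} copies x {dataset_tib} TiB = {total_replicated_tib} TiB "
      f"({total_replicated_tib / TIB_PER_PIB:.2f} PiB)")
print(f"Sum of SP allocations: {total_allocated_tib} TiB")
print(f"Sum of ingestion schedule: {total_ingested_tb} TB")
```

The SP allocations add up to the five-copy total (2,000 TiB, about 1.95 PiB, which the application rounds to 2 PiB), and the ingestion schedule adds up to 400 TB, the stated size of a single copy.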

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes, we have the resources/funding to begin making deals once we receive DataCap. 

We currently have the support we need.

large-datacap-requests[bot] commented 2 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

galen-mcandrew commented 2 years ago

Multisig Notary requested

Total DataCap requested

2PiB

Expected weekly DataCap usage rate

100TiB
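
For a sense of scale, the two figures above imply a rough onboarding duration. This is a minimal sketch that assumes 1 PiB = 1024 TiB and a constant weekly usage rate; neither assumption comes from the thread, and the variable names are illustrative.

```python
import math

TIB_PER_PIB = 1024  # assumption: binary units throughout

total_requested_pib = 2   # "Total DataCap requested"
weekly_usage_tib = 100    # "Expected weekly DataCap usage rate"

total_requested_tib = total_requested_pib * TIB_PER_PIB
weeks_to_exhaust = total_requested_tib / weekly_usage_tib

print(f"{total_requested_tib} TiB at {weekly_usage_tib} TiB/week "
      f"is about {weeks_to_exhaust:.1f} weeks "
      f"({math.ceil(weeks_to_exhaust)} full weeks)")
```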

large-datacap-requests[bot] commented 2 years ago

Multisig created and sent to RKH f01838560

large-datacap-requests[bot] commented 2 years ago

DataCap Allocation requested

Multisig Notary address

f01838560

Client address

f1usscfxtogr5v4jmi32uzkckeql2mgvun72q37ga

DataCap allocation requested

50TiB

dannyob commented 2 years ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacediiie2ljjmt5ippxnt2iht4hjtr2si37yb7mxdl75kak4qauf23y

Address

f1usscfxtogr5v4jmi32uzkckeql2mgvun72q37ga

Datacap Allocated

50.00TiB

Signer Address

f1k6wwevxvp466ybil7y2scqlhtnrz5atjkkyvm4a

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacediiie2ljjmt5ippxnt2iht4hjtr2si37yb7mxdl75kak4qauf23y

TimWilliams00 commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceahnlilxqzokl2km27u2vxc3kovsqyinjg2ijx7oxmcwo7bs3tceu

Address

f1usscfxtogr5v4jmi32uzkckeql2mgvun72q37ga

Datacap Allocated

50.00TiB

Signer Address

f1fkxkfxgopjf3ufnfg5i3m6qlwf73kp4w5zz7nnq

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceahnlilxqzokl2km27u2vxc3kovsqyinjg2ijx7oxmcwo7bs3tceu

dkkapur commented 2 years ago

This went through, clearing the warning.

large-datacap-requests[bot] commented 2 years ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

large-datacap-requests[bot] commented 2 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

salstorage commented 2 years ago

@raghavrmadya waiting for auto bot allocation for next tranche

dkkapur commented 2 years ago

@salstorage -> as part of routine cleanup we were doing (cc @galen-mcandrew) this notary was actually deprecated and set to 0. see https://filplus.d.interplanetary.one/notaries?showInactive=true&filter=Baikal. this is likely because it got picked up in our filters for "inactive" notaries where we had latent DataCap. can you shed any light on recent progress for this application and we can get you started up again?

@galen-mcandrew @simonkim0515 @raghavrmadya what are your thoughts on getting this stood up via a new app following the latest guidelines (i.e., issue on notary governance + new app in this repo)? @kevzak is this a fit for E-Fil+ given private data?

kevzak commented 2 years ago

I think if Seal already had notaries that supported this application, there's no need to change the path to DataCap. If they need to start over, then it might be worth considering E-Fil

filplus-checker commented 1 year ago

DataCap and CID Checker Report[^1]

If this is the first time a provider has taken a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

⚠️ f01886710 has sealed 43.94% of total datacap.

⚠️ f01886710 has unknown IP location.

⚠️ f01873432 has sealed 30.95% of total datacap.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01886710 | Unknown | 14.01 TiB | 43.94% | 14.01 TiB | 0.00% |
| f01873432 | Las Vegas, Nevada, US | 9.87 TiB | 30.95% | 9.87 TiB | 0.00% |
| f01157018 | Sydney, New South Wales, AU | 2.69 TiB | 8.43% | 2.69 TiB | 0.00% |
| f01157027 | Sydney, New South Wales, AU | 1.81 TiB | 5.68% | 1.81 TiB | 0.00% |
| f01156901 | Sydney, New South Wales, AU | 1.67 TiB | 5.23% | 1.67 TiB | 0.00% |
| f01156975 | Sydney, New South Wales, AU | 1.65 TiB | 5.18% | 1.65 TiB | 0.00% |
| f01345523 | Antwerpen, Flanders, BE | 192.00 GiB | 0.59% | 192.00 GiB | 0.00% |
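
The percentages in the table above can be reproduced from the sealed sizes. The following is a minimal sketch, not the checker's actual code: the 25% warning threshold is an assumption chosen for illustration (the report does not state the threshold it uses), and `to_tib` is a hypothetical helper.

```python
# Sealed sizes per provider, copied from the report table.
sealed = {
    "f01886710": "14.01 TiB",
    "f01873432": "9.87 TiB",
    "f01157018": "2.69 TiB",
    "f01157027": "1.81 TiB",
    "f01156901": "1.67 TiB",
    "f01156975": "1.65 TiB",
    "f01345523": "192.00 GiB",
}

# Assumption for illustration only; the report does not state its threshold.
ASSUMED_WARN_THRESHOLD = 25.0  # percent


def to_tib(size: str) -> float:
    """Convert a string such as '192.00 GiB' or '14.01 TiB' to TiB."""
    value, unit = size.split()
    factor = {"GiB": 1 / 1024, "TiB": 1.0}[unit]
    return float(value) * factor


sizes_tib = {provider: to_tib(size) for provider, size in sealed.items()}
total_tib = sum(sizes_tib.values())
shares = {provider: 100 * size / total_tib for provider, size in sizes_tib.items()}
flagged = [p for p, pct in shares.items() if pct > ASSUMED_WARN_THRESHOLD]

for provider, pct in shares.items():
    marker = " <- above assumed threshold" if provider in flagged else ""
    print(f"{provider}: {pct:.2f}%{marker}")
```

Because the sizes in the table are rounded to two decimals, the recomputed shares can differ from the reported ones by about 0.01 percentage points. Under the assumed threshold, the two flagged providers are the same two named in the warnings.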

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

⚠️ 97.65% of deals are for data replicated across fewer than 4 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 4.86 TiB | 4.86 TiB | 1 | 15.24% |
| 2.39 TiB | 4.78 TiB | 2 | 15.00% |
| 7.16 TiB | 21.49 TiB | 3 | 67.41% |
| 192.00 GiB | 768.00 GiB | 4 | 2.35% |
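
The 97.65% figure in the warning follows from the table: it is the share of total deal volume held in rows with fewer than 4 providers. A minimal sketch of that calculation, using only the table values (the variable names are illustrative and this is not the checker's own code):

```python
# Rows from the replication table: (total deals made in TiB, number of providers).
rows = [
    (4.86, 1),
    (4.78, 2),
    (21.49, 3),
    (768 / 1024, 4),  # 768.00 GiB expressed in TiB
]

MIN_PROVIDERS = 4  # taken from the wording of the warning

total_tib = sum(size for size, _ in rows)
under_replicated_tib = sum(size for size, n in rows if n < MIN_PROVIDERS)
under_replicated_pct = 100 * under_replicated_tib / total_tib

print(f"{under_replicated_pct:.2f}% of deal volume is replicated "
      f"across fewer than {MIN_PROVIDERS} providers")
```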

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, and their data should not resolve to the same CID.

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Verifier |
| --- | --- | --- | --- | --- |
| f1rovtu5m3gq7q5vu4kfh4oiiif643gqq7voi4ida | Seal Storage Technology | 6.38 TiB | 190 | LDN EFil+ |

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

johansealstorage commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The below table shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider has taken a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

Since this is the 2nd allocation, the following restrictions have been relaxed:

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01157018 | Melbourne, Victoria, AU (Anycast Global Backbone) | 2.69 TiB | 8.43% | 2.69 TiB | 0.00% |
| f01157027 | Melbourne, Victoria, AU (Anycast Global Backbone) | 1.81 TiB | 5.68% | 1.81 TiB | 0.00% |
| f01156901 | Melbourne, Victoria, AU (Anycast Global Backbone) | 1.67 TiB | 5.23% | 1.67 TiB | 0.00% |
| f01156975 | Melbourne, Victoria, AU (Anycast Global Backbone) | 1.65 TiB | 5.18% | 1.65 TiB | 0.00% |
| f01345523 | Antwerpen, Flanders, BE (Cogent Communications) | 192.00 GiB | 0.59% | 192.00 GiB | 0.00% |
| f01886710 | Las Vegas, Nevada, US (GTT Communications Inc.) | 14.01 TiB | 43.94% | 14.01 TiB | 0.00% |
| f01873432 | Las Vegas, Nevada, US (PiKNiK & Company Inc.) | 9.87 TiB | 30.95% | 9.87 TiB | 0.00% |

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

Since this is the 2nd allocation, the following restrictions have been relaxed:

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 4.86 TiB | 4.86 TiB | 1 | 15.24% |
| 2.39 TiB | 4.78 TiB | 2 | 15.00% |
| 7.16 TiB | 21.49 TiB | 3 | 67.41% |
| 192.00 GiB | 768.00 GiB | 4 | 2.35% |

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, and their data should not resolve to the same CID.

However, this is possible if all of the clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Approvers |
| --- | --- | --- | --- | --- |
| f1rovtu5m3gq7q5vu4kfh4oiiif643gqq7voi4ida | Seal Storage Technology | 24.62 TiB | 495 | 1 cryptowhizzard, 1 Fenbushi-Filecoin, 1 flyworker, 1 UnionLabs2020 |

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

salstorage commented 1 year ago

@raghavrmadya @dkkapur @galen-mcandrew this application is in a deprecated state. LDN Application #1212 is active and replaces application #325.

Please close this application as inactive/void.

Thanks,
Sal - Seal Storage