filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Caribbean LDN-01 Seal Storage Technology #1281

Closed salstorage closed 1 year ago

salstorage commented 1 year ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

NOTE: this is LDN app 1 of 2. Total data cap requested for this project is 8.4 PiB.
LDN app 2 of 2 can be found here: https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1282
Seal is collaborating with the Center for Extreme Data Management, Analysis and Visualization (CEDMAV) at the University of Utah on a pilot project that uses Filecoin to store a public dataset: the OpenVisus datasets (very “sparse” data covering combustion, simulations, earth, satellite, etc.). The datasets total about 1.4 PiB. Seal and CEDMAV are also exploring specific use cases and how the Filecoin Network can support ongoing research.

Seal is a carbon-neutral, decentralized cloud storage provider. Seal's technical leadership brings decades of experience from traditional enterprise storage companies including Seagate and Oracle, as well as world-class experience on the Filecoin Network. Today, Seal operates data centers across the US and Canada with enterprise-grade infrastructure and data policies.

What is the primary source of funding for this project?

Seal is funding the project

What other projects/ecosystem stakeholders is this project associated with?

The main stakeholder for this project is the University of Utah CEDMAV group. Our customer views decentralized data storage as an exciting platform that could yield many benefits for future large datasets.

Use-case details

Describe the data being stored onto Filecoin

Project Caribbean consists of a 1.4 PiB verified PUBLIC dataset belonging to the Center for Extreme Data Management, Analysis and Visualization at the University of Utah (CEDMAV, http://cedmav.org/). The data consists of OpenVisus datasets (very “sparse” data covering combustion, simulations, earth, satellite, etc.).

Where was the data in this dataset sourced from?

The Center for Extreme Data Management, Analysis and Visualization (CEDMAV) of the University of Utah

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://drive.google.com/drive/folders/1j6p8peLJJ9tbNhF5mD3sPBUKjFrw5UUy?usp=sharing

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Confirmed - this is a public dataset

What is the expected retrieval frequency for this data?

The data will be accessed by external collaborators and researchers.
Retrieval would be daily, as and when needed. Seal Storage will keep an unsealed copy for retrieval purposes.

For how long do you plan to keep this dataset stored on Filecoin?

Three year term

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

We plan to store six copies of the 1.4 PiB data set [total of 8.4 PiB] in four different cities, in three different countries and across two continents.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

Seal Storage has dual 100 Gbps internet connections, and SPs will download the data from Seal.
Seal Storage will prepare the files and package them as CAR files in 32 GB chunks for distribution.
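
As a rough sizing aid for the answer above, the number of 32 GB pieces needed for one copy can be estimated as follows. This is only a sketch: it assumes binary units (1 PiB = 2**50 bytes, 32 GiB per piece) and that every piece is filled completely, neither of which the application states. Real CAR packing leaves padding, so the true count would be higher. The function name is hypothetical.

```python
# Rough estimate only; the unit interpretation and perfect packing are assumptions.
PIB = 2**50
GIB = 2**30


def estimate_piece_count(dataset_bytes: int, piece_bytes: int = 32 * GIB) -> int:
    """Return the minimum number of fixed-size pieces that can hold the dataset."""
    return -(-dataset_bytes // piece_bytes)  # ceiling division on integers


one_copy = 14 * PIB // 10  # 1.4 PiB, kept in integer arithmetic
pieces_per_copy = estimate_piece_count(one_copy)
print(pieces_per_copy)      # pieces for one copy
print(pieces_per_copy * 6)  # pieces across the six planned copies
```

Under these assumptions one copy needs a little under 46,000 pieces, which gives a sense of how many deals each SP would have to accept.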

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We are currently meeting and discussing capabilities with several other SPs. Due to the size of this pilot, our SPs have requested DataCap before committing to the project.
We plan to choose enterprise-grade SPs for this project and will complete our due diligence post DataCap approval.

How will you be distributing deals across storage providers?

1 copy = 1.4 PiB

DLTX 1 copy in Omaha, Nebraska, USA
Ghostbytes 1 copy in Philadelphia, USA
DSS 1 copy in Sydney, Australia
Telnyx 1 copy multiple locations USA
Seal Storage 1 copy in Las Vegas, NV USA 
Seal Storage 1 copy in Montreal, Quebec, Canada 

Seal Storage must also keep a full hot copy unsealed for our Customer. 
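
The distribution above can be sanity-checked with a few lines of arithmetic. The sketch below only restates the figures given in the application (six sealed copies of 1.4 PiB each). The variable names are illustrative, and the additional unsealed hot copy is deliberately left out of the total, which is an assumption here.

```python
# Copies per storage provider, as listed in the application (1 copy = 1.4 PiB).
COPY_SIZE_PIB = 1.4

planned_copies = {
    "DLTX (Omaha, Nebraska, USA)": 1,
    "Ghostbytes (Philadelphia, USA)": 1,
    "DSS (Sydney, Australia)": 1,
    "Telnyx (multiple locations, USA)": 1,
    "Seal Storage (Las Vegas, NV, USA)": 1,
    "Seal Storage (Montreal, Quebec, Canada)": 1,
}

total_copies = sum(planned_copies.values())
total_pib = total_copies * COPY_SIZE_PIB  # sealed data only; hot copy excluded
print(f"{total_copies} copies, {total_pib:.1f} PiB sealed in total")
```

The result agrees with the 8.4 PiB total stated earlier in the application.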

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Once we receive DataCap, we will begin making deals as soon as the customer data is transferred to Seal Storage in November 2022.
We currently have the support we need.

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

salstorage commented 1 year ago

The following Notaries have reviewed the 1.4 PiB Data Storage Plan and support/approve project Caribbean.

Data Storage Plan Link: https://docs.google.com/document/d/1J8UPM8w4wxyPNISXpI0vYpMcPv6EinpiAGm4X14E6-0/edit?usp=sharing

Notaries:
Joss Hua - Venus Team - IPFSForce - f1tfg54zzscugttejv336vivknmsnzzmyudp3t7wi
Alex Kim - Define Platform - f1hhippi64yiyhpjdtbidfyzma6irc2nuav7mrwmi
BlockMaker - f1o3twrcpwjtpcd4q36lpq4qmy2qfbgtyy5h6tsty
Claudia Richoux - Banyan - f1oc6qvenzp7wsriu7edyebb325gnaovktmujl7jq
Zhehao Chen - Fenbushi Capital - f1yqydpmqb5en262jpottko2kd65msajax7fi4rmq
Wijnand Schouten - Speedium - f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa
Gary Gao - FBG Capital - f1zffqhxwq2rrg7rtot6lmkl6hb2xyrrseawprzsq
Mark Roddy - Holon - f1ystxl2ootvpirpa7ebgwl7vlhwkbx2r4zjxwe5i
Steven Li - IPFSForce - f1w2vyp4w6df44gbh4vxqle4w65zfrfnwhrl3hojy

Fenbushi-Filecoin commented 1 year ago

Confirm to support the application.

Joss-Hua commented 1 year ago

Willing to be supportive. I have had a preliminary face-to-face meeting with the applicant, who showed some relevant materials. I am happy to see that this project will bring more useful data to the network.

mjroddy commented 1 year ago

Repeating my support as per https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1282

Overall I am supportive of this project.

Given the size, it is important that we get as much 'knowledge' out of this program as possible. A case study and marketing are included in the Data Storage Plan linked above.

I would like to see the findings from onboarding this client shared with other storage providers. This would include challenges, technical and sales tooling, and even a case study from the SP's perspective, so that the network as a whole benefits from this project.

For Filecoin to improve, all SPs must be lifted up.

steven004 commented 1 year ago

Willing to be supportive of this project.

BlockMakeronline commented 1 year ago

They contacted us earlier and told us much about the project. Happy to support the application.

cryptowhizzard commented 1 year ago

Happy to support

raghavrmadya commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

400TiB

Client address

f1bvlifue4nucdljqbcayi7i3ep535y3rackexiyy

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f01858410

Client address

f1bvlifue4nucdljqbcayi7i3ep535y3rackexiyy

DataCap allocation requested

200TiB

Id

65b30fae-0eb2-4a3c-a8f9-a14afb3f832d

raghavrmadya commented 1 year ago

All datacap will go to f1bvlifue4nucdljqbcayi7i3ep535y3rackexiyy

Fenbushi-Filecoin commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacec5wiuxl3fdku32ytttb2sxgtaukdjc6czswmxti5znbfkxjywntk

Address

f1bvlifue4nucdljqbcayi7i3ep535y3rackexiyy

Datacap Allocated

200.00TiB

Signer Address

f1yqydpmqb5en262jpottko2kd65msajax7fi4rmq

Id

65b30fae-0eb2-4a3c-a8f9-a14afb3f832d

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacec5wiuxl3fdku32ytttb2sxgtaukdjc6czswmxti5znbfkxjywntk

mjroddy commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebbvmadd2v4ttmgo4qnlkxqx52h5docnacwknwqaxgtfwz3jleor4

Address

f1bvlifue4nucdljqbcayi7i3ep535y3rackexiyy

Datacap Allocated

200.00TiB

Signer Address

f1ystxl2ootvpirpa7ebgwl7vlhwkbx2r4zjxwe5i

Id

65b30fae-0eb2-4a3c-a8f9-a14afb3f832d

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebbvmadd2v4ttmgo4qnlkxqx52h5docnacwknwqaxgtfwz3jleor4

filplus-checker commented 1 year ago

DataCap and CID Checker Report[^1]

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

⚠️ f01886710 has sealed 100.00% of total datacap.

⚠️ f01886710 has unknown IP location.

⚠️ All storage providers are located in the same region.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01886710 | Unknown | 8.00 TiB | 100.00% | 8.00 TiB | 0.00% |

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across each number of storage providers.

⚠️ 100.00% of deals are for data replicated across fewer than 4 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 8.00 TiB | 8.00 TiB | 1 | 100.00% |

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, and their data should not resolve to the same CID.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

BDEio commented 1 year ago

@salstorage Hi! Great to see that you have gotten approval for DataCap! BDE is a verified deals auction house helping you to get paid storing your valuable data with reliable storage providers. If you need any help, please get in touch.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f01858410

Client address

f1bvlifue4nucdljqbcayi7i3ep535y3rackexiyy

DataCap allocation requested

400TiB

Id

eab8af5b-2939-4c99-982c-d8084e4dd83a

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1bvlifue4nucdljqbcayi7i3ep535y3rackexiyy

Last two approvers

megtei & Fenbushi-Filecoin

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

400TiB

Total DataCap granted for client so far

374.06TiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

4.63PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 10104 | 9 | 200TiB | 38.41 | 44.72TiB |

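
The "DataCap to be granted" figure in the stats above appears to be the total request minus what has been granted so far. A minimal sketch of that arithmetic follows, assuming 1 PiB = 1024 TiB. The function name is hypothetical, and this is not the bot's actual code.

```python
TIB_PER_PIB = 1024


def remaining_datacap_pib(total_requested_pib: float, granted_tib: float) -> float:
    """DataCap still to be granted, in PiB."""
    return (total_requested_pib * TIB_PER_PIB - granted_tib) / TIB_PER_PIB


# Values taken from the bot comment: 5 PiB requested, 374.06 TiB granted so far.
remaining = remaining_datacap_pib(5, 374.06)
print(f"{remaining:.2f} PiB still to be granted")

# At the stated expected usage rate of 400 TiB per week:
weeks_left = remaining * TIB_PER_PIB / 400
print(f"about {weeks_left:.1f} weeks of allocations at 400 TiB/week")
```

With the values from this comment the remaining amount rounds to 4.63 PiB, which matches the figure the bot reports.
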
filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The below table shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

Since this is the 3rd allocation, the following restrictions have been relaxed:

⚠️ f01886710 has unknown IP location.

⚠️ f02008222 has unknown IP location.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01886710 | Unknown / Unknown | 118.03 TiB | 39.02% | 116.81 TiB | 1.03% |
| f01919423 | Sydney, New South Wales, AU / Andrew Sjoquist Enterprises Pty Ltd | 32.55 TiB | 10.76% | 32.52 TiB | 0.10% |
| f01938357 | Sydney, New South Wales, AU / Andrew Sjoquist Enterprises Pty Ltd | 15.00 TiB | 4.96% | 15.00 TiB | 0.00% |
| f01910202 | Philadelphia, Pennsylvania, US / Cogent Communications | 10.41 TiB | 3.44% | 10.41 TiB | 0.00% |
| f01889910 | Phoenix, Arizona, US / Cyxtera Technologies Inc | 41.49 TiB | 13.72% | 41.49 TiB | 0.00% |
| f01736668 | Lincoln, Nebraska, US / LightEdge Solutions | 23.60 TiB | 7.80% | 23.60 TiB | 0.00% |
| f0855584 | Lincoln, Nebraska, US / LightEdge Solutions | 23.58 TiB | 7.79% | 23.58 TiB | 0.00% |
| f01091851 | Lincoln, Nebraska, US / LightEdge Solutions | 22.51 TiB | 7.44% | 22.51 TiB | 0.00% |
| f02008222 (new) | Unknown / Unknown | 15.34 TiB | 5.07% | 15.34 TiB | 0.00% |

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across each number of storage providers.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 17.47 TiB | 17.47 TiB | 1 | 5.77% |
| 48.42 TiB | 96.83 TiB | 2 | 32.01% |
| 28.59 TiB | 85.77 TiB | 3 | 28.35% |
| 12.68 TiB | 51.25 TiB | 4 | 16.94% |
| 10.09 TiB | 51.19 TiB | 5 | 16.92% |
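
The "Deal Percentage" column in the replication table above appears to be each row's "Total Deals Made" divided by the sum of that column. This is inferred from the numbers, not from the checker's source code. A small sketch that recomputes it from the rounded values shown in the table (function and variable names are hypothetical):

```python
# (unique data TiB, total deals TiB, number of providers), copied from the report.
rows = [
    (17.47, 17.47, 1),
    (48.42, 96.83, 2),
    (28.59, 85.77, 3),
    (12.68, 51.25, 4),
    (10.09, 51.19, 5),
]


def deal_percentages(table):
    """Share of total deal volume per replication level, in percent."""
    total = sum(total_deals for _, total_deals, _ in table)
    return [100 * total_deals / total for _, total_deals, _ in table]


for (_, _, providers), pct in zip(rows, deal_percentages(rows)):
    print(f"replicated across {providers} provider(s): {pct:.2f}% of deals")
```

Because the inputs are already rounded to two decimals, the recomputed percentages can differ from the reported ones in the last digit.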

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, and their data should not resolve to the same CID.

However, this could be possible if all below clients use same software to prepare for the exact same dataset or they belong to a series of LDN applications for the same dataset.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

salstorage commented 1 year ago

Paging Notary to sign next DC tranche please @Joss-Hua @steven004 @cryptowhizzard @BlockMakeronline @laudiacay @Alex11801 @GaryGJG

salstorage commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The below table shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01919423 | Sydney, New South Wales, AU / Andrew Sjoquist Enterprises Pty Ltd | 37.63 TiB | 10.26% | 37.60 TiB | 0.08% |
| f01938357 | Sydney, New South Wales, AU / Andrew Sjoquist Enterprises Pty Ltd | 15.31 TiB | 4.17% | 15.31 TiB | 0.00% |
| f01886710 | Arcadia, California, US / Cogent Communications | 141.01 TiB | 38.45% | 139.79 TiB | 0.86% |
| f01910202 | Philadelphia, Pennsylvania, US / Cogent Communications | 10.41 TiB | 2.84% | 10.41 TiB | 0.00% |
| f01886690 | Arcadia, California, US / Cogent Communications | 4.18 TiB | 1.14% | 4.18 TiB | 0.00% |
| f01889910 | Phoenix, Arizona, US / Cyxtera Technologies Inc | 62.41 TiB | 17.01% | 62.41 TiB | 0.00% |
| f01736668 | Lincoln, Nebraska, US / LightEdge Solutions | 34.39 TiB | 9.38% | 34.39 TiB | 0.00% |
| f0855584 | Lincoln, Nebraska, US / LightEdge Solutions | 23.58 TiB | 6.43% | 23.58 TiB | 0.00% |
| f01091851 | Lincoln, Nebraska, US / LightEdge Solutions | 22.51 TiB | 6.14% | 22.51 TiB | 0.00% |
| f02008222 (new) | Lincoln, Nebraska, US / LightEdge Solutions | 15.34 TiB | 4.18% | 15.34 TiB | 0.00% |

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across each number of storage providers.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 54.71 TiB | 54.71 TiB | 1 | 14.92% |
| 37.15 TiB | 74.30 TiB | 2 | 20.26% |
| 26.77 TiB | 80.31 TiB | 3 | 21.90% |
| 26.43 TiB | 106.25 TiB | 4 | 28.97% |
| 10.09 TiB | 51.19 TiB | 5 | 13.96% |

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, and their data should not resolve to the same CID.

However, this could be possible if all below clients use same software to prepare for the exact same dataset or they belong to a series of LDN applications for the same dataset.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

salstorage commented 1 year ago

Paging Notary to sign next DC tranche please @Joss-Hua @steven004 @cryptowhizzard @BlockMakeronline @laudiacay @Alex11801 @GaryGJG

cryptowhizzard commented 1 year ago

Hi @salstorage

I can't sign since this one is not visible on my dashboard. I can sign on #1282 if you want?

cryptowhizzard commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaced2npr2osb2rj4hwkuergzjhfjelkwbvksvttaimsgxkkf2633q3y

Address

f1bvlifue4nucdljqbcayi7i3ep535y3rackexiyy

Datacap Allocated

400.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

eab8af5b-2939-4c99-982c-d8084e4dd83a

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaced2npr2osb2rj4hwkuergzjhfjelkwbvksvttaimsgxkkf2633q3y

cryptowhizzard commented 1 year ago

Proposing on the basis of trust. Seal is a long-term trusted participant in the ecosystem. I ran some retrieval requests and everything looks OK.

1475Notary commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebnhis3keaz6ma624il6t7xeqyfnbkhgur35o3fatr27j2qigpvza

Address

f1bvlifue4nucdljqbcayi7i3ep535y3rackexiyy

Datacap Allocated

400.00TiB

Signer Address

f1ofq4mngy7ggcp755pfquq2gphjjnlydolf6awtq

Id

eab8af5b-2939-4c99-982c-d8084e4dd83a

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebnhis3keaz6ma624il6t7xeqyfnbkhgur35o3fatr27j2qigpvza

salstorage commented 1 year ago

Please add the following miner IDs to the project: f01838599 and f01845552.

salstorage commented 1 year ago

Please add the following SP IDs to the application. These SPs are for Distributed Storage Solutions:

f01274011 f01746964

data-programs commented 1 year ago

KYC

This user’s identity has been verified through filplus.storage

salstorage commented 1 year ago

Please note that SP Ghostbytes/Web3Cloud is no longer online. We are removing Ghostbytes/Web3Cloud from the application:
Web3Cloud: f01506844
Ghostbytes: f01910202

salstorage commented 1 year ago

Due to the removal of Ghostbytes/Web3Cloud from the application, we are adding SP Vogo-Digital Labs, located in S. Korea, to store 200 TB of the Caribbean dataset. Miner IDs: f02202753 and f01987994

salstorage commented 1 year ago

Adding Miner ID for DSS, Sydney Australia location:

f02238775

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

salstorage commented 11 months ago

Due to termination of sectors, please remove the following SPs and their miner IDs from the application:
DLTX 1 copy in Omaha, Nebraska, USA
Ghostbytes 1 copy in Philadelphia, USA
Telnyx 1 copy multiple locations USA

Please add the following SP to the application:
GreaterHeat, Dallas, US location: f02361686 f02345061

Sunnyiscoming commented 10 months ago

Hello, @salstorage per the https://github.com/filecoin-project/notary-governance/issues/922 for Open, Public Dataset applicants, please complete the following Fil+ registration form to identify yourself as the applicant and also please add the contact information of the SP entities you are working with to store copies of the data.

This information will be reviewed by Fil+ Governance team to confirm validity and then the application will be allowed to move forward for additional notary review.

salstorage commented 10 months ago

Completed @Sunnyiscoming

salstorage commented 10 months ago

Latest update on SP and distribution:

f01886690 f01886710 SEAL STORAGE TECHNOLOGY INC, Las Vegas USA
f01274011 f01746964 f01919423 f01938357 f02238775 DSS Australia
f01923553 f01923554 f01923555 f01923556 f02181705 SEAL STORAGE TECHNOLOGY INC, Montreal Canada
f01987994 f02202753 VoGo Digital Labs, S Korea
f02229460 f02832654 Aligned USA, Mid West / Ohio, USA
f02361686 f02345061 GreaterHeat Dallas Tx USA

aggregation-and-compliance-bot[bot] commented 9 months ago

Client f01981070 does not follow the datacap usage rules. More info here. This application has been failing the requirements for 7 days. Please take appropriate action to fix the following DataCap usage problems.

| Criteria | Threshold | Reason |
| --- | --- | --- |
| Percent of used DataCap stored with top provider | < 75 | The percent of Data from the client that is stored with their top provider is 100%. This should be less than 75% |
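
The rule quoted above (top provider share below 75%) can be illustrated with a short check. This is a sketch under assumptions: the bot's real input is the client's on-chain DataCap usage, which is not shown in this thread, so the example uses the per-provider deal sizes from the two checker reports posted earlier instead. The function names are hypothetical.

```python
from typing import Mapping


def top_provider_share(deals_tib: Mapping[str, float]) -> float:
    """Percent of the client's deal volume held by its largest provider."""
    return 100 * max(deals_tib.values()) / sum(deals_tib.values())


def passes_top_provider_rule(deals_tib: Mapping[str, float], threshold: float = 75.0) -> bool:
    """True when the top provider holds less than `threshold` percent."""
    return top_provider_share(deals_tib) < threshold


# First checker report in this thread: one provider held all deals.
first_report = {"f01886710": 8.00}

# Later checker report: ten providers (Total Deals Sealed, in TiB).
later_report = {
    "f01919423": 37.63, "f01938357": 15.31, "f01886710": 141.01,
    "f01910202": 10.41, "f01886690": 4.18, "f01889910": 62.41,
    "f01736668": 34.39, "f0855584": 23.58, "f01091851": 22.51,
    "f02008222": 15.34,
}

for name, report in [("first", first_report), ("later", later_report)]:
    share = top_provider_share(report)
    print(f"{name} report: top provider {share:.2f}%, passes={passes_top_provider_rule(report)}")
```

On these inputs the first report fails the rule and the later one passes, so the 100% figure in the bot comment presumably reflects a different snapshot or data source than the later checker report.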