filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Kernelogic - Coupled Model Intercomparison Project 6 (3/4) #1353

Closed: kernelogic closed this issue 1 year ago

kernelogic commented 1 year ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

I have participated in every Slingshot phase and am probably the best-performing "small individual client".

Even though Slingshot v2 has ended, there is still strong demand from SPs to onboard useful data. This application is to onboard an open dataset from AWS.

I will provide a web UI (https://github.com/tech-greedy/singularity-browser) to index all onboarded files and provide ways to retrieve them.

I have successfully completed several LDNs on other datasets, and my record shows that I have followed the decentralization rules and have done no self-dealing.

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/60
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/59
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/46
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/297
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/298
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/304

What is the primary source of funding for this project?

Self-funded, BigD exchange.

What other projects/ecosystem stakeholders is this project associated with?

enterprise-sp-wg, BigD exchange.

Use-case details

Describe the data being stored onto Filecoin

The sixth phase of a global ensemble of coupled ocean-atmosphere general circulation models (CMIP6).

This dataset consists of two S3 buckets:
arn:aws:s3:::esgf-world    671.7 TiB
arn:aws:s3:::cmip6-pds    1.0 PiB

Considering 12 replicas and padding, I am applying for 20PiB in total.
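
The 20 PiB figure can be roughly reproduced from the two bucket sizes listed above. The following is a minimal sketch, assuming binary units (1 PiB = 1024 TiB). The padding overhead is not stated in the application, so it is deliberately left out.

```python
# Bucket sizes from the application, in TiB (assumes 1 PiB = 1024 TiB).
TIB_PER_PIB = 1024
bucket_sizes_tib = {
    "esgf-world": 671.7,
    "cmip6-pds": 1.0 * TIB_PER_PIB,
}
REPLICAS = 12

# Raw dataset size, then the size after storing every byte 12 times.
raw_tib = sum(bucket_sizes_tib.values())
replicated_pib = raw_tib * REPLICAS / TIB_PER_PIB

print(f"raw dataset: {raw_tib:.1f} TiB")
print(f"with {REPLICAS} replicas: {replicated_pib:.2f} PiB")
```

Twelve replicas alone come to just under 20 PiB, so the requested total is consistent with a small allowance for padding.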

Where was the data in this dataset sourced from?

https://pangeo-data.github.io/pangeo-cmip6-cloud/
https://registry.opendata.aws/cmip6/

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://registry.opendata.aws/cmip6/

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

This is an AWS open dataset. According to the license page, the data is licensed under Creative Commons 4.0.

http://bit.ly/CMIP6_Citation_Search

What is the expected retrieval frequency for this data?

Multiple times per year.

For how long do you plan to keep this dataset stored on Filecoin?

18 months to start. Once deal extension becomes possible, I will keep the deals alive as long as possible.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

All regions.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

I will upload my prepared CAR files to a web server and coordinate with providers to download and propose offline deals.

Maximum 3 copies per SP entity and maximum of 12 copies for every pieceCID.
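
The two copy limits above can be checked mechanically against a deal list. This is only a sketch: the `(piece_cid, sp_entity)` record shape, the function name `check_copy_limits`, and the sample identifiers are hypothetical and not taken from any tool mentioned in this application.

```python
from collections import Counter

MAX_COPIES_PER_ENTITY = 3   # limit per SP entity, from the application
MAX_COPIES_PER_PIECE = 12   # limit per pieceCID, from the application


def check_copy_limits(deals):
    """Return a list of violations for (piece_cid, sp_entity) deal pairs."""
    per_piece = Counter(piece for piece, _ in deals)
    per_piece_entity = Counter(deals)
    violations = []
    for piece, count in per_piece.items():
        if count > MAX_COPIES_PER_PIECE:
            violations.append(f"{piece}: {count} copies in total")
    for (piece, entity), count in per_piece_entity.items():
        if count > MAX_COPIES_PER_ENTITY:
            violations.append(f"{piece}: {count} copies with {entity}")
    return violations


# Hypothetical example: four copies of one piece with the same entity.
deals = [("piece-a", "entity-1")] * 4 + [("piece-a", "entity-2")]
print(check_copy_limits(deals))
```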

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

Besides the high-quality SPs I have worked with previously, I also use BigD exchange to further decentralize the storage.

To name a few from the community that I deal with regularly: PIKNIK, CabrinaHuang, HarryM, XinAn Xu.

From BigD exchange: Mog Li, Devin Chen, DSS Nathanial Marsh, Rabinovitch, Chris, arockpool Tony, Collen, FeelLin, MaiTian

How will you be distributing deals across storage providers?

Evenly across all providers I propose deals to, if they can handle the volume. If an SP is itself a notary, that notary will receive no more than 20% of the total granted DataCap.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

I have all I need to start making deals.

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

simonkim0515 commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

800TiB

Client address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f01858410

Client address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

DataCap allocation requested

256TiB

Id

5bda54c2-bc27-4274-9b72-283d78184ede

newwebgroup commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedttz62z2dc2juftfwridqih2an7zadu56csfoixi6tnk36a6zgpk

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

256.00TiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

Id

5bda54c2-bc27-4274-9b72-283d78184ede

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedttz62z2dc2juftfwridqih2an7zadu56csfoixi6tnk36a6zgpk

Tom-OriginStorage commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebxr6qfcj6yr7e6wcjzg56jdfgi5ulwi276rswdkdrlpnikrez2re

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

256.00TiB

Signer Address

f1q6bpjlqia6iemqbrdaxr2uehrhpvoju3qh4lpga

Id

5bda54c2-bc27-4274-9b72-283d78184ede

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebxr6qfcj6yr7e6wcjzg56jdfgi5ulwi276rswdkdrlpnikrez2re

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f01858410

Client address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

DataCap allocation requested

512TiB

Id

927f079b-b0d0-49da-8fae-bfee1e18acc0

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Last two approvers

llifezou & newwebgroup

Rule to calculate the allocation request amount

10% of total dc amount requested

DataCap allocation requested

512TiB

Total DataCap granted for client so far

1PiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

4PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 30246 | 7 | 256TiB | 15.34 | 29.5TiB |

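
The percentage rule quoted by the bot can be checked against the tranche sizes in this thread. A minimal sketch, assuming binary units and the 5 PiB total from the request trigger; the helper name `tranche_bytes` is illustrative, not part of the bot.

```python
TIB = 2**40
PIB = 2**50

total_requested = 5 * PIB  # "Total DataCap requested" in this thread


def tranche_bytes(percent, total=total_requested):
    """Size of an allocation tranche given as a percentage of the total."""
    return total * percent // 100


# 10% is the rule quoted above; 20% and 40% are quoted for later requests.
for percent in (10, 20, 40):
    print(f"{percent}% of 5 PiB = {tranche_bytes(percent) / TIB:.0f} TiB")
```

Under these assumptions the 10%, 20%, and 40% rules give 512 TiB, 1 PiB, and 2 PiB, which match requests 2, 3, and 4 in this thread.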
filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The table below shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01946689 | Singapore, Singapore, SG (Amazon.com, Inc.) | 145.00 TiB | 14.75% | 145.00 TiB | 0.00% |
| f01947708 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 144.91 TiB | 14.74% | 144.91 TiB | 0.00% |
| f01964074 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 144.81 TiB | 14.73% | 144.81 TiB | 0.00% |
| f01963842 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 144.81 TiB | 14.73% | 144.81 TiB | 0.00% |
| f01946713 | Singapore, Singapore, SG (Amazon.com, Inc.) | 144.00 TiB | 14.65% | 144.00 TiB | 0.00% |
| f01945216 | Singapore, Singapore, SG (Amazon.com, Inc.) | 120.47 TiB | 12.26% | 120.47 TiB | 0.00% |
| f01944744 | Hong Kong, Central and Western, HK (Singapore Telecommunications Ltd) | 138.94 TiB | 14.13% | 138.94 TiB | 0.00% |
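
The Percentage column appears to be each provider's share of the total sealed deal size. A minimal sketch that recomputes it from the rounded TiB figures in the table above; how the checker itself computes the column is an assumption here.

```python
# "Total Deals Sealed" per provider in TiB, copied from the table above.
sealed_tib = {
    "f01946689": 145.00,
    "f01947708": 144.91,
    "f01964074": 144.81,
    "f01963842": 144.81,
    "f01946713": 144.00,
    "f01945216": 120.47,
    "f01944744": 138.94,
}

# Each provider's share of the total, as a percentage.
total = sum(sealed_tib.values())
shares = {sp: 100 * size / total for sp, size in sealed_tib.items()}

for sp, share in shares.items():
    print(f"{sp}: {share:.2f}%")
```

Because the inputs are already rounded to two decimals, the recomputed shares may differ from the reported ones in the last digit.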

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 32.00 GiB | 128.00 GiB | 4 | 0.01% |
| 864.00 GiB | 4.22 TiB | 5 | 0.43% |
| 30.28 TiB | 181.69 TiB | 6 | 18.48% |
| 113.84 TiB | 796.91 TiB | 7 | 81.07% |
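
The rows of the replication table are internally consistent: unique data size times the number of providers gives the total deal size, and the Deal Percentage is that row's share of all deals. A short sketch of this check, using the rounded values from the table (small differences are rounding effects); the interpretation of the columns is an assumption based on the numbers themselves.

```python
# (unique data in TiB, number of providers, reported total deal size in TiB)
rows = [
    (32 / 1024, 4, 128 / 1024),
    (864 / 1024, 5, 4.22),
    (30.28, 6, 181.69),
    (113.84, 7, 796.91),
]

total_deals = sum(reported for _, _, reported in rows)

for unique, providers, reported in rows:
    expected = unique * providers           # every unique byte stored N times
    percent = 100 * reported / total_deals  # share of all deals made
    print(f"{providers} providers: {expected:.2f} TiB expected, "
          f"{reported:.2f} TiB reported, {percent:.2f}% of deals")
```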

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, which should not resolve to the same CID.

However, this is possible if all the clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

NiwanDao commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceak75ae3ymiptsejlz63tgfci3idjgcqmckeiiggsavtnbd5srxlk

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

512.00TiB

Signer Address

f1a2lia2cwwekeubwo4nppt4v4vebxs2frozarz3q

Id

927f079b-b0d0-49da-8fae-bfee1e18acc0

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceak75ae3ymiptsejlz63tgfci3idjgcqmckeiiggsavtnbd5srxlk

NDLABS-Leo commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebm2koxxzdct4ccbkwv7qsscfv7gmeagh3awtldvgpbo66fudreuk

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

512.00TiB

Signer Address

f1yayfsv6whu3rheviucvventj3y6t542xfpb47ei

Id

927f079b-b0d0-49da-8fae-bfee1e18acc0

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebm2koxxzdct4ccbkwv7qsscfv7gmeagh3awtldvgpbo66fudreuk

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f01858410

Client address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

DataCap allocation requested

1PiB

Id

8d3d1074-2ec8-4859-9424-1b7fb105e1db

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Last two approvers

not found & xingjitansuo

Rule to calculate the allocation request amount

20% of total dc amount requested

DataCap allocation requested

1PiB

Total DataCap granted for client so far

1PiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

4PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 32480 | 7 | 512TiB | 14.29 | 9TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The table below shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01946713 | Singapore, Singapore, SG (Amazon.com, Inc.) | 145.00 TiB | 14.29% | 145.00 TiB | 0.00% |
| f01945216 | Singapore, Singapore, SG (Amazon.com, Inc.) | 145.00 TiB | 14.29% | 145.00 TiB | 0.00% |
| f01946689 | Singapore, Singapore, SG (Amazon.com, Inc.) | 145.00 TiB | 14.29% | 145.00 TiB | 0.00% |
| f01947708 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 145.00 TiB | 14.29% | 145.00 TiB | 0.00% |
| f01964074 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 144.97 TiB | 14.29% | 144.97 TiB | 0.00% |
| f01963842 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 144.84 TiB | 14.27% | 144.84 TiB | 0.00% |
| f01944744 | Hong Kong, Central and Western, HK (Singapore Telecommunications Ltd) | 145.00 TiB | 14.29% | 145.00 TiB | 0.00% |

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 192.00 GiB | 1.13 TiB | 6 | 0.11% |
| 144.81 TiB | 1013.69 TiB | 7 | 99.89% |

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, which should not resolve to the same CID.

However, this is possible if all the clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

cryptowhizzard commented 1 year ago

[ Discarded ]

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebzu3lntp2sosr4b356hvtskwbdmaqnysvozz7owjjvc7vk7ldslu

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

1.00PiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

8d3d1074-2ec8-4859-9424-1b7fb105e1db

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebzu3lntp2sosr4b356hvtskwbdmaqnysvozz7owjjvc7vk7ldslu

BDEio commented 1 year ago

@kernelogic Hi! Great to see that you have gotten approval for DataCap! BDE is a verified deals auction house helping you to get paid storing your data with reliable storage providers. If you need any help, please get in touch.

NiwanDao commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The table below shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01964074 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 480.72 TiB | 14.48% | 480.72 TiB | 0.00% |
| f01947708 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 479.38 TiB | 14.44% | 479.38 TiB | 0.00% |
| f01963842 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 478.94 TiB | 14.43% | 478.94 TiB | 0.00% |
| f01945216 | Singapore, Singapore, SG (Amazon.com, Inc.) | 478.53 TiB | 14.41% | 478.53 TiB | 0.00% |
| f01946713 | Singapore, Singapore, SG (Amazon.com, Inc.) | 476.44 TiB | 14.35% | 476.44 TiB | 0.00% |
| f01946689 | Singapore, Singapore, SG (Amazon.com, Inc.) | 447.69 TiB | 13.48% | 447.69 TiB | 0.00% |
| f01944744 | Singapore, Singapore, SG (Singapore Telecommunications Ltd) | 478.41 TiB | 14.41% | 478.41 TiB | 0.00% |

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 416.00 GiB | 2.03 TiB | 5 | 0.06% |
| 47.63 TiB | 285.75 TiB | 6 | 8.61% |
| 433.19 TiB | 2.96 PiB | 7 | 91.33% |

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, which should not resolve to the same CID.

However, this is possible if all the clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

NiwanDao commented 1 year ago

I retrieved a randomly chosen deal and it succeeded.

liyunzhi-666 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The table below shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01964074 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 480.72 TiB | 14.48% | 480.72 TiB | 0.00% |
| f01947708 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 479.38 TiB | 14.44% | 479.38 TiB | 0.00% |
| f01963842 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 478.94 TiB | 14.43% | 478.94 TiB | 0.00% |
| f01945216 | Singapore, Singapore, SG (Amazon.com, Inc.) | 478.53 TiB | 14.41% | 478.53 TiB | 0.00% |
| f01946713 | Singapore, Singapore, SG (Amazon.com, Inc.) | 476.44 TiB | 14.35% | 476.44 TiB | 0.00% |
| f01946689 | Singapore, Singapore, SG (Amazon.com, Inc.) | 447.69 TiB | 13.48% | 447.69 TiB | 0.00% |
| f01944744 | Singapore, Singapore, SG (Singapore Telecommunications Ltd) | 478.41 TiB | 14.41% | 478.41 TiB | 0.00% |

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 416.00 GiB | 2.03 TiB | 5 | 0.06% |
| 47.63 TiB | 285.75 TiB | 6 | 8.61% |
| 433.19 TiB | 2.96 PiB | 7 | 91.33% |

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, which should not resolve to the same CID.

However, this is possible if all the clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

liyunzhi-666 commented 1 year ago

I ran some retrieval deals and downloaded the entire sector file. Here are some results that look good. I would like to support this round; please keep it up.

d4e535a3345d22602b9dd0ad36420b3 f60987bd5d58c6f625c10885c463f2f 71d0175db0ffe413eabc29a8003996c

liyunzhi-666 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecipnngb3vykoxvqc6k6fworyqvh63lgxbsgnojn5smjdd3utxsce

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

1.00PiB

Signer Address

f1pszcrsciyixyuxxukkvtazcokexbn54amf7gvoq

Id

8d3d1074-2ec8-4859-9424-1b7fb105e1db

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecipnngb3vykoxvqc6k6fworyqvh63lgxbsgnojn5smjdd3utxsce

cryptowhizzard commented 1 year ago

Retrieval works, data is ok :)

cryptowhizzard commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceadnrooz7hh6g4kewdwtpmeh3lci3xnhtzpddfahfx62l2vqk3w4o

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

1.00PiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

8d3d1074-2ec8-4859-9424-1b7fb105e1db

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceadnrooz7hh6g4kewdwtpmeh3lci3xnhtzpddfahfx62l2vqk3w4o

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 4

Multisig Notary address

f01858410

Client address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

DataCap allocation requested

2PiB

Id

50081510-25eb-47f3-abaf-7bf01d13f2ad

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Last two approvers

cryptowhizzard & liyunzhi-666

Rule to calculate the allocation request amount

40% of total dc amount requested

DataCap allocation requested

2PiB

Total DataCap granted for client so far

7.5PiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

-2814749767106560B (that is, -2.5 PiB: the 7.5 PiB reported as granted so far exceeds the 5 PiB requested)
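
The raw byte count in the bot output is easier to read in binary units. A small sketch of that conversion, assuming IEC prefixes (1 PiB = 2^50 bytes); `format_bytes` is an illustrative helper, not part of the bot.

```python
def format_bytes(n):
    """Format a signed byte count using binary (IEC) units."""
    units = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]
    sign = "-" if n < 0 else ""
    value = float(abs(n))
    for unit in units:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == units[-1]:
            return f"{sign}{value:g} {unit}"
        value /= 1024


print(format_bytes(-2814749767106560))  # -2.5 PiB
```

The result equals 5 PiB minus 7.5 PiB, which is why the "remaining" figure is negative.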

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 235547 | 22 | 1PiB | 6.77 | 209.31TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

xiaoyuaiheshui commented 1 year ago

Reasonable distribution and retrieval supported.

xiaoyuaiheshui commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceaznaugepr7oy3mrx4x2fqwb46d74rwn3mxlaemehuhghl6odt7a4

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

2.00PiB

Signer Address

f122qmy25wdtt5mxd77kndiq7z5x2n3iwiuz2wdsa

Id

50081510-25eb-47f3-abaf-7bf01d13f2ad

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceaznaugepr7oy3mrx4x2fqwb46d74rwn3mxlaemehuhghl6odt7a4

flyworker commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebsfubpxqcuufn3bv3znuufazdtcijj27c2aebt752wrgz7sgwvz4

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

2.00PiB

Signer Address

f1hlubjsdkv4wmsdadihloxgwrz3j3ernf6i3cbpy

Id

50081510-25eb-47f3-abaf-7bf01d13f2ad

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebsfubpxqcuufn3bv3znuufazdtcijj27c2aebt752wrgz7sgwvz4