filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Kernelogic - Coupled Model Intercomparison Project 6 (2/4) #1352

Closed kernelogic closed 1 year ago

kernelogic commented 1 year ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

I have participated in every Slingshot phase and am probably the best-performing "small individual client".

Even though Slingshot v2 has ended, there is still strong demand from SPs to onboard useful data. This application is to onboard an open dataset from AWS.

I will provide a web UI (https://github.com/tech-greedy/singularity-browser) that indexes all onboarded files and provides ways to retrieve them.

I have successfully completed several LDNs on other datasets, and my record shows that I have followed the decentralization rules and done zero self-dealing.

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/60
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/59
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/46
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/297
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/298
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/304

What is the primary source of funding for this project?

Self-funded, BigD exchange.

What other projects/ecosystem stakeholders is this project associated with?

enterprise-sp-wg, BigD exchange.

Use-case details

Describe the data being stored onto Filecoin

The sixth phase of the Coupled Model Intercomparison Project (CMIP6), a global ensemble of coupled ocean-atmosphere general circulation models.

This dataset consists of two S3 buckets:
arn:aws:s3:::esgf-world    671.7 TiB
arn:aws:s3:::cmip6-pds    1.0 PiB

Considering 12 replicas and padding, I am applying for 20PiB in total.
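As a rough check of the 20PiB figure, the arithmetic can be sketched as follows. This is only an illustration: it assumes binary units (1 PiB = 1024 TiB) and leaves out padding overhead, which the application does not quantify.

```python
# Bucket sizes as listed above, converted to TiB (assumes 1 PiB = 1024 TiB).
TIB_PER_PIB = 1024
bucket_sizes_tib = {
    "esgf-world": 671.7,
    "cmip6-pds": 1.0 * TIB_PER_PIB,
}
REPLICAS = 12  # replica count stated in the application

raw_tib = sum(bucket_sizes_tib.values())
replicated_pib = raw_tib * REPLICAS / TIB_PER_PIB

print(f"raw dataset: {raw_tib:.1f} TiB")
print(f"{REPLICAS} replicas: {replicated_pib:.2f} PiB before padding")
```

The replicated size comes out just under 20 PiB, so the requested total leaves a small margin for padding.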

Where was the data in this dataset sourced from?

https://pangeo-data.github.io/pangeo-cmip6-cloud/
https://registry.opendata.aws/cmip6/

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://registry.opendata.aws/cmip6/

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

This is an AWS open dataset. According to the license page, the data is licensed under Creative Commons 4.0.

http://bit.ly/CMIP6_Citation_Search

What is the expected retrieval frequency for this data?

Multiple times per year.

For how long do you plan to keep this dataset stored on Filecoin?

18 months to start. Once deal extension becomes possible, I will keep the deals alive as long as possible.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

All regions.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

I will upload my prepared CAR files to a web server and coordinate with providers to download and propose offline deals.

A maximum of 3 copies per SP entity and a maximum of 12 copies per pieceCID.
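One way to check these two limits against a list of deals is sketched below. The input shape (pairs of pieceCID and SP entity name) and the function name `find_violations` are hypothetical, and the sketch assumes the 3-copy limit applies per pieceCID within each SP entity.

```python
from collections import Counter, defaultdict

MAX_COPIES_PER_ENTITY = 3   # per pieceCID, per SP entity (assumed reading)
MAX_COPIES_PER_PIECE = 12   # per pieceCID, across all SPs


def find_violations(deals):
    """Return limit violations for an iterable of (piece_cid, sp_entity) pairs.

    Each pair represents one stored copy. The result is a list of
    (piece_cid, scope, count) tuples, where scope is "total" or an entity name.
    """
    per_piece = Counter()
    per_piece_entity = defaultdict(Counter)
    for piece_cid, entity in deals:
        per_piece[piece_cid] += 1
        per_piece_entity[piece_cid][entity] += 1

    violations = []
    for piece_cid, total in per_piece.items():
        if total > MAX_COPIES_PER_PIECE:
            violations.append((piece_cid, "total", total))
        for entity, count in per_piece_entity[piece_cid].items():
            if count > MAX_COPIES_PER_ENTITY:
                violations.append((piece_cid, entity, count))
    return violations


# Hypothetical example: entity "A" holds four copies of the same piece.
example = [("piece-1", "A")] * 4 + [("piece-1", "B")] * 2
print(find_violations(example))
```

Grouping by SP entity rather than by miner ID matters here, because one entity may operate several miner IDs.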

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

Besides the high-quality SPs I have worked with previously, I also use BigD exchange to further decentralize the storage.

To name a few from the community that I deal with regularly: PIKNIK, CabrinaHuang, HarryM, XinAn Xu.

From BigD exchange: Mog Li, Devin Chen, DSS Nathanial Marsh, Rabinovitch, Chris, arockpool Tony, Collen, FeelLin, MaiTian

How will you be distributing deals across storage providers?

Evenly across all providers I propose deals to, if they can handle the volume. If an SP is itself a notary, it will receive no more than 20% of the total granted DataCap.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

I have all I need to start making deals.

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

simonkim0515 commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

800TiB

Client address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f01858410

Client address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

DataCap allocation requested

256TiB

Id

14ff7011-d28e-4788-9a40-8d3c85cc9ebd

YuanHeHK commented 1 year ago

kernelogic's previous applications have performed very well, and the data is an AWS open dataset, so I will support this signing.

YuanHeHK commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebj6nchoehi3r6wxye7arksdwwm7lkumalvcc2ok5h3fipb4scwcc

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

256.00TiB

Signer Address

f1fg6jkxsr3twfnyhdlatmq36xca6sshptscds7xa

Id

14ff7011-d28e-4788-9a40-8d3c85cc9ebd

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebj6nchoehi3r6wxye7arksdwwm7lkumalvcc2ok5h3fipb4scwcc

psh0691 commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacec5bduxz6t4oymy7lg2zfcc3l472ejrt4vab3xd55eon2lxqhxf2s

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

256.00TiB

Signer Address

f1qdko4jg25vo35qmyvcrw4ak4fmuu3f5rif2kc7i

Id

14ff7011-d28e-4788-9a40-8d3c85cc9ebd

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacec5bduxz6t4oymy7lg2zfcc3l472ejrt4vab3xd55eon2lxqhxf2s

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f01858410

Client address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

DataCap allocation requested

512TiB

Id

46f7fafe-4313-4d3d-80fd-c3faf1e64d9a

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Last two approvers

psh0691 & fireflyHZ

Rule to calculate the allocation request amount

10% of total dc amount requested

DataCap allocation requested

512TiB
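The amounts the bot requests in this thread follow directly from the stated percentage rule. The sketch below reproduces them under the assumption that 1 PiB = 1024 TiB; it only uses the percentages quoted by the bot for requests 2 to 4 and does not model how the bot picks them.

```python
TIB_PER_PIB = 1024
TOTAL_REQUESTED_TIB = 5 * TIB_PER_PIB  # 5PiB, from the request trigger

# Request number -> percentage of the total, as reported by the bot in this thread.
rule_percent = {2: 10, 3: 20, 4: 40}


def tranche_tib(percent, total_tib=TOTAL_REQUESTED_TIB):
    """Size of one allocation request in TiB for a given percentage rule."""
    return total_tib * percent // 100


for request, percent in rule_percent.items():
    print(f"request {request}: {percent}% -> {tranche_tib(percent)} TiB")
```

These values correspond to the 512TiB, 1PiB, and 2PiB requests that appear in the thread.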

Total DataCap granted for client so far

1PiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

4PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 30246 | 7 | 256TiB | 15.34 | 29.5TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The table below shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider has taken a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01946689 | Singapore, Singapore, SG (Amazon.com, Inc.) | 145.00 TiB | 14.75% | 145.00 TiB | 0.00% |
| f01947708 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 144.91 TiB | 14.74% | 144.91 TiB | 0.00% |
| f01964074 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 144.81 TiB | 14.73% | 144.81 TiB | 0.00% |
| f01963842 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 144.81 TiB | 14.73% | 144.81 TiB | 0.00% |
| f01946713 | Singapore, Singapore, SG (Amazon.com, Inc.) | 144.00 TiB | 14.65% | 144.00 TiB | 0.00% |
| f01945216 | Singapore, Singapore, SG (Amazon.com, Inc.) | 120.47 TiB | 12.26% | 120.47 TiB | 0.00% |
| f01944744 | Hong Kong, Central and Western, HK (Singapore Telecommunications Ltd) | 138.94 TiB | 14.13% | 138.94 TiB | 0.00% |
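The Percentage column can be recomputed from the sealed totals, as in the sketch below. The figures are copied from the table above; because they are already rounded to two decimals, a recomputed share may differ from the report in the last digit. The 20% threshold is purely illustrative (it borrows the cap the applicant commits to for notary-run SPs) and is not a rule quoted from the checker.

```python
# Total deals sealed per provider in TiB, copied from the table above.
sealed_tib = {
    "f01946689": 145.00,
    "f01947708": 144.91,
    "f01964074": 144.81,
    "f01963842": 144.81,
    "f01946713": 144.00,
    "f01945216": 120.47,
    "f01944744": 138.94,
}

total_tib = sum(sealed_tib.values())
shares = {provider: 100 * tib / total_tib for provider, tib in sealed_tib.items()}

MAX_SHARE_PERCENT = 20.0  # illustrative threshold, not taken from the checker
over_limit = [p for p, share in shares.items() if share > MAX_SHARE_PERCENT]

for provider, share in sorted(shares.items(), key=lambda item: -item[1]):
    print(f"{provider}: {share:.2f}%")
print("providers over threshold:", over_limit)
```

With seven providers holding nearly equal amounts, no single provider comes close to the illustrative threshold.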

Provider Distribution

Deal Data Replication

The table below shows how each piece of unique data is replicated across storage providers.

Since this is the 3rd allocation, the following restrictions have been relaxed:

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 32.00 GiB | 128.00 GiB | 4 | 0.01% |
| 864.00 GiB | 4.22 TiB | 5 | 0.43% |
| 30.28 TiB | 181.69 TiB | 6 | 18.48% |
| 113.84 TiB | 796.91 TiB | 7 | 81.07% |

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, which should not resolve to the same CID.

However, this could happen if all of the clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

YuanHeHK commented 1 year ago

The checker report looks okay.

YuanHeHK commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacea2lzsqspmix6ak5op63guo2adwblnwm4kgwfvz6u272jtxdm5zqe

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

512.00TiB

Signer Address

f1fg6jkxsr3twfnyhdlatmq36xca6sshptscds7xa

Id

46f7fafe-4313-4d3d-80fd-c3faf1e64d9a

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea2lzsqspmix6ak5op63guo2adwblnwm4kgwfvz6u272jtxdm5zqe

stcloudlisa commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaced6wehvpw55ctvj5uxiwlyke7co4givxnzuf2tdkwfbvop6qmhjrk

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

512.00TiB

Signer Address

f1jvvltduw35u6inn5tr4nfualyd42bh3vjtylgci

Id

46f7fafe-4313-4d3d-80fd-c3faf1e64d9a

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaced6wehvpw55ctvj5uxiwlyke7co4givxnzuf2tdkwfbvop6qmhjrk

stcloudlisa commented 1 year ago

Looks OK, I would like to support them

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f01858410

Client address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

DataCap allocation requested

1PiB

Id

9ba9de7a-135d-4298-bec6-3954dcbf9b39

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Last two approvers

1LISA2 & fireflyHZ

Rule to calculate the allocation request amount

20% of total dc amount requested

DataCap allocation requested

1PiB

Total DataCap granted for client so far

1PiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

4PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 32480 | 7 | 512TiB | 14.29 | 9TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The table below shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider has taken a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01946713 | Singapore, Singapore, SG (Amazon.com, Inc.) | 145.00 TiB | 14.29% | 145.00 TiB | 0.00% |
| f01945216 | Singapore, Singapore, SG (Amazon.com, Inc.) | 145.00 TiB | 14.29% | 145.00 TiB | 0.00% |
| f01946689 | Singapore, Singapore, SG (Amazon.com, Inc.) | 145.00 TiB | 14.29% | 145.00 TiB | 0.00% |
| f01947708 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 145.00 TiB | 14.29% | 145.00 TiB | 0.00% |
| f01964074 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 144.97 TiB | 14.29% | 144.97 TiB | 0.00% |
| f01963842 | Tokyo, Tokyo, JP (Amazon.com, Inc.) | 144.84 TiB | 14.27% | 144.84 TiB | 0.00% |
| f01944744 | Hong Kong, Central and Western, HK (Singapore Telecommunications Ltd) | 145.00 TiB | 14.29% | 145.00 TiB | 0.00% |

Provider Distribution

Deal Data Replication

The table below shows how each piece of unique data is replicated across storage providers.

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 192.00 GiB | 1.13 TiB | 6 | 0.11% |
| 144.81 TiB | 1013.69 TiB | 7 | 99.89% |

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, which should not resolve to the same CID.

However, this could happen if all of the clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

BDEio commented 1 year ago

@kernelogic Hi! Great to see that you have gotten approval for DataCap! BDE is a verified deals auction house helping you to get paid storing your data with reliable storage providers. If you need any help, please get in touch.

flyworker commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecjrzm5gxa3fait72rsz3kptcj4ybsh4tpxbq4wx5sshc5odtqahe

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

1.00PiB

Signer Address

f1hlubjsdkv4wmsdadihloxgwrz3j3ernf6i3cbpy

Id

9ba9de7a-135d-4298-bec6-3954dcbf9b39

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecjrzm5gxa3fait72rsz3kptcj4ybsh4tpxbq4wx5sshc5odtqahe

cryptowhizzard commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacechlz3xdw2pklykmlmdos6q22a24wc4y6437tgfcbk3qj5pjoaflc

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

1.00PiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

9ba9de7a-135d-4298-bec6-3954dcbf9b39

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacechlz3xdw2pklykmlmdos6q22a24wc4y6437tgfcbk3qj5pjoaflc

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 4

Multisig Notary address

f01858410

Client address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

DataCap allocation requested

2PiB

Id

006fa50e-c156-4550-84a1-811112f66f61

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Last two approvers

cryptowhizzard & flyworker

Rule to calculate the allocation request amount

40% of total dc amount requested

DataCap allocation requested

2PiB

Total DataCap granted for client so far

7.5PiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

-2814749767106560B
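The negative byte count above is what results when the total granted so far (7.5PiB) is subtracted from the 5 PiB requested, printed without unit conversion. The sketch below reproduces the figure and renders it in binary units. The helper `format_bytes` is a hypothetical illustration, not the bot's own formatting code.

```python
UNITS = ["B", "KiB", "MiB", "GiB", "TiB", "PiB", "EiB"]


def format_bytes(n):
    """Render a (possibly negative) byte count using binary unit prefixes."""
    sign = "-" if n < 0 else ""
    value = float(abs(n))
    for unit in UNITS:
        # Stop at the first unit where the value fits, or at the largest unit.
        if value < 1024 or unit == UNITS[-1]:
            return f"{sign}{value:g}{unit}"
        value /= 1024


PIB = 1024 ** 5
requested = 5 * PIB            # total requested by the client
granted = int(7.5 * PIB)       # total granted so far, as reported above
remaining = requested - granted

print(remaining)
print(format_bytes(remaining))
```

The second line of output reads as -2.5PiB, which is easier to interpret than the raw byte count.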

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 235547 | 22 | 1PiB | 6.77 | 209.31TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

xinaxu commented 1 year ago

Checker bot looks healthy. Willing to support.

xinaxu commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebjqwwhev2r24pfhmkke2whfrzyabk6pwqboxulqilzlxk7pz3veu

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

2.00PiB

Signer Address

f1k3ysofkrrmqcot6fkx4wnezpczlltpirmrpsgui

Id

006fa50e-c156-4550-84a1-811112f66f61

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebjqwwhev2r24pfhmkke2whfrzyabk6pwqboxulqilzlxk7pz3veu

cryptowhizzard commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceboktbte6msgnjrqob5exuzqtgh7i2of54ypg437h6fvuln7jhjve

Address

f1icuxjq7zxoh7hyvodr7acywk4rymdehwkgpcg7y

Datacap Allocated

2.00PiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

006fa50e-c156-4550-84a1-811112f66f61

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceboktbte6msgnjrqob5exuzqtgh7i2of54ypg437h6fvuln7jhjve

sxxfuture-official commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.