filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Kernelogic - World Bank - Light Every Night #840

Closed: kernelogic closed this issue 1 year ago

kernelogic commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

I have participated in every Slingshot phase and am probably the best-performing "small individual client".

Even though Slingshot v2 has ended, there is still strong demand from SPs to onboard useful data. This application is to onboard an open dataset from AWS.

I will provide a web UI that indexes all onboarded files and provides ways to retrieve them.

I have successfully completed a few LDNs on other datasets, and I have a record showing that I have followed the rules of decentralization with zero self-dealing.

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/60
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/59
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/46
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/297
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/298
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/304

What is the primary source of funding for this project?

Self-funded, BigD exchange.

What other projects/ecosystem stakeholders is this project associated with?

enterprise-sp-wg, BigD exchange.

Use-case details

Describe the data being stored onto Filecoin

Light Every Night - World Bank Nighttime Light Data – provides open access to all nightly imagery and data from the Visible Infrared Imaging Radiometer Suite Day-Night Band (VIIRS DNB) from 2012-2020 and the Defense Meteorological Satellite Program Operational Linescan System (DMSP-OLS) from 1992-2013.

Where was the data in this dataset sourced from?

AWS Open dataset

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

https://registry.opendata.aws/wb-light-every-night/

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

AWS Open dataset

What is the expected retrieval frequency for this data?

Multiple times per year.

For how long do you plan to keep this dataset stored on Filecoin?

18 months.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

All regions.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

I will upload my prepared CAR files to a web server and coordinate with providers to download and propose offline deals.

Maximum 3 copies per SP entity and maximum of 10 copies for every pieceCID.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

Besides the SPs I have worked with previously, I also use BigD exchange to further decentralize the storage.

To name a few from the community that I deal with regularly: PIKNIK, Holon, CabrinaHuang, HarryM, BigBear, j1v, XinAn Xu, WillTechMusing.

From BigD exchange: Mog Li, Devin Chen, DSS Nathanial Marsh, Rabinovitch, Vin K, arockpool Tony

How will you be distributing deals across storage providers?

Evenly across all providers I propose to, if they can handle it. If a miner is itself a notary, it will receive no more than 20% of the total granted DataCap.

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

I have all I need to start making deals.

large-datacap-requests[bot] commented 2 years ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

large-datacap-requests[bot] commented 2 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

kernelogic commented 2 years ago

Reasons to consider this application:

  1. This is an AWS public open dataset
  2. I have a good track record of no self-dealing and transparent distribution
  3. I am one of the two developers of Singularity and am capable of onboarding datasets at this scale
  4. I will provide a web UI for retrievals

raghavrmadya commented 2 years ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

750TiB

Client address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

large-datacap-requests[bot] commented 2 years ago

DataCap Allocation requested

Multisig Notary address

f01858410

Client address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

DataCap allocation requested

256TiB

dannyob commented 2 years ago

Hey @kernelogic, this looks pretty exciting, and I'd like to approve this request. Could you point me to a source or code that shows the total size of this dataset? I've only been able to find a source that says it's "over 250 terabytes" (https://worldbank.github.io/OpenNightLights/tutorials/mod2_1_data_overview.html), and you're asking for more data than that.

kernelogic commented 2 years ago

Hi @dannyob, happy to answer this. I am using the AWS CLI s3 ls command to summarize the size of this bucket: aws s3 ls s3://globalnightlight --no-sign-request --summarize --human-readable --recursive

The result comes back as 296.0 TiB across 4245327 files. Since I plan to store 10 replicas, plus some overhead for padding, I am applying for 5 PiB.
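
The sizing in this reply can be checked with a few lines of arithmetic. The sketch below is illustrative only: it uses the three figures quoted above (296.0 TiB, 10 replicas, 5 PiB) and assumes binary units (1 PiB = 1024 TiB). The function and constant names are made up for this example, and the padding overhead itself is not modelled because the thread does not quantify it.

```python
# Figures quoted in the comment above.
SOURCE_TIB = 296.0     # size reported by `aws s3 ls --summarize`
REPLICAS = 10          # planned number of copies
REQUESTED_PIB = 5.0    # total DataCap applied for

TIB_PER_PIB = 1024     # assumption: binary units throughout


def replicated_size_pib(source_tib: float, replicas: int) -> float:
    """Raw size of all replicas in PiB, before any padding overhead."""
    return source_tib * replicas / TIB_PER_PIB


raw_pib = replicated_size_pib(SOURCE_TIB, REPLICAS)
headroom_pib = REQUESTED_PIB - raw_pib

print(f"raw replicated size: {raw_pib:.2f} PiB")               # 2.89 PiB
print(f"left for padding and headroom: {headroom_pib:.2f} PiB")  # 2.11 PiB
```

Ten raw copies come to about 2.89 PiB, so the rest of the 5 PiB request would have to be covered by padding overhead and headroom.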

fireflyHZ commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceck7juh3fhqd5dmvyrs3geerdzsk7nvqkv7qfchck7anjb5prdebg

Address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Datacap Allocated

256.00TiB

Signer Address

f1fg6jkxsr3twfnyhdlatmq36xca6sshptscds7xa

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceck7juh3fhqd5dmvyrs3geerdzsk7nvqkv7qfchck7anjb5prdebg

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f01858410

Client address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

DataCap allocation requested

512TiB

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Last two approvers

fireflyHZ & not found

Rule to calculate the allocation request amount

10% of total dc amount requested

DataCap allocation requested

512TiB

Total DataCap granted for client so far

32GiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

4.99PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 0 | 0 | 256TiB | 0 | 63.09TiB |

ipfscn commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecgvzmsx37bhyl6jhkulepwz7bpdbmfpjikiu7bqlzdhcz4lglg5c

Address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Datacap Allocated

512.00TiB

Signer Address

f1j4n74chme7whbz3yls4a7ixqewb6dijypqg2a3a

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecgvzmsx37bhyl6jhkulepwz7bpdbmfpjikiu7bqlzdhcz4lglg5c

newwebgroup commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceazzrybbera3gskpohjmqicxe7njp2wubdpvfud4rccqgifmlfnvc

Address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Datacap Allocated

512.00TiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceazzrybbera3gskpohjmqicxe7njp2wubdpvfud4rccqgifmlfnvc

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f01858410

Client address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

DataCap allocation requested

1PiB

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Last two approvers

newwebgroup & ipfscn

Rule to calculate the allocation request amount

20% of total dc amount requested

DataCap allocation requested

1PiB

Total DataCap granted for client so far

768TiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

4.25PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 6216 | 6 | 512TiB | 28.12 | 120.32TiB |

Tom-OriginStorage commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceadi5g6s7u3ffxvqo6csrgglszbzlw326lz65cv5vgnw2vltfd3ok

Address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Datacap Allocated

1.00PiB

Signer Address

f1q6bpjlqia6iemqbrdaxr2uehrhpvoju3qh4lpga

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceadi5g6s7u3ffxvqo6csrgglszbzlw326lz65cv5vgnw2vltfd3ok

flyworker commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceblmwy5kbfdjk7tsvfuzwuswvshjqtfqwgzf3zpb3d34oypojz2eq

Address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Datacap Allocated

1.00PiB

Signer Address

f1hlubjsdkv4wmsdadihloxgwrz3j3ernf6i3cbpy

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceblmwy5kbfdjk7tsvfuzwuswvshjqtfqwgzf3zpb3d34oypojz2eq

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 4

Multisig Notary address

f01858410

Client address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

DataCap allocation requested

2PiB

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Last two approvers

flyworker & llifezou

Rule to calculate the allocation request amount

40% of total dc amount requested

DataCap allocation requested

2PiB

Total DataCap granted for client so far

1.75PiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

3.25PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 37299 | 13 | 1PiB | 14.00 | 233.29TiB |

newwebgroup commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceawpend4ckchnnjgj72zn5akwbofdown5cwn55o4lbmbgnn5ngnko

Address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Datacap Allocated

2.00PiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceawpend4ckchnnjgj72zn5akwbofdown5cwn55o4lbmbgnn5ngnko

ipfscn commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecaxyktvq5ydrd2n3dcqzcaipjkrtkkqeb5c4wh26ojo74fz6qjbk

Address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Datacap Allocated

2.00PiB

Signer Address

f1j4n74chme7whbz3yls4a7ixqewb6dijypqg2a3a

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecaxyktvq5ydrd2n3dcqzcaipjkrtkkqeb5c4wh26ojo74fz6qjbk

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 5

Multisig Notary address

f01858410

Client address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

DataCap allocation requested

1.25PiB

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Last two approvers

ipfscn & newwebgroup

Rule to calculate the allocation request amount

80% of total dc amount requested

DataCap allocation requested

1.25PiB

Total DataCap granted for client so far

3.75PiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

1.25PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 107697 | 20 | 2PiB | 7.34 | 507.96TiB |

newwebgroup commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceauu4pe5wipkuem7aofoca262nkeguboj6333n2w32arqmkxgll5y

Address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Datacap Allocated

1.25PiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceauu4pe5wipkuem7aofoca262nkeguboj6333n2w32arqmkxgll5y

Tom-OriginStorage commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceagitfsl5l2gns44hxziyqb2r22d3ruvqfu2vqabzisoicohb3jva

Address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Datacap Allocated

1.25PiB

Signer Address

f1q6bpjlqia6iemqbrdaxr2uehrhpvoju3qh4lpga

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceagitfsl5l2gns44hxziyqb2r22d3ruvqfu2vqabzisoicohb3jva

large-datacap-requests[bot] commented 1 year ago

The issue reached the total datacap requested. This should be closed
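
The tranche sizes in this thread (256 TiB, 512 TiB, 1 PiB, 2 PiB, 1.25 PiB) follow from the percentage rules the bot quotes for requests 2 to 5. The sketch below reproduces them under two assumptions that are inferred from the thread rather than stated in it: each tranche is capped at the amount still outstanding (as request 5 suggests), and the first 256 TiB tranche is taken as given. All names are hypothetical.

```python
TOTAL_TIB = 5 * 1024                 # 5 PiB requested in total
FIRST_TRANCHE_TIB = 256              # first allocation, as granted in the thread
RULE_PERCENTAGES = [10, 20, 40, 80]  # rules quoted for requests 2 to 5


def tranche_schedule(total_tib, first_tib, percentages):
    """Return tranche sizes in TiB, capping each at the remaining amount."""
    tranches = [first_tib]
    granted = first_tib
    for pct in percentages:
        remaining = total_tib - granted
        if remaining <= 0:
            break
        # Assumption: a tranche never exceeds what is still outstanding.
        size = min(total_tib * pct // 100, remaining)
        tranches.append(size)
        granted += size
    return tranches


schedule = tranche_schedule(TOTAL_TIB, FIRST_TRANCHE_TIB, RULE_PERCENTAGES)
print(schedule)       # [256, 512, 1024, 2048, 1280]
print(sum(schedule))  # 5120
```

The running totals (768 TiB, 1.75 PiB, 3.75 PiB, 5 PiB) match the "Total DataCap granted for client so far" values the bot reports from request 3 onwards.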

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1mbgw2yiypfay4zeuw7pmvytyoie457ogrolwbva

Last two approvers

llifezou & newwebgroup

Rule to calculate the allocation request amount

total dc reached

DataCap allocation requested

0

Total DataCap granted for client so far

5PiB

Datacap to be granted to reach the total amount requested by the client (5 PiB)

0B

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 158010 | 25 | 1.25PiB | 6.05 | 310.71TiB |

filplus-checker commented 1 year ago

DataCap and CID Checker Report[^1]

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f0143858 | Clifton, New Jersey, US | 293.66 TiB | 5.77% | 293.66 TiB | 0.00% |
| f03223 | San Jose, California, US | 292.31 TiB | 5.74% | 292.31 TiB | 0.00% |
| f02301 | San Jose, California, US | 291.72 TiB | 5.73% | 291.72 TiB | 0.00% |
| f0240185 | Clifton, New Jersey, US | 291.28 TiB | 5.72% | 291.28 TiB | 0.00% |
| f01943663 | Hong Kong, Central and Western, HK | 287.47 TiB | 5.65% | 287.44 TiB | 0.01% |
| f01964132 | Bangkok, Bangkok, TH | 277.47 TiB | 5.45% | 277.47 TiB | 0.00% |
| f01907460 | Seattle, Washington, US | 271.79 TiB | 5.34% | 267.19 TiB | 1.69% |
| f01928097 | Hong Kong, Central and Western, HK | 262.91 TiB | 5.16% | 262.91 TiB | 0.00% |
| f01929565 | Sydney, New South Wales, AU | 238.16 TiB | 4.68% | 236.25 TiB | 0.80% |
| f01859603 | Shenzhen, Guangdong, CN | 233.22 TiB | 4.58% | 201.66 TiB | 13.53% |
| f01923787 | Shenzhen, Guangdong, CN | 218.55 TiB | 4.29% | 196.94 TiB | 9.89% |
| f01923786 | Hong Kong, Central and Western, HK | 215.00 TiB | 4.22% | 194.66 TiB | 9.46% |
| f01918046 | Kuala Lumpur, Kuala Lumpur, MY | 208.28 TiB | 4.09% | 168.44 TiB | 19.13% |
| f01909705 | Kuala Lumpur, Kuala Lumpur, MY | 206.03 TiB | 4.05% | 168.50 TiB | 18.22% |
| f01918045 | Kuala Lumpur, Kuala Lumpur, MY | 205.94 TiB | 4.05% | 168.44 TiB | 18.21% |
| f01938671 (new) | Hong Kong, Central and Western, HK | 187.70 TiB | 3.69% | 184.81 TiB | 1.54% |
| f01938674 (new) | Shenzhen, Guangdong, CN | 184.45 TiB | 3.62% | 177.16 TiB | 3.96% |
| f01927554 | Shenzhen, Guangdong, CN | 183.23 TiB | 3.60% | 183.23 TiB | 0.00% |
| f01928520 | Maywood Park, Oregon, US | 152.41 TiB | 2.99% | 148.72 TiB | 2.42% |
| f01938601 | Maywood Park, Oregon, US | 151.91 TiB | 2.98% | 149.94 TiB | 1.30% |
| f01222595 | Moscow, Moscow, RU | 95.66 TiB | 1.88% | 93.75 TiB | 1.99% |
| f01926686 | Hangzhou, Zhejiang, CN | 92.72 TiB | 1.82% | 92.72 TiB | 0.00% |
| f01970716 (new) | Shenzhen, Guangdong, CN | 70.53 TiB | 1.39% | 70.53 TiB | 0.00% |
| f01985775 | Dallas, Texas, US | 56.25 TiB | 1.10% | 56.25 TiB | 0.00% |
| f01985745 | Dallas, Texas, US | 56.22 TiB | 1.10% | 56.22 TiB | 0.00% |
| f033462 | Dallas, Texas, US | 55.56 TiB | 1.09% | 55.56 TiB | 0.00% |
| f01660795 | Shenzhen, Guangdong, CN | 7.84 TiB | 0.15% | 7.84 TiB | 0.00% |
| f047419 | North Prairie, Wisconsin, US | 2.59 TiB | 0.05% | 2.59 TiB | 0.00% |
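
The Duplicate Deals column appears to be consistent with (total sealed - unique) / total sealed. The sketch below checks that reading against three rows copied from the table. The formula is an inference from the numbers, not something the checker report states, and the function name is hypothetical.

```python
def duplicate_pct(total_tib: float, unique_tib: float) -> float:
    """Share of a provider's sealed deals that duplicates data it already holds."""
    return round((total_tib - unique_tib) / total_tib * 100, 2)


# (total deals sealed, unique data) in TiB, copied from the table above.
rows = {
    "f01859603": (233.22, 201.66),
    "f01918046": (208.28, 168.44),
    "f0143858": (293.66, 293.66),
}

for provider, (total, unique) in rows.items():
    print(provider, duplicate_pct(total, unique))
# f01859603 13.53
# f01918046 19.13
# f0143858 0.0
```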

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across each number of storage providers.

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 1.97 TiB | 1.97 TiB | 1 | 0.04% |
| 4.25 TiB | 8.59 TiB | 2 | 0.17% |
| 39.31 TiB | 119.75 TiB | 3 | 2.35% |
| 6.38 TiB | 25.66 TiB | 4 | 0.50% |
| 7.42 TiB | 37.27 TiB | 5 | 0.73% |
| 9.39 TiB | 58.05 TiB | 6 | 1.14% |
| 32.66 TiB | 252.06 TiB | 7 | 4.95% |
| 98.03 TiB | 791.06 TiB | 8 | 15.54% |
| 132.06 TiB | 1.20 PiB | 9 | 24.07% |
| 60.41 TiB | 694.13 TiB | 10 | 13.63% |
| 101.19 TiB | 1.09 PiB | 11 | 21.98% |
| 7.06 TiB | 91.59 TiB | 12 | 1.80% |
| 14.66 TiB | 202.75 TiB | 13 | 3.98% |
| 29.59 TiB | 437.75 TiB | 14 | 8.60% |
| 1.31 TiB | 22.72 TiB | 15 | 0.45% |
| 192.00 GiB | 3.19 TiB | 16 | 0.06% |
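
One way to summarize the replication table is the average number of providers holding each unit of unique data, weighted by data size. The sketch below computes that from the Unique Data Size and Number of Providers columns. It is a rough illustration, not part of the checker's own report, and the names are hypothetical.

```python
# (unique data in TiB, number of providers) copied from the table above;
# the last row is 192.00 GiB, converted to TiB.
ROWS = [
    (1.97, 1), (4.25, 2), (39.31, 3), (6.38, 4),
    (7.42, 5), (9.39, 6), (32.66, 7), (98.03, 8),
    (132.06, 9), (60.41, 10), (101.19, 11), (7.06, 12),
    (14.66, 13), (29.59, 14), (1.31, 15), (192.00 / 1024, 16),
]


def mean_replicas(rows):
    """Size-weighted mean number of providers per unit of unique data."""
    total_unique = sum(size for size, _ in rows)
    weighted = sum(size * providers for size, providers in rows)
    return weighted / total_unique


print(f"unique data: {sum(size for size, _ in ROWS):.2f} TiB")
print(f"mean providers per unit of unique data: {mean_replicas(ROWS):.2f}")
```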

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, which should not resolve to the same CID.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! :exclamation: We have found some problems in the information provided.

  1. We could not find Organization Name field in the information provided
  2. We could not find Website / Social Media field in the information provided
  3. We could not find Total amount of DataCap being requested (between 500 TiB and 5 PiB) field in the information provided
  4. We could not find Weekly allocation of DataCap requested (usually between 1-100TiB) field in the information provided
  5. We could not find On-chain address for first allocation field in the information provided

Please, take a look at the request and edit the body of the issue providing all the required information.