filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] FogMeta Lab - store ETH snapshots #1138

Closed hengdingy closed 1 year ago

hengdingy commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

FogMeta Lab's research spans multiple levels from system technology, infrastructure, and middleware to services and solutions, and involves future systems, network technology and business, distributed systems and management, information management, and interactive and innovative services. Based on the views on and practices in the industry, FogMeta also solves the problem of business complexity through operations optimization and other technologies.
'filecoin-ipfs-data-rebuilder' is a FogMeta project: a data build-and-rebuild tool that works between the IPFS network and the Filecoin network. Rebuilder ensures permanent storage with at least one cold and one hot backup, and keeps the data retrievable at any time.

What is the primary source of funding for this project?

FogMeta Lab.

What other projects/ecosystem stakeholders is this project associated with?

No.

Use-case details

Describe the data being stored onto Filecoin

Full node snapshots of the Ethereum Mainnet updated each month.

Where was the data in this dataset sourced from?

The snapshots are maintained by allada (https://github.com/allada) and are updated each month. Please refer to the link here: https://github.com/allada/eth-archive-snapshot
Moreover, we're also running Ethereum Mainnet nodes and will export a snapshot every month. 

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

s3://public-blockchain-snapshots/eth/

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes, it's a public dataset.

What is the expected retrieval frequency for this data?

3 to 5 times a month.

For how long do you plan to keep this dataset stored on Filecoin?

At least 500 days.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

Preferably in all continents.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

The data will be sent to storage providers and also uploaded to a web server or IPFS for storage providers to download.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We will use the FilSwan platform to distribute these data. The Market Matcher, a module of the platform, will choose the most suitable storage providers for us automatically and make sure that the data can be retrieved in the future.

How will you be distributing deals across storage providers?

We are using the FilSwan client agent for batch sending deals: https://github.com/filswan/swan-client
FilSwan has a reputation module called the Swan reputation system (https://docs.filswan.com/filswan-platform/overview/reputation-system) that scores storage providers on their data storage behavior. The score is based on Time-based Reachability, Regional Weighted Adjusted Power, General Deals, and Verified-Storage Provider Deals.
The FilSwan Auction System will match storage providers based on reputation and conditions. The bidding policy is described here: https://docs.filswan.com/filswan-platform/overview/filswan-auction-system.
Some of the storage providers are as follows:
f0143858
f03624
f010088
f02301
f0187709
f01402814
f01859603
f01133080
f01858429
f01398391
f01072221
f0240185
f01390330
f01784458
f01840390
f01870135
f0520660
f01871352
f01883179
f01886797

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes. FogMeta Lab will fund the project.

large-datacap-requests[bot] commented 2 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and get back to you soon.

simonkim0515 commented 1 year ago

Datacap Request Trigger

Total DataCap requested

3PiB

Expected weekly DataCap usage rate

100TiB

Client address

f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a

DataCap allocation requested

50TiB

Id

1da98eb6-b724-4d30-87da-9b78e0b0002b

IreneYoung commented 1 year ago

@hengdingy It seems that the retrieval frequency is relatively high. Will there be any problem for the SPs you work with?

kernelogic commented 1 year ago

Looks like this is public data and FogMeta have done LDNs before. Willing to support.

kernelogic commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecuek4xjyreyvojb6d7nahmqlzh2qniikafsz5fx45wxmjqnmybhg

Address

f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a

Datacap Allocated

50.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

1da98eb6-b724-4d30-87da-9b78e0b0002b

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecuek4xjyreyvojb6d7nahmqlzh2qniikafsz5fx45wxmjqnmybhg

large-datacap-requests[bot] commented 1 year ago

Aborting. Exit Code is Non 0

hengdingy commented 1 year ago

> @hengdingy It seems that the retrieval frequency is relatively high. Will there be any problem for the SPs you work with?

@IreneYoung We want to support more people in running ETH nodes. And we'll definitely select suitable SPs with enough retrieval capability.

IreneYoung commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacec6jwhyc7r5m2ozwe26alw2t7atacopt6jr6t4e7d4qtg4vbrdmke

Address

f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a

Datacap Allocated

50.00TiB

Signer Address

f1d4gmpqz3execjj2wvrxuuhvbms5mzh7t7yqrviq

Id

1da98eb6-b724-4d30-87da-9b78e0b0002b

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacec6jwhyc7r5m2ozwe26alw2t7atacopt6jr6t4e7d4qtg4vbrdmke

filplus-checker commented 1 year ago

DataCap and CID Checker Report[^1]

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

✔️ Storage provider distribution looks healthy.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f0717969 | Los Angeles, California, US | 3.51 TiB | 9.37% | 3.07 TiB | 12.46% |
| f01222595 | Moscow, Moscow, RU | 3.45 TiB | 9.21% | 2.89 TiB | 16.29% |
| f03624 | Nürnberg, Bavaria, DE | 3.23 TiB | 8.63% | 2.80 TiB | 13.53% |
| f0187709 | Moscow, Moscow, RU | 3.01 TiB | 8.02% | 2.50 TiB | 16.75% |
| f01886797 | Vancouver, British Columbia, CA | 2.98 TiB | 7.94% | 2.54 TiB | 14.70% |
| f01163272 | Perm, Perm Krai, RU | 2.91 TiB | 7.75% | 2.38 TiB | 18.28% |
| f01072221 | Los Angeles, California, US | 2.72 TiB | 7.25% | 2.25 TiB | 17.24% |
| f01896422 | Fremont, California, US | 2.69 TiB | 7.17% | 2.25 TiB | 16.28% |
| f01871352 | Seoul, Seoul, KR | 2.54 TiB | 6.77% | 2.23 TiB | 12.31% |
| f010088 | Everett, Washington, US | 2.19 TiB | 5.84% | 2.00 TiB | 8.57% |
| f0240456 | Chengdu, Sichuan, CN | 1.85 TiB | 4.93% | 1.82 TiB | 1.69% |
| f01402814 | Singapore, Singapore, SG | 1.79 TiB | 4.77% | 1.51 TiB | 15.72% |
| f08399 | Seattle, Washington, US | 1.56 TiB | 4.17% | 1.34 TiB | 14.00% |
| f01390330 | Xi’an, Shaanxi, CN | 1.31 TiB | 3.50% | 1.19 TiB | 9.52% |
| f047419 | North Prairie, Wisconsin, US | 896.00 GiB | 2.33% | 896.00 GiB | 0.00% |
| f0836160 | Seoul, Seoul, KR | 896.00 GiB | 2.33% | 800.00 GiB | 10.71% |
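
The report does not say how the Duplicate Deals column is derived. A minimal sketch, assuming it is the share of a provider's sealed deals that is not unique data, that is (total - unique) / total. The function name `duplicate_percentage` is hypothetical and not part of the checker. Under this assumption the f0836160 and f047419 rows are reproduced exactly; rows given in TiB can differ by a few hundredths of a percent because the table values are already rounded.

```python
def duplicate_percentage(total_gib: float, unique_gib: float) -> float:
    """Share of sealed deal volume that duplicates data already stored.

    Assumption (not confirmed by the report): duplicates = total - unique.
    """
    if total_gib <= 0:
        raise ValueError("total_gib must be positive")
    if unique_gib < 0 or unique_gib > total_gib:
        raise ValueError("unique_gib must be between 0 and total_gib")
    return 100.0 * (total_gib - unique_gib) / total_gib


# Rows taken from the table above (both reported in GiB).
f0836160 = duplicate_percentage(total_gib=896.0, unique_gib=800.0)
f047419 = duplicate_percentage(total_gib=896.0, unique_gib=896.0)

print(f"f0836160: {f0836160:.2f}%")
print(f"f047419: {f047419:.2f}%")
```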

Deal Data Replication

The table below shows how much unique data is replicated across how many storage providers.

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 32.00 GiB | 64.00 GiB | 1 | 0.17% |
| 208.00 GiB | 448.00 GiB | 2 | 1.17% |
| 612.00 GiB | 2.02 TiB | 3 | 5.38% |
| 1.45 TiB | 6.34 TiB | 4 | 16.92% |
| 1.59 TiB | 8.41 TiB | 5 | 22.43% |
| 896.00 GiB | 5.63 TiB | 6 | 15.01% |
| 224.00 GiB | 1.75 TiB | 7 | 4.67% |
| 64.00 GiB | 512.00 GiB | 8 | 1.33% |
| 32.00 GiB | 288.00 GiB | 9 | 0.75% |
| 96.00 GiB | 1.56 TiB | 12 | 4.17% |
| 224.00 GiB | 3.81 TiB | 13 | 10.17% |
| 256.00 GiB | 4.88 TiB | 14 | 13.01% |
| 96.00 GiB | 1.81 TiB | 15 | 4.84% |
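
The Deal Percentage column appears to be each row's Total Deals Made divided by the sum of that column. This is an inference from the numbers, not something the report states. A minimal sketch under that assumption, with 1 TiB taken as 1024 GiB; `TOTAL_DEALS_TIB` and `deal_percentages` are illustrative names only. Because the inputs are the rounded values printed in the table, a few rows come out about 0.01 percentage points away from the reported figure.

```python
GIB_PER_TIB = 1024

# "Total Deals Made" column from the table above, converted to TiB.
TOTAL_DEALS_TIB = [
    64 / GIB_PER_TIB,
    448 / GIB_PER_TIB,
    2.02,
    6.34,
    8.41,
    5.63,
    1.75,
    512 / GIB_PER_TIB,
    288 / GIB_PER_TIB,
    1.56,
    3.81,
    4.88,
    1.81,
]


def deal_percentages(total_deals_tib: list[float]) -> list[float]:
    """Return each row's share of the overall deal volume, in percent."""
    overall = sum(total_deals_tib)
    if overall <= 0:
        raise ValueError("overall deal volume must be positive")
    return [100.0 * size / overall for size in total_deals_tib]


for size, pct in zip(TOTAL_DEALS_TIB, deal_percentages(TOTAL_DEALS_TIB)):
    print(f"{size:8.4f} TiB -> {pct:5.2f}%")
```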

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually different applications own different data, so their data should not resolve to the same CID.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

lvschouwen commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The below table shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

Since this is the 2nd allocation, the following restrictions have been relaxed:

⚠️ f01981571 has unknown IP location.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f02017390 | Shanghai, Shanghai, CN<br/>China Telecom (Group) | 11.00 TiB | 18.41% | 11.00 TiB | 0.00% |
| f0240456 | Chengdu, Sichuan, CN<br/>CHINA UNICOM China169 Backbone | 1.85 TiB | 3.09% | 1.82 TiB | 1.69% |
| f01390330 | Xi’an, Shaanxi, CN<br/>CHINANET-BACKBONE | 1.31 TiB | 2.20% | 1.19 TiB | 9.52% |
| f01970622 | Hong Kong, Central and Western, HK<br/>UCLOUD INFORMATION TECHNOLOGY (HK) LIMITED | 6.25 TiB | 10.46% | 6.25 TiB | 0.00% |
| f01981571 | Unknown<br/>Unknown | 5.00 TiB | 8.37% | 5.00 TiB | 0.00% |
| f047419 | North Prairie, Wisconsin, US<br/>Charter Communications Inc | 896.00 GiB | 1.46% | 896.00 GiB | 0.00% |
| f01072221 | Los Angeles, California, US<br/>Cyxtera Technologies Inc | 2.72 TiB | 4.55% | 2.25 TiB | 17.24% |
| f03624 | Nürnberg, Bavaria, DE<br/>Deutsche Telekom AG | 3.23 TiB | 5.41% | 2.80 TiB | 13.53% |
| f01896422 | Fremont, California, US<br/>Hurricane Electric LLC | 2.69 TiB | 4.50% | 2.25 TiB | 16.28% |
| f08399 | Seattle, Washington, US<br/>Isomedia, Inc. | 1.56 TiB | 2.62% | 1.34 TiB | 14.00% |
| f01871352 | Seoul, Seoul, KR<br/>Korea Telecom | 2.54 TiB | 4.25% | 2.23 TiB | 12.31% |
| f0836160 | Seoul, Seoul, KR<br/>Korea Telecom | 896.00 GiB | 1.46% | 800.00 GiB | 10.71% |
| f0717969 | Los Angeles, California, US<br/>Krypt Technologies | 3.51 TiB | 5.88% | 3.07 TiB | 12.46% |
| f01222595 | Moscow, Moscow, RU<br/>MTS PJSC | 3.45 TiB | 5.78% | 2.89 TiB | 16.29% |
| f0187709 | Moscow, Moscow, RU<br/>MTS PJSC | 3.01 TiB | 5.04% | 2.50 TiB | 16.75% |
| f01163272 | Perm, Perm Krai, RU<br/>PJSC Rostelecom | 2.91 TiB | 4.87% | 2.38 TiB | 18.28% |
| f01402814 | Singapore, Singapore, SG<br/>StarHub Ltd | 1.79 TiB | 3.00% | 1.51 TiB | 15.72% |
| f010088 | Kirkland, Washington, US<br/>Wholesail networks LLC | 2.19 TiB | 3.66% | 2.00 TiB | 8.57% |
| f01886797 | Vancouver, British Columbia, CA<br/>Zayo Bandwidth | 2.98 TiB | 4.98% | 2.54 TiB | 14.70% |

Deal Data Replication

The table below shows how much unique data is replicated across how many storage providers.

Since this is the 2nd allocation, the following restrictions have been relaxed:

✔️ Data replication looks healthy.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 12.28 TiB | 12.31 TiB | 1 | 20.61% |
| 5.20 TiB | 10.44 TiB | 2 | 17.47% |
| 612.00 GiB | 2.02 TiB | 3 | 3.37% |
| 1.45 TiB | 6.34 TiB | 4 | 10.62% |
| 1.59 TiB | 8.41 TiB | 5 | 14.07% |
| 896.00 GiB | 5.63 TiB | 6 | 9.42% |
| 224.00 GiB | 1.75 TiB | 7 | 2.93% |
| 64.00 GiB | 512.00 GiB | 8 | 0.84% |
| 32.00 GiB | 288.00 GiB | 9 | 0.47% |
| 96.00 GiB | 1.56 TiB | 12 | 2.62% |
| 224.00 GiB | 3.81 TiB | 13 | 6.38% |
| 256.00 GiB | 4.88 TiB | 14 | 8.16% |
| 96.00 GiB | 1.81 TiB | 15 | 3.03% |

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually different applications own different data, so their data should not resolve to the same CID.

However, CID sharing is possible if all of the clients listed use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

Normalnoise commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage provider has an unknown IP location - f01981571

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...
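
The trigger syntax in the footnotes above is a keyword optionally followed by whitespace-separated addresses. The checker's real parsing logic is not shown in this thread, so the following is only a sketch of how such a comment could be recognized. The pattern `TRIGGER` and the function `parse_trigger` are hypothetical, and requiring an exact keyword match is an assumption.

```python
import re
from typing import List, Optional

# Hypothetical pattern: the exact keyword, then optional extra addresses.
TRIGGER = re.compile(r"^checker:manualTrigger(?:\s+(?P<addrs>.+))?$")


def parse_trigger(comment: str) -> Optional[List[str]]:
    """Return the extra addresses named in a trigger comment.

    Returns an empty list for a bare trigger and None when the comment
    is not a trigger under the assumed exact-match rule.
    """
    match = TRIGGER.match(comment.strip())
    if match is None:
        return None
    addrs = match.group("addrs")
    return addrs.split() if addrs else []


print(parse_trigger("checker:manualTrigger"))
print(parse_trigger("checker:manualTrigger f01234 f05678"))
print(parse_trigger("checker:manualTrigge"))
```

The addresses `f01234` and `f05678` are placeholders, not addresses from this application.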

Normalnoise commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage provider has an unknown IP location - f01981571

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Normalnoise commented 1 year ago

checker:manualTrigge

hengdingy commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a

DataCap allocation requested

100TiB

Id

f84c4975-36ef-4872-884a-3e9c3f0fec5d

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a

Last two approvers

IreneYoung & kernelogic

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

100TiB

Total DataCap granted for client so far

285TiB

Datacap to be granted to reach the total amount requested by the client (3 PiB)

2.72PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 8871 | 32 | 50TiB | 15.18 | 11.33TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

cryptowhizzard commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebunjert64choxycnurkcwdiyt4zjkjx3xf5jn63fds2uqzvoqg2g

Address

f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a

Datacap Allocated

50.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

f84c4975-36ef-4872-884a-3e9c3f0fec5d

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebunjert64choxycnurkcwdiyt4zjkjx3xf5jn63fds2uqzvoqg2g

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f02049625

Client address

f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a

DataCap allocation requested

200TiB

Id

c876ce4d-d123-4439-b444-77a0ada17959

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a

Last two approvers

cryptowhizzard & IreneYoung

Rule to calculate the allocation request amount

200% of weekly dc amount requested

DataCap allocation requested

200TiB

Total DataCap granted for client so far

285TiB

Datacap to be granted to reach the total amount requested by the client (3 PiB)

2.72PiB
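
The figures in the "Stats & Info" comments can be checked with a little arithmetic. A minimal sketch, assuming 1 PiB = 1024 TiB and that each tranche is the stated percentage of the 100TiB weekly usage rate given in the Datacap Request Trigger comment. The function names are illustrative and not part of the bot.

```python
TIB_PER_PIB = 1024


def tranche_tib(weekly_tib: float, percent: float) -> float:
    """Size of one allocation request as a percentage of weekly usage."""
    return weekly_tib * percent / 100.0


def remaining_pib(total_requested_pib: float, granted_tib: float) -> float:
    """DataCap still to be granted to reach the total requested amount."""
    return (total_requested_pib * TIB_PER_PIB - granted_tib) / TIB_PER_PIB


# Request number 2 used 100% of the weekly rate, request number 3 used 200%.
print(tranche_tib(weekly_tib=100, percent=100))
print(tranche_tib(weekly_tib=100, percent=200))

# 3 PiB requested in total, 285TiB reported as granted so far.
print(round(remaining_pib(total_requested_pib=3, granted_tib=285), 2))
```

With these inputs the tranches come out at 100 and 200 TiB and the remainder rounds to 2.72 PiB, matching the values reported by the bot.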

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 8978 | 33 | 100TiB | 14.99 | 8.54TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

nj-steve commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

nj-steve commented 1 year ago

@hengdingy Looks good! Have you repaired the snapshot data for the next stage?

hengdingy commented 1 year ago

> @hengdingy Looks good! Have you repaired the snapshot data for the next stage?

@nj-steve We have stored the data schema in the repo, so anyone can restore the data to the original snapshot.

nj-steve commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceaaehk2eyz7vdnvnej26pc6a3n6bmac26urfquh4c5na6ha73zo66

Address

f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a

Datacap Allocated

200.00TiB

Signer Address

f1xx6555qijma7igpnjspyvdunc4vfxkawnpqy5ii

Id

c876ce4d-d123-4439-b444-77a0ada17959

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceaaehk2eyz7vdnvnej26pc6a3n6bmac26urfquh4c5na6ha73zo66

Bitrise0111 commented 1 year ago

The checker report and retrieval are both healthy.

Bitrise0111 commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecvkgdltim5kvldrygep36d4jehiemkgx74znubmhrbgyismpr3mk

Address

f1x4jjrsot2gevrxiqwgzgjh7kzh6c6kv3kkuyv6a

Datacap Allocated

200.00TiB

Signer Address

f1nknj7ayq4o43czrtdoauggtwl43fbqatmqis3yy

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecvkgdltim5kvldrygep36d4jehiemkgx74znubmhrbgyismpr3mk