filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] <FogMeta Lab> - <End of Term Web Archive Datasets> #1600

Open hengdingy opened 1 year ago

hengdingy commented 1 year ago

Data Owner Name

FogMeta Lab

Data Owner Country/Region

China

Data Owner Industry

Web3 / Crypto

Website

https://fogmeta.com

Social Media

Twitter: https://twitter.com/FogMeta
GitHub: https://github.com/FogMeta

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

500TiB

On-chain address for first allocation

f1o6rcx5wky2qy54kd6l6l5zj36uq7ahhl2dt7xba

Custom multisig

Identifier

No response

Share a brief history of your project and organization

FogMeta Lab's research spans multiple levels, from system technology, infrastructure, and middleware to services and solutions. It covers future systems, network technology and business, distributed systems and management, information management, and interactive and innovative services. Drawing on industry perspectives and practice, FogMeta also addresses business complexity through operations optimization and related technologies.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

"The End of Term Web Archive (EOT) captures and saves U.S. Government websites at the end of presidential administrations. The EOT has thus far preserved websites from administration changes in 2008, 2012, 2016, and 2020. Data from these web crawls have been made openly available in several formats in this dataset."

Source: https://registry.opendata.aws/eot-web-archive/

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

IPFS, lotus, graphsplit, others/custom tool

If you answered "other/custom tool" in the previous question, enter the details here

We would also like to use the Swan Client tool (https://github.com/filswan/go-swan-client#Graphsplit) to prepare the dataset.

Please share a sample of the data

https://eotarchive.org/data/data-2008/
https://eotarchive.org/data/data-2012/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Monthly

For how long do you plan to keep this dataset stored on Filecoin

2 to 3 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, Africa, North America, South America, Europe, Australia (continent), Antarctica

How will you be distributing your data to storage providers

Cloud storage (i.e. S3), HTTP or FTP server, IPFS, Shipping hard drives, Others

How do you plan to choose storage providers

Slack, Partners, Others

If you answered "Others" in the previous question, what is the tool or platform you plan to use

We also plan to use the FilSwan platform (https://filswan.com/) to select storage providers that meet our requirements.

If you already have a list of storage providers to work with, fill out their names and provider IDs below

The storage providers we would like to work with are listed below. Some of them are from the FilSwan platform.
f03624
f010088
f02301
f08399
f02401
f0187709
f01163272
f01402814
f01072221
f0240185
f0143858
f01390330
f01225882
f0717969
f03223
f01395673
f01786736
f0836160
f032824
f01443744
f01871352
f01907556
f01946551
f02012951
f01970630

How do you plan to make deals to your storage providers

Boost client, Lotus client, Others/custom tool

If you answered "Others/custom tool" in the previous question, enter the details here

https://github.com/filswan/go-swan-client

Can you confirm that you will follow the Fil+ guideline

Yes

hengdingy commented 9 months ago

manualTrigger

Normalnoise commented 8 months ago

checker:manualTrigger

filplus-checker-app[bot] commented 8 months ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.