filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Kernelogic - End of Term Web Archive Dataset (2/2) #1684

Closed kernelogic closed 6 months ago

kernelogic commented 1 year ago

Data Owner Name

End of Term Web Archive

Data Owner Country/Region

United States

Data Owner Industry

Government

Website

https://registry.opendata.aws/eot-web-archive/

Social Media

N/A

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

1PiB

On-chain address for first allocation

f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

Custom multisig

Identifier

No response

Share a brief history of your project and organization

I have participated in every Slingshot phase and am probably the best-performing "small individual client".

Even though Slingshot v2 has ended, there is still strong demand from SPs to onboard useful data. This application is to onboard an open dataset from AWS.

I have a web UI (https://singularity-browser.kernelogic.ca/) that indexes all onboarded files and provides ways to retrieve them.

I have successfully completed a few LDNs on other datasets, and I have a record showing that I have followed the rules of decentralization and have zero self-dealing.

Some of the recent LDNs I completed:
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1108
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1107
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1106
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1104
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/983

Is this project associated with other projects/ecosystem stakeholders?

Yes

If answered yes, what are the other projects/ecosystem stakeholders

Storage working groups, BigD exchange, singularity deal making tool.

Describe the data being stored onto Filecoin

Disclaimer: 
Because of unanswered questions about whether combined or duplicate requests can be used to apply for an LDN, this is one of a series of recent open datasets that nobody has applied for before (aka calling dibs).

Description: 
The End of Term Web Archive (EOT) captures and saves U.S. Government websites at the end of presidential administrations. The EOT has thus far preserved websites from administration changes in 2008, 2012, 2016, and 2020. Data from these web crawls have been made openly available in several formats in this dataset.

Size:
Total files 1873606
Total size 580.1 TiB
s3://eotarchive
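As a rough check on the figures above, the sketch below relates the 5PiB request to the 580.1 TiB dataset and estimates the average file size. It assumes binary units (1 PiB = 1024 TiB) and ignores any padding or packing overhead added during data preparation, so the real number of copies that fit may be lower.

```python
# Figures taken from the application form above.
TIB_PER_PIB = 1024
MIB_PER_TIB = 1024 * 1024

dataset_tib = 580.1          # "Total size 580.1 TiB"
total_files = 1_873_606      # "Total files 1873606"
requested_tib = 5 * TIB_PER_PIB  # "5PiB" requested

# How many full copies of the raw dataset the requested DataCap could cover
# (assumption: no padding or CAR packing overhead).
copies = requested_tib / dataset_tib

# Average raw file size in MiB.
avg_file_mib = dataset_tib * MIB_PER_TIB / total_files

print(f"approx. copies covered: {copies:.2f}")
print(f"approx. average file size: {avg_file_mib:.1f} MiB")
```

Under these assumptions the request covers a little under nine copies of the raw data.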

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

singularity

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://registry.opendata.aws/eot-web-archive/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Sporadic

For how long do you plan to keep this dataset stored on Filecoin

1 to 1.5 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe, Australia (continent)

How will you be distributing your data to storage providers

HTTP or FTP server

How do you plan to choose storage providers

Slack, Big data exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

PIKNIK f01904630,f01873432
GreaterHeat f01971600,f01992630
HarryM-Filet f02301,f03223,f0240185
BEWELL TECHNOLOGIES LIMITED f01944744,f01943663,f01928097
And others from BigDExchange

How do you plan to make deals to your storage providers

Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

1PiB

Client address

f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f01858410

Client address

f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

DataCap allocation requested

256TiB

Id

c0ffad15-6f92-47da-b8b5-76cdcd051306

Tom-OriginStorage commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceczgd6isefhu2urenpthq4aphcpkxaxaz6powygeh2q5zpdeokygc

Address

f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

Datacap Allocated

256.00TiB

Signer Address

f1q6bpjlqia6iemqbrdaxr2uehrhpvoju3qh4lpga

Id

c0ffad15-6f92-47da-b8b5-76cdcd051306

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceczgd6isefhu2urenpthq4aphcpkxaxaz6powygeh2q5zpdeokygc

newwebgroup commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacea5yy6cpm54p25pnt4x7l6lkja6s4i7bh6c23plnkf4umtbliil2y

Address

f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

Datacap Allocated

256.00TiB

Signer Address

f1e77zuityhvvw6u2t6tb5qlnsegy2s67qs4lbbbq

Id

c0ffad15-6f92-47da-b8b5-76cdcd051306

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea5yy6cpm54p25pnt4x7l6lkja6s4i7bh6c23plnkf4umtbliil2y

kernelogic commented 1 year ago

keepalive

github-actions[bot] commented 11 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.
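The timing described in the bot message (stale after 10 days without a response, closed 4 days later) can be sketched as below. This is only an illustration of the rule as stated in the message; the function name and state labels are hypothetical, and the actual bot configuration is not shown in this thread.

```python
from datetime import datetime, timedelta

# Thresholds quoted in the bot message; everything else here is illustrative.
STALE_AFTER = timedelta(days=10)
CLOSE_AFTER = timedelta(days=4)  # counted from the moment the issue goes stale


def issue_state(last_response: datetime, now: datetime) -> str:
    """Return 'active', 'stale', or 'closed' based on time since the last response."""
    idle = now - last_response
    if idle >= STALE_AFTER + CLOSE_AFTER:
        return "closed"
    if idle >= STALE_AFTER:
        return "stale"
    return "active"


last = datetime(2023, 1, 1)
for days in (9, 10, 13, 14):
    print(days, issue_state(last, last + timedelta(days=days)))
```

Any new comment resets `last_response`, which is why short "keepalive" replies are enough to keep an application open.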

kernelogic commented 11 months ago

I need to keep this open.

github-actions[bot] commented 11 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

kernelogic commented 11 months ago

Need to keep this open. Still onboarding slowly.

github-actions[bot] commented 10 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] commented 10 months ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

large-datacap-requests[bot] commented 10 months ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

DataCap allocation requested

512TiB

Id

8a6fa687-149f-4d86-bca5-d49bc2d48291

laurarenpanda commented 10 months ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedizhl2b5qtjnc2uer4rswtyv7yy6kf4txjwnfokw3jhd7w4qks2m

Address

f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

Datacap Allocated

512.00TiB

Signer Address

f1bp3tzp536edm7dodldceekzbsx7zcy7hdfg6uzq

Id

8a6fa687-149f-4d86-bca5-d49bc2d48291

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedizhl2b5qtjnc2uer4rswtyv7yy6kf4txjwnfokw3jhd7w4qks2m

Normalnoise commented 10 months ago

checker:manualTrigger

filplus-checker-app[bot] commented 10 months ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

Normalnoise commented 10 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedbygs4e2dfhrjjtql2kte7cvsx35t7ylovdihjif3jp234uixz62

Address

f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

Datacap Allocated

512.00TiB

Signer Address

f1c5non5yf35avgcpsqvxu4yj54yyvxorwyjochqq

Id

8a6fa687-149f-4d86-bca5-d49bc2d48291

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedbygs4e2dfhrjjtql2kte7cvsx35t7ylovdihjif3jp234uixz62

github-actions[bot] commented 9 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

kernelogic commented 9 months ago

Still onboarding in this series.

SuperChaiChai commented 9 months ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecpxhrvkfwimay2auys2mnl3w7gux6muwb4ptr2r7e4jxcyk4kqmc

Address

f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

Datacap Allocated

512.00TiB

Signer Address

f12mckci3omexgzoeosjvstcfxfe4vqw7owdia3da

Id

8a6fa687-149f-4d86-bca5-d49bc2d48291

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecpxhrvkfwimay2auys2mnl3w7gux6muwb4ptr2r7e4jxcyk4kqmc

laurarenpanda commented 9 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebcku4yptau4jjgwdcochfdefry7t7gjgulrcxqcmooz4jklydary

Address

f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

Datacap Allocated

512.00TiB

Signer Address

f1bp3tzp536edm7dodldceekzbsx7zcy7hdfg6uzq

Id

8a6fa687-149f-4d86-bca5-d49bc2d48291

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebcku4yptau4jjgwdcochfdefry7t7gjgulrcxqcmooz4jklydary

github-actions[bot] commented 9 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

kernelogic commented 9 months ago

need to keep open, still onboarding.

Sunnyiscoming commented 8 months ago

Hello @kernelogic, per https://github.com/filecoin-project/notary-governance/issues/922, which applies to Open, Public Dataset applicants, please complete the following Fil+ registration form to identify yourself as the applicant, and add the contact information of the SP entities you are working with to store copies of the data.

This information will be reviewed by Fil+ Governance team to confirm validity and then the application will be allowed to move forward for additional notary review.

kernelogic commented 8 months ago

Hi @Sunnyiscoming form submitted. What's next? Can notaries proceed to sign now? If not, when can I expect a response?

large-datacap-requests[bot] commented 8 months ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f02049625

Client address

f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

DataCap allocation requested

1PiB

Id

593e5502-9b88-4807-8b20-a7d36ed3f3a8
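The three tranche sizes requested so far in this thread (256TiB, 512TiB, 1PiB) are consistent with a schedule in which each tranche is the smaller of a percentage of the total request and a multiple of the weekly rate. The schedule below is an assumption used for illustration and is not stated anywhere in this thread; only the 5PiB total, the 1PiB weekly rate, and the three observed sizes come from the comments above.

```python
TIB_PER_PIB = 1024

# Assumed schedule (NOT from this thread): for tranche n, take the smaller of
# pct% of the total request and (halves / 2) x the weekly rate.
ASSUMED_SCHEDULE = [
    (5, 1),    # 5% of total,  0.5x weekly
    (10, 2),   # 10% of total, 1x weekly
    (20, 4),   # 20% of total, 2x weekly
]


def tranche_tib(n: int, total_tib: int, weekly_tib: int) -> int:
    """Size in TiB of the n-th tranche (1-based) under the assumed schedule."""
    pct, halves = ASSUMED_SCHEDULE[n - 1]
    return min(total_tib * pct // 100, weekly_tib * halves // 2)


total_tib = 5 * TIB_PER_PIB   # "Total DataCap requested: 5PiB"
weekly_tib = 1 * TIB_PER_PIB  # "Expected weekly DataCap usage rate: 1PiB"

sizes = [tranche_tib(n, total_tib, weekly_tib) for n in (1, 2, 3)]
granted_so_far = sum(sizes)

print(sizes)                       # tranche sizes in TiB
print(granted_so_far, "of", total_tib, "TiB requested across three tranches")
```

Counting each request Id once, the three tranches add up to 1792 TiB of the 5120 TiB requested.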

kernelogic commented 8 months ago

See CID report here https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1683#issuecomment-1783228984

AlanGreaterheat commented 8 months ago

checker:manualTrigger

filplus-checker-app[bot] commented 8 months ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 60.95% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.
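The replication warning in this report is a percentage of deals whose data sits with fewer than 4 storage providers. A minimal sketch of that kind of calculation follows. The deal list, the piece and provider labels, and the choice to count deals rather than weight them by size are all assumptions; the checker's actual method is not shown in this thread.

```python
from collections import defaultdict


def low_replication_share(deals, min_providers=4):
    """Percentage of deals whose piece is held by fewer than min_providers distinct SPs.

    deals is a list of (piece_cid, provider_id) tuples.
    """
    if not deals:
        return 0.0
    providers_by_piece = defaultdict(set)
    for piece_cid, provider_id in deals:
        providers_by_piece[piece_cid].add(provider_id)
    low = sum(
        1 for piece_cid, _ in deals
        if len(providers_by_piece[piece_cid]) < min_providers
    )
    return 100.0 * low / len(deals)


# Hypothetical sample: pieceA is on 4 providers, pieceB on only 2.
sample_deals = [
    ("pieceA", "sp1"), ("pieceA", "sp2"), ("pieceA", "sp3"), ("pieceA", "sp4"),
    ("pieceB", "sp1"), ("pieceB", "sp2"),
]
share = low_replication_share(sample_deals)
print(f"{share:.2f}% of deals are below the replication threshold")
```

Early in an onboarding run this share is naturally high, because later copies of each piece have not been sealed yet.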

AlanGreaterheat commented 8 months ago

checker:manualTrigger f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq
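The trigger comment above follows the form given in the report footnotes: the keyword `checker:manualTrigger`, optionally followed by other addresses to combine into the report. A hypothetical parser for that comment format might look like the sketch below; the function name is made up for illustration and this is not the checker app's own code.

```python
def parse_checker_trigger(comment: str):
    """Return the list of extra addresses if the comment is a trigger, else None."""
    tokens = comment.split()
    if not tokens or tokens[0] != "checker:manualTrigger":
        return None
    return tokens[1:]


print(parse_checker_trigger("checker:manualTrigger"))
print(parse_checker_trigger(
    "checker:manualTrigger f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla "
    "f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq"
))
print(parse_checker_trigger("keepalive"))
```

An empty list means the report covers only the application's own client address; a non-empty list names the related addresses whose deals are merged in.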

filplus-checker-app[bot] commented 8 months ago

DataCap and CID Checker Report Summary[^1]

Other Addresses[^2]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 54.13% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.

AlanGreaterheat commented 8 months ago

Client is well known and has completed KYC form. Willing to support this round.

kernelogic commented 8 months ago

Thanks, more replication is coming soon. These are only early tranches.

AlanGreaterheat commented 8 months ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebabo2moh6rsup7e5dqmtszqvvfp37kjf2y2iihjcijo554xvzugk

Address

f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

Datacap Allocated

1.00PiB

Signer Address

f1pnmzlxj7cfeo2v6oj5nco46hkg2l46wj7o4xxui

Id

593e5502-9b88-4807-8b20-a7d36ed3f3a8

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebabo2moh6rsup7e5dqmtszqvvfp37kjf2y2iihjcijo554xvzugk

luobin544 commented 8 months ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebp6wzu45dzf6bclqauovugerbhorgnf7cptpowqnukabr2lu7yes

Address

f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

Datacap Allocated

1.00PiB

Signer Address

f1tbd632f6w62glfaf7wjpimacbnjiz26poyoes2q

Id

593e5502-9b88-4807-8b20-a7d36ed3f3a8

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebp6wzu45dzf6bclqauovugerbhorgnf7cptpowqnukabr2lu7yes

nj-steve commented 8 months ago

also as 1683

nj-steve commented 8 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebipwmf6ddeoz5mv5js6qx2hceew3poltsb3oznlciqql7nkxwidu

Address

f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

Datacap Allocated

1.00PiB

Signer Address

f1xx6555qijma7igpnjspyvdunc4vfxkawnpqy5ii

Id

593e5502-9b88-4807-8b20-a7d36ed3f3a8

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebipwmf6ddeoz5mv5js6qx2hceew3poltsb3oznlciqql7nkxwidu

Sunnyiscoming commented 8 months ago

@kernelogic Please add the contact info of all SPs to the form.

kernelogic commented 8 months ago

I have resubmitted the KYC form with contact info. Thanks. #1683 shares the same KYC info.

github-actions[bot] commented 8 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

kernelogic commented 8 months ago

keepalive

dannyob commented 7 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedgxy2hwjhj67kchzle3hewj3mskav7h5kgt5cx7ryimaxn7e2gbg

Address

f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

Datacap Allocated

1.00PiB

Signer Address

f1k6wwevxvp466ybil7y2scqlhtnrz5atjkkyvm4a

Id

593e5502-9b88-4807-8b20-a7d36ed3f3a8

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedgxy2hwjhj67kchzle3hewj3mskav7h5kgt5cx7ryimaxn7e2gbg

github-actions[bot] commented 7 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

kernelogic commented 7 months ago

keep alive

Sunnyiscoming commented 7 months ago

Please provide the ID, city, country, and organization of each SP here.

github-actions[bot] commented 6 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.