filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale
[DataCap Application] Kernelogic - End of Term Web Archive Dataset (1/2) #1683

Closed: kernelogic closed this 6 months ago

kernelogic commented 1 year ago

Data Owner Name

End of Term Web Archive

Data Owner Country/Region

United States

Data Owner Industry

Government

Website

https://registry.opendata.aws/eot-web-archive/

Social Media

N/A

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

1PiB

On-chain address for first allocation

f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq

Custom multisig

Identifier

No response

Share a brief history of your project and organization

I have participated in every Slingshot phase and am probably the best-performing "small individual client".

Even though Slingshot v2 has ended, there is still strong demand from SPs to onboard useful data. This application is to onboard an open dataset from AWS.

I have a web UI (https://singularity-browser.kernelogic.ca/) that indexes all onboarded files and provides ways to retrieve them.

I have successfully completed several LDNs on other datasets, and my record shows that I have followed the decentralization rules with zero self-dealing.

Some of the recent LDNs I completed:
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1108
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1107
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1106
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1104
https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/983

Is this project associated with other projects/ecosystem stakeholders?

Yes

If answered yes, what are the other projects/ecosystem stakeholders

Storage working groups, BigD exchange, singularity deal making tool.

Describe the data being stored onto Filecoin

Disclaimer: 
Because the questions about whether combined or duplicate requests can be used to apply for an LDN remain unanswered, this is one of a series of recent open datasets that nobody has applied for yet (aka calling dibs).

Description: 
The End of Term Web Archive (EOT) captures and saves U.S. Government websites at the end of presidential administrations. The EOT has thus far preserved websites from administration changes in 2008, 2012, 2016, and 2020. Data from these web crawls have been made openly available in several formats in this dataset.

Size:
Total files: 1,873,606
Total size: 580.1 TiB
Location: s3://eotarchive
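
For context on how the 5PiB request relates to the 580.1 TiB dataset, here is a rough arithmetic sketch. It assumes binary units (1 PiB = 1024 TiB) and that the whole request would be spent on full copies of the dataset; the function name is hypothetical and the result is only an upper bound on replicas, not a stated plan.

```python
TIB_PER_PIB = 1024  # binary units: 1 PiB = 1024 TiB


def implied_copies(datacap_pib: float, dataset_tib: float) -> float:
    """Return how many full copies of the dataset the DataCap could cover."""
    return datacap_pib * TIB_PER_PIB / dataset_tib


# Figures taken from this application: 5 PiB requested, 580.1 TiB dataset.
copies = implied_copies(5, 580.1)
print(f"{copies:.2f} full copies at most")
```

Padding added during data preparation would lower this number in practice.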

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

singularity

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://registry.opendata.aws/eot-web-archive/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Sporadic

For how long do you plan to keep this dataset stored on Filecoin

1 to 1.5 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe

How will you be distributing your data to storage providers

HTTP or FTP server

How do you plan to choose storage providers

Slack, Big data exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

PIKNIK f01904630,f01873432
GreaterHeat f01971600,f01992630
HarryM-Filet f02301,f03223,f0240185
BEWELL TECHNOLOGIES LIMITED f01944744,f01943663,f01928097
And others from BigDExchange

How do you plan to make deals to your storage providers

No response

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

1PiB

Client address

f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f01858410

Client address

f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq

DataCap allocation requested

256TiB

Id

3c60b1f7-397d-498c-be10-61c99a75b05f

Tom-OriginStorage commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceaduyzzpbg3rblw23rkcppfzxrhk7mu3wszck53vttxl4fi7rgp4o

Address

f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq

Datacap Allocated

256.00TiB

Signer Address

f1q6bpjlqia6iemqbrdaxr2uehrhpvoju3qh4lpga

Id

3c60b1f7-397d-498c-be10-61c99a75b05f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceaduyzzpbg3rblw23rkcppfzxrhk7mu3wszck53vttxl4fi7rgp4o

xiaoyuaiheshui commented 1 year ago

This is the first allocation, and the Kernelogic project is as compliant as ever. I am willing to support it and look forward to the next milestone!

xiaoyuaiheshui commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceabif4d2e4hgxfvp4ae5ucgdr3slazk7w7pn4rl7oxcwfu4334av4

Address

f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq

Datacap Allocated

256.00TiB

Signer Address

f122qmy25wdtt5mxd77kndiq7z5x2n3iwiuz2wdsa

Id

3c60b1f7-397d-498c-be10-61c99a75b05f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceabif4d2e4hgxfvp4ae5ucgdr3slazk7w7pn4rl7oxcwfu4334av4

kernelogic commented 1 year ago

keepalive

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

kernelogic commented 1 year ago

I need to keep this open.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

kernelogic commented 1 year ago

Need to keep this open. Still onboarding slowly.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

kernelogic commented 1 year ago

I am still working on it. Started sending out deals this week.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

kernelogic commented 12 months ago

I am still working on it. I sent out some deals already but just need a bit more distribution to trigger next tranche.

large-datacap-requests[bot] commented 12 months ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq

DataCap allocation requested

512TiB

Id

7124a59c-195f-4d54-b804-648185997276

kernelogic commented 12 months ago

checker:manualTrigger f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

filplus-checker-app[bot] commented 12 months ago

DataCap and CID Checker Report Summary[^1]

Other Addresses[^2]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

a1991car commented 12 months ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaced7mlnwfftkuvb5bpc27emjw34f2lzve65amej32522ag557dyl4c

Address

f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq

Datacap Allocated

512.00TiB

Signer Address

f1qnumecdypgrbaebtkdfjnwt5ndacadcuas3deiq

Id

7124a59c-195f-4d54-b804-648185997276

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaced7mlnwfftkuvb5bpc27emjw34f2lzve65amej32522ag557dyl4c

sxxfuture-official commented 12 months ago

LGTM

sxxfuture-official commented 12 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebjuqsnntkehjonuqup7y4depg4ednhvozcilh4552voopofpdfec

Address

f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq

Datacap Allocated

512.00TiB

Signer Address

f1foiomqlmoshpuxm6aie4xysffqezkjnokgwcecq

Id

7124a59c-195f-4d54-b804-648185997276

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebjuqsnntkehjonuqup7y4depg4ednhvozcilh4552voopofpdfec

github-actions[bot] commented 11 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

kernelogic commented 11 months ago

Still onboarding in this series.

SuperChaiChai commented 11 months ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceahi7czu5rrppbkfyqipwaigfi4274p6q4b4jxq6krf63rpfrjohu

Address

f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq

Datacap Allocated

512.00TiB

Signer Address

f12mckci3omexgzoeosjvstcfxfe4vqw7owdia3da

Id

7124a59c-195f-4d54-b804-648185997276

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceahi7czu5rrppbkfyqipwaigfi4274p6q4b4jxq6krf63rpfrjohu

laurarenpanda commented 11 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebo4567nb5orbncxrw7r3cy3lzzeyy3x2hhrdn36e5jljxa6zcqfm

Address

f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq

Datacap Allocated

512.00TiB

Signer Address

f1bp3tzp536edm7dodldceekzbsx7zcy7hdfg6uzq

Id

7124a59c-195f-4d54-b804-648185997276

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebo4567nb5orbncxrw7r3cy3lzzeyy3x2hhrdn36e5jljxa6zcqfm

github-actions[bot] commented 11 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

kernelogic commented 10 months ago

need to keep open, still onboarding.

large-datacap-requests[bot] commented 10 months ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f02049625

Client address

f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq

DataCap allocation requested

1PiB

Id

37f3b264-7558-4a02-8ce8-94295913b4e4
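
The three allocation sizes requested so far in this thread (256TiB, 512TiB, 1PiB) are consistent with a schedule where each tranche is the smaller of a share of the total request and a multiple of the weekly rate. The thread does not state this rule, so the percentages and multipliers below are an assumption that happens to reproduce the observed numbers; `ASSUMED_SCHEDULE` and `tranche_sizes_tib` are hypothetical names.

```python
from fractions import Fraction

TIB_PER_PIB = 1024  # binary units: 1 PiB = 1024 TiB

# ASSUMPTION: (share of total request, multiple of weekly rate) per tranche.
# These values are not given in the thread; they are chosen to match it.
ASSUMED_SCHEDULE = [
    (Fraction(5, 100), Fraction(1, 2)),   # request 1
    (Fraction(10, 100), Fraction(1)),     # request 2
    (Fraction(20, 100), Fraction(2)),     # request 3
]


def tranche_sizes_tib(total_pib: int, weekly_pib: int) -> list:
    """Return each tranche in TiB as min(share * total, multiple * weekly)."""
    total_tib = total_pib * TIB_PER_PIB
    weekly_tib = weekly_pib * TIB_PER_PIB
    return [
        min(total_tib * share, weekly_tib * multiple)
        for share, multiple in ASSUMED_SCHEDULE
    ]


# Figures from this application: 5 PiB total, 1 PiB weekly.
sizes = tranche_sizes_tib(5, 1)
print([int(size) for size in sizes])  # [256, 512, 1024]
```

Exact fractions are used so that the TiB values come out as whole numbers without floating-point rounding.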

kevzak commented 10 months ago

checker:manualTrigger

filplus-checker-app[bot] commented 10 months ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 73.47% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.

kevzak commented 10 months ago

@kernelogic it looks like there are only 2 SPs, and one of them is storing 60+% of the data. Which SPs is the data distributed to? Please provide their entity names and IDs.
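
The two figures discussed here, the share of deals whose data sits with fewer than 4 storage providers and the share held by the largest provider, can be sketched as follows. This is a minimal illustration, not the checker's implementation: it counts deals rather than bytes (the real weighting is not stated in this thread), and the piece CIDs and provider IDs in the sample are made up.

```python
from collections import defaultdict


def replication_stats(deals, min_providers=4):
    """Return (pct of deals under-replicated, pct of deals at top provider).

    `deals` is a list of (piece_cid, provider_id) pairs. A deal counts as
    under-replicated when its piece is stored by fewer than `min_providers`
    distinct providers.
    """
    if not deals:
        raise ValueError("no deals to analyse")
    providers_by_piece = defaultdict(set)
    deals_by_provider = defaultdict(int)
    for piece_cid, provider in deals:
        providers_by_piece[piece_cid].add(provider)
        deals_by_provider[provider] += 1
    total = len(deals)
    under = sum(
        1 for piece_cid, _ in deals
        if len(providers_by_piece[piece_cid]) < min_providers
    )
    return 100 * under / total, 100 * max(deals_by_provider.values()) / total


# Hypothetical sample: pieceA has 4 providers, the other pieces have fewer.
sample = [
    ("pieceA", "sp1"), ("pieceA", "sp2"), ("pieceA", "sp3"), ("pieceA", "sp4"),
    ("pieceB", "sp1"), ("pieceB", "sp2"),
    ("pieceC", "sp1"),
    ("pieceD", "sp1"),
]
under_pct, top_share = replication_stats(sample)
print(f"{under_pct:.2f}% under-replicated, top provider holds {top_share:.2f}%")
```

In this sample, half of the deals are under-replicated and the largest provider holds half of all deals.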

kernelogic commented 10 months ago

checker:manualTrigger f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

filplus-checker-app[bot] commented 10 months ago

DataCap and CID Checker Report Summary[^1]

Other Addresses[^2]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.

Sunnyiscoming commented 10 months ago

Hello @kernelogic, per https://github.com/filecoin-project/notary-governance/issues/922 for Open, Public Dataset applicants, please complete the following Fil+ registration form to identify yourself as the applicant, and please also add the contact information of the SP entities you are working with to store copies of the data.

This information will be reviewed by the Fil+ Governance team to confirm validity, and then the application will be allowed to move forward for additional notary review.

kernelogic commented 10 months ago

Hi @Sunnyiscoming form submitted. What's next? Can notaries proceed to sign now? If not, when can I expect a response?

AlanGreaterheat commented 10 months ago

checker:manualTrigger

filplus-checker-app[bot] commented 10 months ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 84.04% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.

AlanGreaterheat commented 10 months ago

checker:manualTrigger f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq f1ezp4w6l2y2oz2nmy4sdegnmxcrphab2jlt5hbla

filplus-checker-app[bot] commented 10 months ago

DataCap and CID Checker Report Summary[^1]

Other Addresses[^2]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 54.13% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.

AlanGreaterheat commented 10 months ago

Client is well known and has completed KYC form. Willing to support this round.

AlanGreaterheat commented 10 months ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacec7umxtmxb4wwfi5szensq72nfuv3djzue4m4bzkizaam6bjzgiyg

Address

f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq

Datacap Allocated

1.00PiB

Signer Address

f1pnmzlxj7cfeo2v6oj5nco46hkg2l46wj7o4xxui

Id

37f3b264-7558-4a02-8ce8-94295913b4e4

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacec7umxtmxb4wwfi5szensq72nfuv3djzue4m4bzkizaam6bjzgiyg

luobin544 commented 10 months ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebryddggpsfcuvluahhm42fdyri4mt3lrhbwe4wbk3t6yyjwp5zno

Address

f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq

Datacap Allocated

1.00PiB

Signer Address

f1tbd632f6w62glfaf7wjpimacbnjiz26poyoes2q

Id

37f3b264-7558-4a02-8ce8-94295913b4e4

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebryddggpsfcuvluahhm42fdyri4mt3lrhbwe4wbk3t6yyjwp5zno

nj-steve commented 10 months ago

checker:manualTrigger

filplus-checker-app[bot] commented 10 months ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 84.04% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.

nj-steve commented 10 months ago

Because this is still an early stage, more replication is coming soon.

nj-steve commented 10 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceadvnmijiyganxgaeteixfiqg6diwwnd2o2tgjzpbrety2xffcbmw

Address

f16xgxvxqh4uly64npes2deyyt43mfynaaemrtkfq

Datacap Allocated

1.00PiB

Signer Address

f1xx6555qijma7igpnjspyvdunc4vfxkawnpqy5ii

Id

37f3b264-7558-4a02-8ce8-94295913b4e4

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceadvnmijiyganxgaeteixfiqg6diwwnd2o2tgjzpbrety2xffcbmw

github-actions[bot] commented 10 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.