filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Internet Archive #52

Closed galen-mcandrew closed 10 months ago

galen-mcandrew commented 3 years ago

Large Dataset Notary Application

To apply for a DataCap allocation for your dataset, please fill out the following information.

Core Information

Please respond to the questions below in paragraph form, replacing the text saying "Please answer here". Include as much detail as you can in your answer!

Project details

Share a brief history of your project and organization.

The Internet Archive, a 501(c)(3) non-profit, is building a digital library of Internet sites and other cultural artifacts in digital form. Like a physical library, we provide free access to researchers, historians, scholars, the print disabled, and the general public. Our mission is to provide Universal Access to All Knowledge. See more at https://archive.org/about/

This project aims to explore the role of decentralized storage in this long-term mission.

What is the primary source of funding for this project?

We are funded through donations, grants, and by providing web archiving and book digitization services for our partners. 

What other projects/ecosystem stakeholders is this project associated with?

The dataset was compiled in collaboration with The Library of Congress, California Digital Library, University of North Texas Libraries, Internet Archive, George Washington University Libraries, Stanford University Libraries, and the U.S. Government Publishing Office.

Use-case details

Describe the data being stored onto Filecoin

The End-of-Term Web Archive captures and saves U.S. Government websites at the end of presidential administrations. This dataset represents a comprehensive crawl of the .gov domain between September 2016 and January 20, 2017, at the end of the Obama Administration and just before the beginning of the Trump Administration.

Where was the data in this dataset sourced from?

Federal Government websites (.gov) in the Legislative, Executive, or Judicial branches of government, and related social media accounts. Also in scope are Federal Government websites on other domains, such as .mil, .edu, and .com.

Can you share a sample of what is in the dataset? A link to a file, an image, a table, etc., are good examples of this.

The dataset contains WARC files containing crawl data (and associated metadata) of the aforementioned sites. Their contents, when opened with a compatible viewer, are similar to https://web.archive.org/web/20170126033350/http:/globalchange.epa.gov/

The raw files look like this: https://archive.org/download/LOC-QUARTERLY-006-20161225070227072-13019-13025-wbgrp-crawl202
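For readers unfamiliar with the format: a WARC file is a sequence of records, each made of a version line, header fields, a blank line, and a body whose size is given by `Content-Length`. The sketch below lists record headers using only the standard library. It is an illustration, not the tooling used for this dataset; the function name `iter_warc_headers` and the sample record are made up for the example.

```python
import io


def iter_warc_headers(stream):
    """Yield a dict of header fields for each record in a WARC byte stream."""
    while True:
        line = stream.readline()
        if not line:
            return  # end of file
        if not line.strip():
            continue  # blank lines separate consecutive records
        if not line.startswith(b"WARC/"):
            raise ValueError(f"expected a WARC version line, got {line!r}")
        headers = {}
        while True:
            line = stream.readline()
            if not line.strip():
                break  # an empty line (or EOF) ends the header block
            name, _, value = line.decode("utf-8", "replace").partition(":")
            headers[name.strip()] = value.strip()
        # Skip the record body; its size is given by Content-Length.
        stream.read(int(headers.get("Content-Length", "0")))
        yield headers


# Synthetic one-record sample, not taken from the dataset.
sample = (
    b"WARC/1.0\r\n"
    b"WARC-Type: response\r\n"
    b"WARC-Target-URI: http://example.com/\r\n"
    b"Content-Length: 5\r\n"
    b"\r\n"
    b"hello"
    b"\r\n\r\n"
)

for headers in iter_warc_headers(io.BytesIO(sample)):
    print(headers["WARC-Type"], headers["WARC-Target-URI"])
```

For a compressed `.warc.gz` file, passing the object returned by `gzip.open(path, "rb")` as `stream` should work the same way, since it supports `readline` and `read`.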

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes, data is archived in the public interest. Archive is currently available at http://eotarchive.cdlib.org/search?f1-administration=2016

What is the expected retrieval frequency for this data?

This effort is intended primarily as an archival and exploratory use case. Data may be accessed by researchers, periodic integrity checks, and interactive-use prototypes (similar to Estuary).

For how long do you plan to keep this dataset stored on Filecoin? Will this be a permanent archival or a one-time storage deal?

The dataset is intended for long-term archival storage, depending on the outcomes of this trial.

DataCap allocation plan

In which geographies do you plan on making storage deals?

We're looking for a wide geographic distribution to model global resiliency. Miners in NA and EU geos will initially be considered.

What is your expected data onboarding rate? How many deals can you make in a day, in a week? How much DataCap do you plan on using per day, per week?

We have extensive interconnects to high-bandwidth networks and robust processing capacity. Once we get through the testing phase, we expect to be able to onboard 50-100 TiB/week.

How will you be distributing your data to miners? Is there an offline data transfer process?

Offline data transfer over the internet, using standard HTTP or a purpose-made protocol like Tachyon.

How do you plan on choosing the miners with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

Miners that are in the right geographies and have high reputation scores on public indices like filrep.io. The initial set of storage providers for testing will likely be from the MinerX Fellowship.

How will you be distributing data and DataCap across miners storing data?

We will likely be structuring our files into 32GiB chunks that will be evenly distributed in deals with the selected set of storage providers.
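To make the numbers concrete: at 32 GiB per chunk, the stated 50-100 TiB/week onboarding rate works out to 1,600-3,200 chunks per week. The sketch below shows that arithmetic and one possible way to spread chunks evenly across providers. The round-robin scheme, the `replicas` parameter, and the provider names are assumptions made for illustration; the application does not specify how the even distribution is achieved.

```python
from collections import defaultdict

GIB_PER_TIB = 1024
CHUNK_GIB = 32  # chunk size stated in the application


def chunks_per_week(tib_per_week):
    """Number of 32 GiB chunks needed to carry a weekly volume given in TiB."""
    return tib_per_week * GIB_PER_TIB // CHUNK_GIB


def assign_round_robin(chunk_ids, providers, replicas):
    """Give each chunk `replicas` distinct providers, walking the provider
    list as a ring so per-provider counts differ by at most one."""
    if replicas > len(providers):
        raise ValueError("need at least as many providers as replicas")
    plan = defaultdict(list)
    n = len(providers)
    for i, chunk in enumerate(chunk_ids):
        for r in range(replicas):
            plan[providers[(i * replicas + r) % n]].append(chunk)
    return dict(plan)


print(chunks_per_week(50), chunks_per_week(100))  # 1600 3200

# Hypothetical provider names and a hypothetical replica count of 2.
plan = assign_round_robin(list(range(8)), ["sp-a", "sp-b", "sp-c", "sp-d"], 2)
for provider, chunks in plan.items():
    print(provider, len(chunks))
```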

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The table below shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider has taken a verified deal, it is marked as new.

For most DataCap applications, the restrictions below should apply.

⚠️ f020378 has unknown IP location.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01611097 | Mission Viejo, California, US; AT&T Services, Inc. | 73.78 TiB | 24.91% | 71.66 TiB | 2.87% |
| f058369 | Boston, Massachusetts, US; Charles River Operation | 32.00 GiB | 0.01% | 32.00 GiB | 0.00% |
| f01910202 | Philadelphia, Pennsylvania, US; Cogent Communications | 56.33 TiB | 19.02% | 55.13 TiB | 2.12% |
| f01826669 | Philadelphia, Pennsylvania, US; Cogent Communications | 18.29 TiB | 6.17% | 18.29 TiB | 0.00% |
| f01883179 (new) | Philadelphia, Pennsylvania, US; Cogent Communications | 7.69 TiB | 2.60% | 7.69 TiB | 0.00% |
| f010446 | Zaventem, Flanders, BE; Cogent Communications | 16.00 GiB | 0.01% | 16.00 GiB | 0.00% |
| f01345523 | Antwerpen, Flanders, BE; Cogent Communications | 2.00 GiB | 0.00% | 2.00 GiB | 0.00% |
| f01858429 | Boston, Massachusetts, US; Comcast Cable Communications, LLC | 16.60 TiB | 5.60% | 16.42 TiB | 1.04% |
| f09848 | Rancho Santa Margarita, California, US; Cox Communications Inc. | 48.00 GiB | 0.02% | 48.00 GiB | 0.00% |
| f01606675 | Montréal, Quebec, CA; eStruxture Data Centers Inc. | 3.30 TiB | 1.12% | 3.30 TiB | 0.00% |
| f01091840 | Montréal, Quebec, CA; eStruxture Data Centers Inc. | 176.00 GiB | 0.06% | 160.00 GiB | 9.09% |
| f0157535 | Montréal, Quebec, CA; eStruxture Data Centers Inc. | 80.00 GiB | 0.03% | 80.00 GiB | 0.00% |
| f019104 | Montréal, Quebec, CA; eStruxture Data Centers Inc. | 53.00 GiB | 0.02% | 53.00 GiB | 0.00% |
| f0165400 | Montréal, Quebec, CA; eStruxture Data Centers Inc. | 48.00 GiB | 0.02% | 48.00 GiB | 0.00% |
| f0104671 | Kawasaki, Kanagawa, JP; KDDI CORPORATION | 64.00 GiB | 0.02% | 64.00 GiB | 0.00% |
| f030379 | Gangneung, Gangwon-do, KR; LG DACOM Corporation | 16.00 GiB | 0.01% | 16.00 GiB | 0.00% |
| f024184 | Seoul, Seoul, KR; LG DACOM Corporation | 2.00 GiB | 0.00% | 2.00 GiB | 0.00% |
| f0694396 | Birmingham, England, GB; Neonix Web Services Limited | 17.00 GiB | 0.01% | 17.00 GiB | 0.00% |
| f019551 | Birmingham, England, GB; Neonix Web Services Limited | 16.00 GiB | 0.01% | 16.00 GiB | 0.00% |
| f01784458 | Oslo, Oslo, NO; Nexthop AS | 1.00 GiB | 0.00% | 1.00 GiB | 0.00% |
| f01882184 | Herndon, Virginia, US; PCCW Global, Inc. | 18.81 TiB | 6.35% | 18.81 TiB | 0.00% |
| f01873432 | Las Vegas, Nevada, US; PiKNiK & Company Inc. | 39.36 TiB | 13.29% | 37.27 TiB | 5.31% |
| f01904630 | Las Vegas, Nevada, US; PiKNiK & Company Inc. | 37.47 TiB | 12.65% | 34.16 TiB | 8.83% |
| f01851060 | Las Vegas, Nevada, US; PiKNiK & Company Inc. | 11.96 TiB | 4.04% | 11.82 TiB | 1.18% |
| f01851683 (new) | Las Vegas, Nevada, US; PiKNiK & Company Inc. | 11.52 TiB | 3.89% | 11.52 TiB | 0.00% |
| f066596 | San Diego, California, US; ScaleMatrix | 274.00 GiB | 0.09% | 274.00 GiB | 0.00% |
| f02576 | Copenhagen, Capital Region, DK; Telenor A/S | 96.00 GiB | 0.03% | 96.00 GiB | 0.00% |
| f010088 | Everett, Washington, US; Wholesail networks LLC | 18.00 GiB | 0.01% | 18.00 GiB | 0.00% |
| f01199442 | Heerhugowaard, North Holland, NL; Wijnand Schouten trading as Speedium | 146.00 GiB | 0.05% | 130.00 GiB | 10.96% |
| f01207045 | Heerhugowaard, North Holland, NL; Wijnand Schouten trading as Speedium | 32.00 GiB | 0.01% | 32.00 GiB | 0.00% |
| f01199430 | Heerhugowaard, North Holland, NL; Wijnand Schouten trading as Speedium | 3.00 GiB | 0.00% | 3.00 GiB | 0.00% |
| f020378 | Unknown; Unknown | 2.00 GiB | 0.00% | 2.00 GiB | 0.00% |
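The Duplicate Deals column appears to be the share of a provider's sealed bytes that repeat data it already holds, that is, (Total Deals Sealed - Unique Data) / Total Deals Sealed. This reading is inferred from the figures in the report, not from the checker's source, so treat the sketch below as an assumption. The function name `duplicate_pct` is made up for the example; the three rows are transcribed from the table.

```python
def duplicate_pct(total, unique):
    """Percentage of sealed deal volume that duplicates data already stored."""
    if total <= 0:
        raise ValueError("total must be positive")
    return (total - unique) / total * 100


# (total sealed, unique data); both values of a pair share the same unit.
rows = {
    "f01611097": (73.78, 71.66),    # TiB
    "f01091840": (176.00, 160.00),  # GiB
    "f01199442": (146.00, 130.00),  # GiB
}

for provider, (total, unique) in rows.items():
    print(provider, f"{duplicate_pct(total, unique):.2f}%")
# f01611097 2.87%
# f01091840 9.09%
# f01199442 10.96%
```

Because the report rounds sizes to two decimals, a few rows recomputed this way can differ from the published percentage in the last digit.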

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

⚠️ 84.06% of deals are for data replicated across less than 4 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 26.38 TiB | 26.40 TiB | 1 | 8.91% |
| 31.92 TiB | 64.12 TiB | 2 | 21.64% |
| 51.83 TiB | 158.49 TiB | 3 | 53.50% |
| 9.95 TiB | 45.28 TiB | 4 | 15.28% |
| 340.00 GiB | 1.95 TiB | 5 | 0.66% |
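The 84.06% warning can be reproduced from the replication table by summing the deal volume held by fewer than 4 providers and dividing by the total deal volume. The sketch below does that with the Total Deals Made column; the helper name `share_below` is made up for the example, and the threshold of 4 is taken from the warning text.

```python
# (total deals made in TiB, number of providers), transcribed from the table
replication = [
    (26.40, 1),
    (64.12, 2),
    (158.49, 3),
    (45.28, 4),
    (1.95, 5),
]


def share_below(rows, min_providers):
    """Percentage of deal volume replicated across fewer than `min_providers`."""
    total = sum(size for size, _ in rows)
    low = sum(size for size, n in rows if n < min_providers)
    return low / total * 100


print(f"{share_below(replication, 4):.2f}%")  # 84.06%
```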

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, which should not resolve to the same CID.

However, this is possible if all the clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Approvers |
| --- | --- | --- | --- | --- |
| f1sw5zjcyo4mff5cbvgsgmm7uoko6gcr4tptvtkhy | Glif auto verified | 208.00 GiB | 1 | Unknown |
| f3wkp4blevjsrtbc6vwgjf2sedzjwsqmj3wsh4uexbp4k7dggs72kbvuv7xivsnz7cnmfazpmqp3qmchmzms6a | Unknown | 208.00 GiB | 1 | Unknown |
| f3u5dehxxe2uvehitioxhwjp27wpv72hsnuqhtz6sce2wzqv2skhguivnsvwbkwgczcc5x4qf6eeao34tejqdq | Glif auto verified | 16.00 GiB | 1 | Unknown |

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

cryptowhizzard commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebivj2n6kkle2gb2vslp5o6u6lphnawxim6s6u4usw7424qpq2rnm

Address

f1wp6zoxj7sydnrywvzp276x3gayghi7r6le4tcwy

Datacap Allocated

800.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

74e1ec38-3b34-4956-9799-a7c6e03fad65

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebivj2n6kkle2gb2vslp5o6u6lphnawxim6s6u4usw7424qpq2rnm

BDEio commented 1 year ago

@galen-mcandrew Hi! Congratulations on your DataCap approval! BDE is a trusted deals auction house helping you to get paid storing your data with reliable storage providers. If you need any help, please get in touch.

parkan commented 1 year ago

please checker:manualTrigger

EDIT: hmm maybe I don't have permission to trigger it?

parkan commented 1 year ago

@filplus-checker checker:manualTrigger

kernelogic commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecnhx6e2vaw33wc3w2wfutiet6l7kui3jwxowtcq34jubiiymr73g

Address

f1wp6zoxj7sydnrywvzp276x3gayghi7r6le4tcwy

Datacap Allocated

800.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

74e1ec38-3b34-4956-9799-a7c6e03fad65

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecnhx6e2vaw33wc3w2wfutiet6l7kui3jwxowtcq34jubiiymr73g

cryptowhizzard commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The table below shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider has taken a verified deal, it is marked as new.

For most DataCap applications, the restrictions below should apply.

⚠️ f02011071 has unknown IP location.

⚠️ f020378 has unknown IP location.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01611097 | Mission Viejo, California, US; AT&T Services, Inc. | 73.78 TiB | 12.87% | 71.66 TiB | 2.87% |
| f058369 | Boston, Massachusetts, US; Charles River Operation | 32.00 GiB | 0.01% | 32.00 GiB | 0.00% |
| f01910202 | Philadelphia, Pennsylvania, US; Cogent Communications | 82.42 TiB | 14.38% | 76.49 TiB | 7.19% |
| f01826669 | Philadelphia, Pennsylvania, US; Cogent Communications | 18.29 TiB | 3.19% | 18.29 TiB | 0.00% |
| f01883179 (new) | Philadelphia, Pennsylvania, US; Cogent Communications | 7.69 TiB | 1.34% | 7.69 TiB | 0.00% |
| f01886690 | Arcadia, California, US; Cogent Communications | 704.00 GiB | 0.12% | 688.00 GiB | 2.27% |
| f010446 | Zaventem, Flanders, BE; Cogent Communications | 16.00 GiB | 0.00% | 16.00 GiB | 0.00% |
| f01345523 | Antwerpen, Flanders, BE; Cogent Communications | 2.00 GiB | 0.00% | 2.00 GiB | 0.00% |
| f01858429 | Boston, Massachusetts, US; Comcast Cable Communications, LLC | 25.29 TiB | 4.41% | 24.13 TiB | 4.56% |
| f09848 | Rancho Santa Margarita, California, US; Cox Communications Inc. | 48.00 GiB | 0.01% | 48.00 GiB | 0.00% |
| f019104 | Indianapolis, Indiana, US; Eli Lilly and Company | 53.00 GiB | 0.01% | 53.00 GiB | 0.00% |
| f01606675 | Montréal, Quebec, CA; eStruxture Data Centers Inc. | 3.30 TiB | 0.58% | 3.30 TiB | 0.00% |
| f01091840 | Montréal, Quebec, CA; eStruxture Data Centers Inc. | 176.00 GiB | 0.03% | 160.00 GiB | 9.09% |
| f0157535 | Montréal, Quebec, CA; eStruxture Data Centers Inc. | 80.00 GiB | 0.01% | 80.00 GiB | 0.00% |
| f0165400 | Montréal, Quebec, CA; eStruxture Data Centers Inc. | 48.00 GiB | 0.01% | 48.00 GiB | 0.00% |
| f01985775 | Dallas, Texas, US; Flexential Colorado Corp. | 46.63 TiB | 8.13% | 46.63 TiB | 0.00% |
| f01988794 | Dallas, Texas, US; Flexential Colorado Corp. | 46.37 TiB | 8.09% | 46.37 TiB | 0.00% |
| f01985745 | Dallas, Texas, US; Flexential Colorado Corp. | 45.16 TiB | 7.88% | 45.16 TiB | 0.00% |
| f0104671 | Kawasaki, Kanagawa, JP; KDDI CORPORATION | 64.00 GiB | 0.01% | 64.00 GiB | 0.00% |
| f030379 | Seoul, Seoul, KR; LG DACOM Corporation | 16.00 GiB | 0.00% | 16.00 GiB | 0.00% |
| f024184 | Seoul, Seoul, KR; LG DACOM Corporation | 2.00 GiB | 0.00% | 2.00 GiB | 0.00% |
| f0694396 | Birmingham, England, GB; Neonix Web Services Limited | 17.00 GiB | 0.00% | 17.00 GiB | 0.00% |
| f019551 | Birmingham, England, GB; Neonix Web Services Limited | 16.00 GiB | 0.00% | 16.00 GiB | 0.00% |
| f01784458 | Oslo, Oslo, NO; Nexthop AS | 1.00 GiB | 0.00% | 1.00 GiB | 0.00% |
| f01882184 | Singapore, Singapore, SG; PCCW Global, Inc. | 18.81 TiB | 3.28% | 18.81 TiB | 0.00% |
| f01873432 | Las Vegas, Nevada, US; PiKNiK & Company Inc. | 69.98 TiB | 12.21% | 62.32 TiB | 10.95% |
| f01904630 | Las Vegas, Nevada, US; PiKNiK & Company Inc. | 68.77 TiB | 12.00% | 60.12 TiB | 12.58% |
| f01851060 | Las Vegas, Nevada, US; PiKNiK & Company Inc. | 42.30 TiB | 7.38% | 38.96 TiB | 7.89% |
| f01851683 (new) | Las Vegas, Nevada, US; PiKNiK & Company Inc. | 11.52 TiB | 2.01% | 11.52 TiB | 0.00% |
| f01953925 (new) | Burlington, Vermont, US; Protocol Labs | 11.14 TiB | 1.94% | 11.14 TiB | 0.00% |
| f066596 | San Diego, California, US; ScaleMatrix | 274.00 GiB | 0.05% | 274.00 GiB | 0.00% |
| f02576 | Copenhagen, Capital Region, DK; Telenor A/S | 96.00 GiB | 0.02% | 96.00 GiB | 0.00% |
| f010088 | Everett, Washington, US; Wholesail networks LLC | 18.00 GiB | 0.00% | 18.00 GiB | 0.00% |
| f01199442 | Heerhugowaard, North Holland, NL; Wijnand Schouten trading as Speedium | 146.00 GiB | 0.02% | 130.00 GiB | 10.96% |
| f01207045 | Heerhugowaard, North Holland, NL; Wijnand Schouten trading as Speedium | 32.00 GiB | 0.01% | 32.00 GiB | 0.00% |
| f02011071 (new) | Unknown; Unknown | 32.00 GiB | 0.01% | 32.00 GiB | 0.00% |
| f01199430 | Heerhugowaard, North Holland, NL; Wijnand Schouten trading as Speedium | 3.00 GiB | 0.00% | 3.00 GiB | 0.00% |
| f020378 | Unknown; Unknown | 2.00 GiB | 0.00% | 2.00 GiB | 0.00% |

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

⚠️ 47.14% of deals are for data replicated across less than 4 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 87.30 TiB | 87.32 TiB | 1 | 15.23% |
| 25.96 TiB | 52.07 TiB | 2 | 9.08% |
| 42.96 TiB | 130.86 TiB | 3 | 22.83% |
| 52.17 TiB | 219.28 TiB | 4 | 38.25% |
| 8.55 TiB | 54.71 TiB | 5 | 9.54% |
| 3.77 TiB | 26.66 TiB | 6 | 4.65% |
| 304.00 GiB | 2.25 TiB | 7 | 0.39% |
| 16.00 GiB | 160.00 GiB | 8 | 0.03% |

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, which should not resolve to the same CID.

However, this is possible if all the clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Approvers |
| --- | --- | --- | --- | --- |
| f1sw5zjcyo4mff5cbvgsgmm7uoko6gcr4tptvtkhy | Glif auto verified | 208.00 GiB | 1 | Unknown |
| f3wkp4blevjsrtbc6vwgjf2sedzjwsqmj3wsh4uexbp4k7dggs72kbvuv7xivsnz7cnmfazpmqp3qmchmzms6a | Unknown | 208.00 GiB | 1 | Unknown |
| f3u5dehxxe2uvehitioxhwjp27wpv72hsnuqhtz6sce2wzqv2skhguivnsvwbkwgczcc5x4qf6eeao34tejqdq | Glif auto verified | 16.00 GiB | 1 | Unknown |

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

cryptowhizzard commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceb3g2vm5crsjeettyvetuch4pu4ukodmhwce77w3qjlt2ymkjhbgo

Address

f1wp6zoxj7sydnrywvzp276x3gayghi7r6le4tcwy

Datacap Allocated

800.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

74e1ec38-3b34-4956-9799-a7c6e03fad65

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceb3g2vm5crsjeettyvetuch4pu4ukodmhwce77w3qjlt2ymkjhbgo

parkan commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The table below shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider has taken a verified deal, it is marked as new.

For most DataCap applications, the restrictions below should apply.

⚠️ f02011071 has unknown IP location.

⚠️ f020378 has unknown IP location.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01611097 | Mission Viejo, California, US; AT&T Services, Inc. | 73.78 TiB | 10.99% | 71.66 TiB | 2.87% |
| f058369 | Boston, Massachusetts, US; Charles River Operation | 32.00 GiB | 0.00% | 32.00 GiB | 0.00% |
| f01910202 | Philadelphia, Pennsylvania, US; Cogent Communications | 99.81 TiB | 14.86% | 90.45 TiB | 9.38% |
| f01826669 | Philadelphia, Pennsylvania, US; Cogent Communications | 18.29 TiB | 2.72% | 18.29 TiB | 0.00% |
| f01886690 | Las Vegas, Nevada, US; Cogent Communications | 10.84 TiB | 1.61% | 10.81 TiB | 0.29% |
| f01883179 (new) | Philadelphia, Pennsylvania, US; Cogent Communications | 7.69 TiB | 1.14% | 7.69 TiB | 0.00% |
| f010446 | Zaventem, Flanders, BE; Cogent Communications | 16.00 GiB | 0.00% | 16.00 GiB | 0.00% |
| f01345523 | Antwerpen, Flanders, BE; Cogent Communications | 2.00 GiB | 0.00% | 2.00 GiB | 0.00% |
| f01858429 | Boston, Massachusetts, US; Comcast Cable Communications, LLC | 25.29 TiB | 3.77% | 24.13 TiB | 4.56% |
| f09848 | Rancho Santa Margarita, California, US; Cox Communications Inc. | 48.00 GiB | 0.01% | 48.00 GiB | 0.00% |
| f019104 | Indianapolis, Indiana, US; Eli Lilly and Company | 53.00 GiB | 0.01% | 53.00 GiB | 0.00% |
| f01606675 | Montréal, Quebec, CA; eStruxture Data Centers Inc. | 3.30 TiB | 0.49% | 3.30 TiB | 0.00% |
| f01091840 | Montréal, Quebec, CA; eStruxture Data Centers Inc. | 176.00 GiB | 0.03% | 160.00 GiB | 9.09% |
| f0157535 | Montréal, Quebec, CA; eStruxture Data Centers Inc. | 80.00 GiB | 0.01% | 80.00 GiB | 0.00% |
| f0165400 | Montréal, Quebec, CA; eStruxture Data Centers Inc. | 48.00 GiB | 0.01% | 48.00 GiB | 0.00% |
| f01985775 | Dallas, Texas, US; Flexential Colorado Corp. | 46.63 TiB | 6.94% | 46.63 TiB | 0.00% |
| f01988794 | Dallas, Texas, US; Flexential Colorado Corp. | 46.37 TiB | 6.91% | 46.37 TiB | 0.00% |
| f01985745 | Dallas, Texas, US; Flexential Colorado Corp. | 45.16 TiB | 6.72% | 45.16 TiB | 0.00% |
| f01882184 | Singapore, Singapore, SG; Gateway Communications | 18.81 TiB | 2.80% | 18.81 TiB | 0.00% |
| f0104671 | Kawasaki, Kanagawa, JP; KDDI CORPORATION | 64.00 GiB | 0.01% | 64.00 GiB | 0.00% |
| f030379 | Seoul, Seoul, KR; LG DACOM Corporation | 16.00 GiB | 0.00% | 16.00 GiB | 0.00% |
| f024184 | Daejeon, Daejeon, KR; LG DACOM Corporation | 2.00 GiB | 0.00% | 2.00 GiB | 0.00% |
| f0694396 | Birmingham, England, GB; Neonix Web Services Limited | 17.00 GiB | 0.00% | 17.00 GiB | 0.00% |
| f019551 | Birmingham, England, GB; Neonix Web Services Limited | 16.00 GiB | 0.00% | 16.00 GiB | 0.00% |
| f01784458 | Oslo, Oslo, NO; Nexthop AS | 1.00 GiB | 0.00% | 1.00 GiB | 0.00% |
| f01873432 | Las Vegas, Nevada, US; PiKNiK & Company Inc. | 95.80 TiB | 14.27% | 83.46 TiB | 12.88% |
| f01904630 | Las Vegas, Nevada, US; PiKNiK & Company Inc. | 77.36 TiB | 11.52% | 67.11 TiB | 13.26% |
| f01851060 | Las Vegas, Nevada, US; PiKNiK & Company Inc. | 71.24 TiB | 10.61% | 65.42 TiB | 8.16% |
| f01851683 (new) | Las Vegas, Nevada, US; PiKNiK & Company Inc. | 11.52 TiB | 1.71% | 11.52 TiB | 0.00% |
| f01953925 (new) | Burlington, Vermont, US; Protocol Labs | 18.45 TiB | 2.75% | 18.40 TiB | 0.25% |
| f066596 | San Diego, California, US; ScaleMatrix | 274.00 GiB | 0.04% | 274.00 GiB | 0.00% |
| f02576 | Copenhagen, Capital Region, DK; Telenor A/S | 96.00 GiB | 0.01% | 96.00 GiB | 0.00% |
| f010088 | Kirkland, Washington, US; Wholesail networks LLC | 18.00 GiB | 0.00% | 18.00 GiB | 0.00% |
| f01199442 | Heerhugowaard, North Holland, NL; Wijnand Schouten trading as Speedium | 146.00 GiB | 0.02% | 130.00 GiB | 10.96% |
| f01207045 | Heerhugowaard, North Holland, NL; Wijnand Schouten trading as Speedium | 32.00 GiB | 0.00% | 32.00 GiB | 0.00% |
| f02011071 (new) | Unknown; Unknown | 32.00 GiB | 0.00% | 32.00 GiB | 0.00% |
| f01199430 | Heerhugowaard, North Holland, NL; Wijnand Schouten trading as Speedium | 3.00 GiB | 0.00% | 3.00 GiB | 0.00% |
| f020378 | Unknown; Unknown | 2.00 GiB | 0.00% | 2.00 GiB | 0.00% |

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

⚠️ 44.07% of deals are for data replicated across less than 4 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 54.29 TiB | 54.31 TiB | 1 | 8.09% |
| 30.83 TiB | 61.84 TiB | 2 | 9.21% |
| 59.12 TiB | 179.81 TiB | 3 | 26.78% |
| 55.07 TiB | 234.39 TiB | 4 | 34.91% |
| 14.60 TiB | 90.83 TiB | 5 | 13.53% |
| 6.68 TiB | 46.30 TiB | 6 | 6.90% |
| 512.00 GiB | 3.84 TiB | 7 | 0.57% |
| 16.00 GiB | 160.00 GiB | 8 | 0.02% |

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, which should not resolve to the same CID.

However, this is possible if all the clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Approvers |
| --- | --- | --- | --- | --- |
| f1sw5zjcyo4mff5cbvgsgmm7uoko6gcr4tptvtkhy | Glif auto verified | 208.00 GiB | 1 | Unknown |
| f3wkp4blevjsrtbc6vwgjf2sedzjwsqmj3wsh4uexbp4k7dggs72kbvuv7xivsnz7cnmfazpmqp3qmchmzms6a | Unknown | 208.00 GiB | 1 | Unknown |
| f3u5dehxxe2uvehitioxhwjp27wpv72hsnuqhtz6sce2wzqv2skhguivnsvwbkwgczcc5x4qf6eeao34tejqdq | Glif auto verified | 16.00 GiB | 1 | Unknown |

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

parkan commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 2 storage providers have unknown IP location - f02011071, f020378

Deal Data Replication

⚠️ 41.13% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

s0nik42 commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceaakmunauzar2dbouf6nftqoctiemviclhznsjah6i2nefvzcyor6

Address

f1wp6zoxj7sydnrywvzp276x3gayghi7r6le4tcwy

Datacap Allocated

800.00TiB

Signer Address

f1wxhnytjmklj2czezaqcfl7eb4nkgmaxysnegwii

Id

74e1ec38-3b34-4956-9799-a7c6e03fad65

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceaakmunauzar2dbouf6nftqoctiemviclhznsjah6i2nefvzcyor6

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

parkan commented 1 year ago

@galen-mcandrew @fabriziogianni7 we've run out of the current allocation and still have more data to replicate, as well as willing SPs to take it, could we reopen the issue/allocate the next batch?

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 5

Multisig Notary address

f02049625

Client address

f1wp6zoxj7sydnrywvzp276x3gayghi7r6le4tcwy

DataCap allocation requested

400TiB

Id

925d1b6a-c4e4-49cc-809a-277fb4f04795

parkan commented 1 year ago

Progress update: onboarding has been proceeding OK since spade spinup in June, with some SP churn and periods of inactivity but overall making progress and crossing 1.5PiB uniques and peak daily onboarding rates north of 20TiB/day. We are now focusing on reaching target replication factor across all CIDs so we can clear out local car file cache.

Currently active SPs include f02055660, f01942480, f01611097, others are active sporadically or being recruited.
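The cache-clearing step mentioned above amounts to a per-CID check: a local CAR file is safe to drop once enough distinct providers hold its data. A minimal sketch of that check follows. The function name `evictable`, the CID and provider labels, and the target of 3 replicas are all hypothetical; the thread does not state the actual target replication factor or how the cache is managed.

```python
def evictable(providers_by_cid, target):
    """Return the CIDs held by at least `target` distinct providers, sorted."""
    return sorted(
        cid
        for cid, providers in providers_by_cid.items()
        if len(set(providers)) >= target
    )


# Hypothetical deal state: CID label -> providers with an active deal for it.
state = {
    "cid-a": ["sp-1", "sp-2", "sp-3"],
    "cid-b": ["sp-1", "sp-1", "sp-2"],  # a duplicate deal does not add a replica
    "cid-c": ["sp-1", "sp-2", "sp-3", "sp-4"],
}

print(evictable(state, target=3))  # ['cid-a', 'cid-c']
```

Counting distinct providers rather than deals matters here, since the checker reports above show some providers holding duplicate deals for the same data.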

parkan commented 1 year ago

@s0nik42 @fabriziogianni7 @dannyob could we approve this please? thank you :pray:

s0nik42 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacec5kekwf5t6wyzwffnoharpy2asc2a6mimsoektqd2f7gt7hym2as

Address

f1wp6zoxj7sydnrywvzp276x3gayghi7r6le4tcwy

Datacap Allocated

400.00TiB

Signer Address

f1wxhnytjmklj2czezaqcfl7eb4nkgmaxysnegwii

Id

925d1b6a-c4e4-49cc-809a-277fb4f04795

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacec5kekwf5t6wyzwffnoharpy2asc2a6mimsoektqd2f7gt7hym2as

s0nik42 commented 1 year ago

hey @parkan , approved 🏃🏻

cryptowhizzard commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacea7rfpabthhxeuaqzuf3u2qvt2rid2t6k3pbye5pwe73hysrlmcv2

Address

f1wp6zoxj7sydnrywvzp276x3gayghi7r6le4tcwy

Datacap Allocated

400.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

925d1b6a-c4e4-49cc-809a-277fb4f04795

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea7rfpabthhxeuaqzuf3u2qvt2rid2t6k3pbye5pwe73hysrlmcv2

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 5

Multisig Notary address

f02049625

Client address

f1wp6zoxj7sydnrywvzp276x3gayghi7r6le4tcwy

DataCap allocation requested

400TiB

Id

a65cf63d-d49a-42f6-af9e-66fdc452eced

xinaxu commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceb5zzaff55ypld7b4yr2tdgsuez7s4nrecpid7jj6owvimxppb2p6

Address

f1wp6zoxj7sydnrywvzp276x3gayghi7r6le4tcwy

Datacap Allocated

400.00TiB

Signer Address

f1k3ysofkrrmqcot6fkx4wnezpczlltpirmrpsgui

Id

a65cf63d-d49a-42f6-af9e-66fdc452eced

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceb5zzaff55ypld7b4yr2tdgsuez7s4nrecpid7jj6owvimxppb2p6

bobdubois commented 1 year ago

@parkan , SP f02359301 ready to seal IA dataset via Spade but awaiting the datacap.

dannyob commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecsvzervsl66aelmagfvy4ahnomomyxxl6sihsujkhjhikjjwnreq

Address

f1wp6zoxj7sydnrywvzp276x3gayghi7r6le4tcwy

Datacap Allocated

400.00TiB

Signer Address

f1k6wwevxvp466ybil7y2scqlhtnrz5atjkkyvm4a

Id

a65cf63d-d49a-42f6-af9e-66fdc452eced

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecsvzervsl66aelmagfvy4ahnomomyxxl6sihsujkhjhikjjwnreq

Destore2023 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 89.56% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.

Destore2023 commented 1 year ago

Perfect retrieval report, happy to be the allocator

image

Destore2023 commented 1 year ago

Very good use case

image

Destore2023 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacec2ksii25v3nvwi23f3ix6ezfwsy7yk3dnuaneqikt44hd72edbxc

Address

f1wp6zoxj7sydnrywvzp276x3gayghi7r6le4tcwy

Datacap Allocated

400.00TiB

Signer Address

f1yh6q3nmsg7i2sys7f7dexcuajgoweudcqj2chfi

Id

a65cf63d-d49a-42f6-af9e-66fdc452eced

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacec2ksii25v3nvwi23f3ix6ezfwsy7yk3dnuaneqikt44hd72edbxc

jamerduhgamer commented 1 year ago

Dataset is a good use case. SP distribution is healthy. Deal Data replication just needs more SPs onboarded. CID sharing is minuscule.

Willing to support this next datacap tranche.

jamerduhgamer commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebiiaz2zhjfquuzn2gtgi6ghpxdyd232exp5z4h2emc5mf77u5hok

Address

f1wp6zoxj7sydnrywvzp276x3gayghi7r6le4tcwy

Datacap Allocated

400.00TiB

Signer Address

f1kqdiokoeubyse4qpihf7yrpl7czx4qgupx3eyzi

Id

a65cf63d-d49a-42f6-af9e-66fdc452eced

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebiiaz2zhjfquuzn2gtgi6ghpxdyd232exp5z4h2emc5mf77u5hok

Sunnyiscoming commented 1 year ago

Hello, @parkan. Per https://github.com/filecoin-project/notary-governance/issues/922, for Open, Public Dataset applicants, please complete the following Fil+ registration form to identify yourself as the applicant, and please also add the contact information of the SP entities you are working with to store copies of the data.

This information will be reviewed by Fil+ Governance team to confirm validity and then the application will be allowed to move forward for additional notary review.

parkan commented 1 year ago

Hi @Sunnyiscoming, these requirements are new to me, let me put something together.

github-actions[bot] commented 11 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

parkan commented 11 months ago

still working on this

dannyob commented 11 months ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedns6cniehurh5g76l65fnqhqoij4566rhnghoqmf5vawqpnchqps

Address

f1wp6zoxj7sydnrywvzp276x3gayghi7r6le4tcwy

Datacap Allocated

400.00TiB

Signer Address

f1k6wwevxvp466ybil7y2scqlhtnrz5atjkkyvm4a

Id

a65cf63d-d49a-42f6-af9e-66fdc452eced

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedns6cniehurh5g76l65fnqhqoij4566rhnghoqmf5vawqpnchqps

cryptowhizzard commented 11 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedg7efntv7wpuzhvzsetylwubnepoqoumfgnx5pzl2lodgp6qa2by

Address

f1wp6zoxj7sydnrywvzp276x3gayghi7r6le4tcwy

Datacap Allocated

400.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

Id

a65cf63d-d49a-42f6-af9e-66fdc452eced

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedg7efntv7wpuzhvzsetylwubnepoqoumfgnx5pzl2lodgp6qa2by

github-actions[bot] commented 11 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

ianconsolata commented 11 months ago

Please keep this application open.

Sunnyiscoming commented 10 months ago

Please provide ID, City, Country, Organization of each SP here.

github-actions[bot] commented 10 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

github-actions[bot] commented 10 months ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

-- Commented by Stale Bot.

parkan commented 10 months ago

@Sunnyiscoming provided below

| name | id | city, state | country |
| --- | --- | --- | --- |
| Seal Storage | f01886690 | Las Vegas, NV | US |
| Telnyx | f01872811 | Dallas | US |
| CES Group Inc | f01611097 | San Juan Capistrano | US |
| PiKNiK | f01851060 | Las Vegas | US |
| filcollins | f01953925 | Quebec | CA |
| Replikate LLC | f02028544 | Salt Lake City, Utah | US |
| W3i | f02055660 | Sydney | AU |
| VOT Group | f01942480 | Denver CO | US |
| Distributed Storage Solutions Limited | f0223933 | Sydney | AU |
| DIGITAL INCOME FUND PTY LTD | f01319368 | Sydney | AU |
| FILstorage.io | f02359301 | Albi | FR |
| Superusey | f02199533 | Phoenix, Arizona | US |
| W3i | f02172481 | Sydney | AU |