filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] [DEPRECATED] USC Shoah Foundation #27

Closed: anitapac closed this issue 2 years ago

anitapac commented 3 years ago

Large Dataset Notary Application

To apply for a DataCap allocation for your dataset, please fill out the following information.

Core Information

Please respond to the questions below in paragraph form, replacing the text saying "Please answer here". Include as much detail as you can in your answer!

Project details

Share a brief history of your project and organization.

USC Shoah Foundation – The Institute for Visual History and Education develops empathy, understanding and respect through testimony, using its Visual History Archive of more than 55,000 video testimonies, award-winning IWitness education program, and the Center for Advanced Genocide Research. USC Shoah Foundation's interactive programming, research and materials are accessed in museums and universities, cited by government leaders and NGOs, and taught in classrooms around the world. Now in its third decade, USC Shoah Foundation reaches millions of people on six continents from its home at the Dornsife College of Letters, Arts and Sciences at the University of Southern California.

What is the primary source of funding for this project?

Filecoin Foundation for the Decentralized Web (FFDW)

What other projects/ecosystem stakeholders is this project associated with?

Starling Labs

Use-case details

Describe the data being stored onto Filecoin

Digital Library of Survivor Testimonies: a compilation of audiovisual content from survivors of the Holocaust and other genocides. The majority of the data consists of lossless copies of the collected material, though some of the dataset consists of lower-quality replicas of the content.

Where was the data in this dataset sourced from?

Audiovisual content was recorded by volunteers and is stored on tape drives at the University of Southern California. It consists of live interviews with survivors of the Holocaust and other genocides. Maintaining the integrity of the original content is extremely important, and USC constantly runs fixity checks on the content.
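The application does not describe USC's fixity tooling. As a minimal sketch of what a fixity check does, assuming a hypothetical manifest that maps file paths to SHA-256 digests recorded at ingest, the following recomputes each digest and reports files that are missing or have changed:

```python
import hashlib
import tempfile
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Stream a file through SHA-256 so large video files never sit fully in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def fixity_check(manifest: dict[str, str]) -> list[str]:
    """Return manifest paths that are missing or whose SHA-256 no longer matches.

    `manifest` is a hypothetical {path: expected_sha256_hex} mapping recorded at ingest.
    """
    failures = []
    for path_str, expected in manifest.items():
        path = Path(path_str)
        if not path.is_file() or sha256_of(path) != expected:
            failures.append(path_str)
    return failures


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        clip = Path(tmp) / "testimony_0001.mov"  # hypothetical file name
        clip.write_bytes(b"original recording")
        manifest = {str(clip): sha256_of(clip)}
        print(fixity_check(manifest))  # empty list: content unchanged
        clip.write_bytes(b"bit rot")
        print(fixity_check(manifest) == [str(clip)])  # change detected
```

Streaming in fixed-size chunks keeps memory flat regardless of file size, which matters for multi-gigabyte lossless video.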

Can you share a sample of what is in the dataset? A link to a file, an image, a table, etc., are good examples of this.

A set of testimonies is available on YouTube and viewable by anyone: https://www.youtube.com/playlist?list=PLWIFgIFN2QqiDdkA-MXpsvZOSvTYkEGsL. 

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

A lot of the data is already publicly available for viewing via the Visual History Archive (https://vhaonline.usc.edu/). At the request of the interviewees, some of the data is to remain private for a period of time.

What is the expected retrieval frequency for this data?

This is primarily for long-term archiving purposes only. Viewing and retrieval of this content should primarily happen through the Visual History Archive (https://vhaonline.usc.edu/).

For how long do you plan to keep this dataset stored on Filecoin? Will this be a permanent archival or a one-time storage deal?

Permanent archival.

DataCap allocation plan

In which geographies do you plan on making storage deals?

We plan to make deals globally, in any geography where storage providers can legally store our content.

What is your expected data onboarding rate? How many deals can you make in a day, in a week? How much DataCap do you plan on using per day, per week?

The current plan is to use offline data transfer mechanisms that will allow hundreds of terabytes of content to be stored each week. We hope to have access to at least 100 TiB of DataCap per week.

How will you be distributing your data to miners? Is there an offline data transfer process?

Yes, there will be an offline data transfer process: files will either be hosted online for storage providers to download or, where logistically feasible, distributed on physical drives.

How do you plan on choosing the miners with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We would like to work with several reputable large-scale storage provider operations to ensure geo-distribution and reliability of storage.  
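The application does not say how replicas will be spread across providers. As a minimal sketch of one way to favour geo-distribution, assuming hypothetical provider IDs each tagged with a region, this greedy assigner gives every piece's replicas to distinct providers, prefers regions the piece does not use yet, and tries the least-loaded providers first:

```python
def assign_replicas(
    piece_ids: list[str], providers: dict[str, str], replicas: int
) -> dict[str, list[str]]:
    """Assign `replicas` copies of each piece to distinct storage providers.

    `providers` maps a hypothetical provider ID to its region. Each piece first
    takes at most one provider per region; if there are fewer regions than
    replicas, the remaining copies go to any provider not already chosen.
    """
    if replicas > len(providers):
        raise ValueError("need at least as many providers as replicas")
    load = {p: 0 for p in providers}
    plan: dict[str, list[str]] = {}
    for piece in piece_ids:
        # Least-loaded first; ties broken by ID so the result is deterministic.
        candidates = sorted(providers, key=lambda p: (load[p], p))
        chosen: list[str] = []
        regions: set[str] = set()
        for p in candidates:  # pass 1: one copy per region
            if len(chosen) < replicas and providers[p] not in regions:
                chosen.append(p)
                regions.add(providers[p])
        for p in candidates:  # pass 2: fill any remaining slots
            if len(chosen) < replicas and p not in chosen:
                chosen.append(p)
        for p in chosen:
            load[p] += 1
        plan[piece] = chosen
    return plan


if __name__ == "__main__":
    sps = {"sp-na-1": "NA", "sp-na-2": "NA", "sp-eu-1": "EU", "sp-oc-1": "Oceania"}  # hypothetical
    print(assign_replicas(["piece-1", "piece-2"], sps, replicas=3))
```

Greedy assignment is simple and deterministic; it does not guarantee a globally optimal balance, but it never places two replicas of the same piece with one provider.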

How will you be distributing data and DataCap across miners storing data?

This project aims to onboard more than 4 PiB of original data, and we'd like to store multiple replicas (2 to 5), each with a separate storage provider. Deals will be structured to be as close to each storage provider's sector size as possible.
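Packing data into deals that are as close to sector size as possible is a bin-packing problem. A minimal sketch using the first-fit decreasing heuristic, assuming 32 GiB sectors, ignoring CAR/IPLD encoding overhead, and using hypothetical file names and sizes:

```python
GIB = 1024**3
SECTOR_BYTES = 32 * GIB  # assumed sector size for this sketch
# Fr32 padding stores 254 bits of payload in every 256 bits, so only 127/128
# of a padded piece can hold client data.
MAX_PAYLOAD = SECTOR_BYTES * 127 // 128


def pack_first_fit_decreasing(
    file_sizes: dict[str, int], capacity: int = MAX_PAYLOAD
) -> list[list[tuple[str, int]]]:
    """Group files into deals of at most `capacity` bytes (a heuristic, not optimal)."""
    bins: list[list] = []  # each entry: [remaining_bytes, [(name, size), ...]]
    for name, size in sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True):
        if size > capacity:
            raise ValueError(f"{name} is larger than one sector; split it first")
        for b in bins:
            if b[0] >= size:  # the first existing deal with room wins
                b[0] -= size
                b[1].append((name, size))
                break
        else:  # no deal had room, so open a new one
            bins.append([capacity - size, [(name, size)]])
    return [items for _, items in bins]


if __name__ == "__main__":
    files = {"a.mov": 20 * GIB, "b.mov": 15 * GIB, "c.mov": 10 * GIB, "d.mov": 11 * GIB}
    for deal in pack_first_fit_decreasing(files):
        used = sum(size for _, size in deal)
        print([n for n, _ in deal], f"{used / MAX_PAYLOAD:.1%} full")
```

Sorting largest-first lets smaller files fill the gaps left in earlier deals, which usually produces fuller deals than packing in arrival order.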

large-datacap-requests[bot] commented 3 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

dannyob commented 3 years ago

Filecoin Foundation approves. Do you have a filecoin address that you wish the datacap to be attached to yet?

flyworker commented 3 years ago

FilSwan approves. We have worked with the Starling team before; good for us.

starling-admin commented 3 years ago

> Filecoin Foundation approves. Do you have a filecoin address that you wish the datacap to be attached to yet?

Hi @dannyob, we will be using this address: f3w5fx6wta4ewl2iyf7xcogmzffz2fmrngpzdpduj3xmk3dwjxc6dyq36gdf3rflkkrblh5nci5xymc5hal3qq

Fenbushi-Filecoin commented 3 years ago

Count us in.

dkkapur commented 2 years ago

In the 3pm UTC notary governance call today, we had 3 additional notaries agree to support this application, @s0nik42, @XnMatrixSV, @swatchliu. This is now at 6 notary approvals.

Destore2023 commented 2 years ago

We’ll support it too; history needs to be recorded forever.

MegTei commented 2 years ago

Great project, count me in. Happy to lead.

neogeweb3 commented 2 years ago

Count me in.

XnMatrixSV commented 2 years ago

Count me in.

starling-admin commented 2 years ago

Sincere thanks to all. We're so excited to achieve this milestone.

galen-mcandrew commented 2 years ago

List of notaries:

  1. @dannyob f1k6wwevxvp466ybil7y2scqlhtnrz5atjkkyvm4a
  2. @flyworker f1hlubjsdkv4wmsdadihloxgwrz3j3ernf6i3cbpy
  3. @Fenbushi-Filecoin f1yqydpmqb5en262jpottko2kd65msajax7fi4rmq
  4. @s0nik42 f1wxhnytjmklj2czezaqcfl7eb4nkgmaxysnegwii
  5. @XnMatrixSV f1yuz2twsllparyfqwslfiuxrc5wj4mfiflvnsw6a
  6. @swatchliu f1yh6q3nmsg7i2sys7f7dexcuajgoweudcqj2chfi
  7. @MegTei f1ystxl2ootvpirpa7ebgwl7vlhwkbx2r4zjxwe5i

These cover Europe, North America, GCR, & Oceania.

galen-mcandrew commented 2 years ago

Multisig Notary requested

Notary addresses

f1k6wwevxvp466ybil7y2scqlhtnrz5atjkkyvm4a f1hlubjsdkv4wmsdadihloxgwrz3j3ernf6i3cbpy f1yqydpmqb5en262jpottko2kd65msajax7fi4rmq f1wxhnytjmklj2czezaqcfl7eb4nkgmaxysnegwii f1yuz2twsllparyfqwslfiuxrc5wj4mfiflvnsw6a f1yh6q3nmsg7i2sys7f7dexcuajgoweudcqj2chfi f1ystxl2ootvpirpa7ebgwl7vlhwkbx2r4zjxwe5i

Total DataCap requested

5 PiB

Expected weekly DataCap usage rate

100 TiB
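As a rough sanity check on these two figures, assuming binary units (1 PiB = 1024 TiB) and a perfectly steady weekly rate, the requested total lasts about a year at the expected usage rate:

```python
import math

TIB = 2**40  # tebibyte
PIB = 2**50  # pebibyte


def weeks_to_exhaust(total_bytes: int, weekly_bytes: int) -> float:
    """Weeks needed to consume `total_bytes` at a constant `weekly_bytes` rate."""
    if weekly_bytes <= 0:
        raise ValueError("weekly rate must be positive")
    return total_bytes / weekly_bytes


weeks = weeks_to_exhaust(5 * PIB, 100 * TIB)
print(f"{weeks} weeks ({math.ceil(weeks)} weeks rounded up)")
```

In practice usage is rarely constant, so this is a lower bound on duration only if onboarding never exceeds the weekly figure.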

large-datacap-requests[bot] commented 2 years ago

Multisig created and sent to RKH: f01242135

galen-mcandrew commented 2 years ago

Sorry @neogeweb3 , looks like this hit 7 right as you were commenting, with @MegTei volunteering to lead!

At this time, we're seeing it in the app for the root key holders, and you can watch this issue from the notary governance repo to follow creation of the multisig: https://github.com/filecoin-project/notary-governance/issues/224

galen-mcandrew commented 2 years ago

Looks like this cleared the root key holders! Kicking off the first allocation

galen-mcandrew commented 2 years ago

DataCap Allocation requested

Multisig Notary address

f01242135

Client address

f3w5fx6wta4ewl2iyf7xcogmzffz2fmrngpzdpduj3xmk3dwjxc6dyq36gdf3rflkkrblh5nci5xymc5hal3qq

DataCap allocation requested

50 TiB

galen-mcandrew commented 2 years ago

@dannyob @flyworker @Fenbushi-Filecoin @s0nik42 @XnMatrixSV @swatchliu @MegTei

Destore2023 commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecx4upwvknew7nppzn77mn7f47qu2hdg5djb64bwwwawjjubntzge

Address

f3w5fx6wta4ewl2iyf7xcogmzffz2fmrngpzdpduj3xmk3dwjxc6dyq36gdf3rflkkrblh5nci5xymc5hal3qq

Datacap Allocated

54975581388800

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecx4upwvknew7nppzn77mn7f47qu2hdg5djb64bwwwawjjubntzge
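The `Datacap Allocated` field is a raw byte count. A small conversion, assuming binary units (1 TiB = 2^40 bytes), confirms that 54975581388800 bytes is exactly the 50 TiB requested, or about 54.98 TB in decimal units:

```python
TIB = 2**40  # binary tebibyte
TB = 10**12  # decimal terabyte


def describe(allocated_bytes: int) -> str:
    """Render a byte count in both binary (TiB) and decimal (TB) units."""
    return f"{allocated_bytes / TIB:g} TiB ({allocated_bytes / TB:.2f} TB)"


print(describe(54975581388800))
```

Keeping both units visible avoids the common TiB/TB confusion, which here amounts to nearly 5 TB.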

flyworker commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedple5r6vop3ftim6au4os4o7hegd7nzstldbdoa7udtodnglhufu

Address

f3w5fx6wta4ewl2iyf7xcogmzffz2fmrngpzdpduj3xmk3dwjxc6dyq36gdf3rflkkrblh5nci5xymc5hal3qq

Datacap Allocated

54975581388800

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedple5r6vop3ftim6au4os4o7hegd7nzstldbdoa7udtodnglhufu

MegTei commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceastsphxqsckdxh4supimoypy2t7vvswgs2z6gbgsnz4jezp2ua4g

Address

f3w5fx6wta4ewl2iyf7xcogmzffz2fmrngpzdpduj3xmk3dwjxc6dyq36gdf3rflkkrblh5nci5xymc5hal3qq

Datacap Allocated

54975581388800

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceastsphxqsckdxh4supimoypy2t7vvswgs2z6gbgsnz4jezp2ua4g

starling-admin commented 2 years ago

Hello to all. Just a quick mention that we are working on a new wallet solution and will update the wallet address once we get that set up. Thanks!

XnMatrixSV commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaced5u23l24knyoafti3eljz2vfehe7sjdgkz7sj5coeuq4xfwatcy6

Address

f3w5fx6wta4ewl2iyf7xcogmzffz2fmrngpzdpduj3xmk3dwjxc6dyq36gdf3rflkkrblh5nci5xymc5hal3qq

Datacap Allocated

54975581388800

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaced5u23l24knyoafti3eljz2vfehe7sjdgkz7sj5coeuq4xfwatcy6

s0nik42 commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceba7fnmocxegcuwf6ldinlesdcjjh4kskanejjr62tkanisdwpwki

Address

f3w5fx6wta4ewl2iyf7xcogmzffz2fmrngpzdpduj3xmk3dwjxc6dyq36gdf3rflkkrblh5nci5xymc5hal3qq

Datacap Allocated

54975581388800

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceba7fnmocxegcuwf6ldinlesdcjjh4kskanejjr62tkanisdwpwki

galen-mcandrew commented 2 years ago

Thanks notaries!

@starling-admin At this time, we are seeing 51 TiB at address f3w5fx6wta4ewl2iyf7xcogmzffz2fmrngpzdpduj3xmk3dwjxc6dyq36gdf3rflkkrblh5nci5xymc5hal3qq

Happy deal sealing!

galen-mcandrew commented 2 years ago

Rolling over to https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/53

large-datacap-requests[bot] commented 2 years ago

Thanks for your request! :exclamation: We have found some problems in the information provided. We could not find any Expected weekly DataCap usage rate in the information provided

Please, take a look at the request and edit the body of the issue providing all the required information.

filplus-checker commented 1 year ago

DataCap and CID Checker Report[^1]

If this is the first time a provider has taken a verified deal, it is marked as new.

For most DataCap applications, the restrictions below should apply.

⚠️ f066596 has sealed 79.51% of total datacap.

⚠️ 33.33% of the total deals sealed by f022352 are duplicate data.

⚠️ f020378 has unknown IP location.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
|---|---|---|---|---|---|
| f066596 | San Diego, California, US | 73.06 TiB | 79.51% | 69.09 TiB | 5.43% |
| f0678914 | San Diego, California, US | 8.02 TiB | 8.72% | 7.92 TiB | 1.17% |
| f01558688 (new) | Toronto, Ontario, CA | 3.00 TiB | 3.26% | 2.88 TiB | 4.17% |
| f01207045 | Heerhugowaard, North Holland, NL | 2.56 TiB | 2.79% | 2.56 TiB | 0.00% |
| f01201327 | Heerhugowaard, North Holland, NL | 2.31 TiB | 2.52% | 2.31 TiB | 0.00% |
| f01208862 | Heerhugowaard, North Holland, NL | 2.28 TiB | 2.48% | 2.28 TiB | 0.00% |
| f01199442 | Heerhugowaard, North Holland, NL | 96.00 GiB | 0.10% | 96.00 GiB | 0.00% |
| f022352 | Oslo, Oslo, NO | 96.00 GiB | 0.10% | 64.00 GiB | 33.33% |
| f010617 | Surrey, British Columbia, CA | 64.00 GiB | 0.07% | 64.00 GiB | 0.00% |
| f019551 | Birmingham, England, GB | 64.00 GiB | 0.07% | 64.00 GiB | 0.00% |
| f02576 | Copenhagen, Capital Region, DK | 64.00 GiB | 0.07% | 64.00 GiB | 0.00% |
| f0707721 | Heerhugowaard, North Holland, NL | 64.00 GiB | 0.07% | 64.00 GiB | 0.00% |
| f09848 | Rancho Santa Margarita, California, US | 64.00 GiB | 0.07% | 64.00 GiB | 0.00% |
| f01199430 | Heerhugowaard, North Holland, NL | 32.00 GiB | 0.03% | 32.00 GiB | 0.00% |
| f0840770 | University Park, Texas, US | 32.00 GiB | 0.03% | 32.00 GiB | 0.00% |
| f020378 | Unknown | 32.00 GiB | 0.03% | 32.00 GiB | 0.00% |
| f08399 | Seattle, Washington, US | 32.00 GiB | 0.03% | 32.00 GiB | 0.00% |
| f01392893 | Amsterdam, North Holland, NL | 32.00 GiB | 0.03% | 32.00 GiB | 0.00% |

[Chart: Provider Distribution]
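The percentage and duplicate columns follow from the sealed and unique figures. A minimal sketch that recomputes them, assuming share = provider's sealed data / all sealed data and duplicates = (sealed - unique) / sealed (the checker's exact formulas are not shown in this report), with values transcribed from the provider table and TiB converted at 1024 GiB:

```python
T = 1024  # GiB per TiB
# (provider, total sealed GiB, unique GiB) transcribed from the provider table
ROWS = [
    ("f066596", 73.06 * T, 69.09 * T),
    ("f0678914", 8.02 * T, 7.92 * T),
    ("f01558688", 3.00 * T, 2.88 * T),
    ("f01207045", 2.56 * T, 2.56 * T),
    ("f01201327", 2.31 * T, 2.31 * T),
    ("f01208862", 2.28 * T, 2.28 * T),
    ("f01199442", 96, 96),
    ("f022352", 96, 64),
    ("f010617", 64, 64),
    ("f019551", 64, 64),
    ("f02576", 64, 64),
    ("f0707721", 64, 64),
    ("f09848", 64, 64),
    ("f01199430", 32, 32),
    ("f0840770", 32, 32),
    ("f020378", 32, 32),
    ("f08399", 32, 32),
    ("f01392893", 32, 32),
]


def sealed_share(rows):
    """Percentage of all sealed data held by each provider."""
    total = sum(sealed for _, sealed, _ in rows)
    return {sp: 100 * sealed / total for sp, sealed, _ in rows}


def duplicate_pct(rows):
    """Percentage of each provider's sealed data that is not unique."""
    return {sp: 100 * (sealed - unique) / sealed for sp, sealed, unique in rows}


share, dup = sealed_share(ROWS), duplicate_pct(ROWS)
print(f"f066596 share: {share['f066596']:.2f}%")
print(f"f022352 duplicates: {dup['f022352']:.2f}%")
```

The f066596 and f022352 figures match the warnings above. Some rows, such as f0678914, come out slightly differently (about 1.25% rather than 1.17% duplicates) because the table's TiB values are rounded to two decimals.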

Deal Data Replication

The table below shows how many storage providers each piece of unique data is replicated across.

⚠️ 99.46% of deals are for data replicated across fewer than 4 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
|---|---|---|---|
| 66.52 TiB | 70.30 TiB | 1 | 76.50% |
| 10.34 TiB | 21.09 TiB | 2 | 22.96% |
| 32.00 GiB | 192.00 GiB | 6 | 0.20% |
| 32.00 GiB | 320.00 GiB | 9 | 0.34% |

[Chart: Replication Distribution]
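The 99.46% warning is the share of deal volume in rows with fewer than 4 providers. A minimal sketch that reproduces it from the replication table, assuming the percentage is weighted by total deal size (GiB rows converted at 1024 GiB per TiB):

```python
# (unique TiB, total deal TiB, number of providers) from the replication table
REPLICATION = [
    (66.52, 70.30, 1),
    (10.34, 21.09, 2),
    (32 / 1024, 192 / 1024, 6),
    (32 / 1024, 320 / 1024, 9),
]


def pct_deals_below(rows, min_providers: int) -> float:
    """Share of deal volume whose data sits on fewer than `min_providers` providers."""
    total = sum(deals for _, deals, _ in rows)
    below = sum(deals for _, deals, n in rows if n < min_providers)
    return 100 * below / total


print(f"{pct_deals_below(REPLICATION, 4):.2f}% of deals are on fewer than 4 providers")
```

The first two rows alone account for the warning (76.50% + 22.96%), which shows most data has only one or two copies so far.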

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Different applications usually own different data, so their deals should not resolve to the same CID.

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Verifier |
|---|---|---|---|---|
| f17g7h52bsi53rb263xwne573dusskit4mieqkgry | USC Shoah Foundation | 40.08 TiB | 1,282 | LDN v3 multisig |
| f144zep4gitj73rrujd3jw6iprljicx6vl4wbeavi | Textile | 5.75 TiB | 68 | LDN #61 |
| f3vnq2cmwig3qjisnx5hobxvsd4drn4f54xfxnv4tciw6vnjdsf5xipgafreprh5riwmgtcirpcdmi3urbg36a | Whyrusleeping / Estuary - Applications Research Group | 384.00 GiB | 2 | LDN #44 |
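The checker's implementation is not reproduced here. As a minimal sketch of the idea only, assuming hypothetical deal records of (client address, piece CID, size in bytes), shared CIDs can be found with a set lookup. In this sketch "affected" bytes are counted from the other client's deals, which may not match how the report defines that column:

```python
from collections import defaultdict


def shared_cids(deals, client):
    """For each other client, return (bytes in its deals on shared CIDs, number of shared CIDs)."""
    ours = {cid for owner, cid, _ in deals if owner == client}
    hits = defaultdict(lambda: [0, set()])
    for owner, cid, size in deals:
        if owner != client and cid in ours:
            hits[owner][0] += size
            hits[owner][1].add(cid)
    return {owner: (total, len(cids)) for owner, (total, cids) in hits.items()}


if __name__ == "__main__":
    GIB = 1024**3
    # Hypothetical records: (client address, piece CID, deal size in bytes)
    deals = [
        ("f1client", "baga-1", 32 * GIB),
        ("f1client", "baga-2", 32 * GIB),
        ("f1other", "baga-1", 32 * GIB),
        ("f1other", "baga-1", 32 * GIB),
        ("f1third", "baga-9", 32 * GIB),
    ]
    print(shared_cids(deals, "f1client"))
```

Building the client's CID set once makes the scan linear in the number of deal records.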

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger