filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] USC Shoah Foundation LDN #2 #420

Closed ghost closed 2 years ago

ghost commented 2 years ago

Large Dataset Notary Application

Core Information

Please respond to the questions below in paragraph form, replacing the text saying "Please answer here". Include as much detail as you can in your answer!

Project details

Share a brief history of your project and organization.

USC Shoah Foundation – The Institute for Visual History and Education develops empathy, understanding and respect through testimony, using its Visual History Archive of more than 55,000 video testimonies, award-winning IWitness education program, and the Center for Advanced Genocide Research. USC Shoah Foundation's interactive programming, research and materials are accessed in museums and universities, cited by government leaders and NGOs, and taught in classrooms around the world. Now in its third decade, USC Shoah Foundation reaches millions of people on six continents from its home at the Dornsife College of Letters, Arts and Sciences at the University of Southern California.

What is the primary source of funding for this project?

Filecoin Foundation for the Decentralized Web (FFDW)

What other projects/ecosystem stakeholders is this project associated with?

Starling Labs

Use-case details

Describe the data being stored onto Filecoin

Digital Library of Survivor Testimonies: a compilation of audiovisual content from Holocaust and genocide survivors. The majority of the data consists of lossless copies of the collected recordings, but some of the dataset comprises lower-quality replicas of the content.

Where was the data in this dataset sourced from?

Audiovisual content was recorded by volunteers and is stored on tape drives at the University of Southern California. It consists of live interviews with survivors of the Holocaust and other genocides. Maintaining the integrity of the original content is extremely important, and USC continually runs fixity checks on the content.
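The application does not describe how USC's fixity checks are implemented, but the general technique is to compare each file's current checksum against a digest recorded at ingest. A minimal sketch (function names and the choice of SHA-256 are assumptions for illustration, not USC's actual tooling):

```python
import hashlib

def sha256_digest(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in 1 MiB chunks, so large
    video files are never loaded fully into memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()

def fixity_check(path, expected_digest):
    """Return True if the file's current digest matches the one
    recorded when the file was first archived."""
    return sha256_digest(path) == expected_digest
```

Running such a check periodically over the archive detects silent corruption (bit rot, failed copies) before it propagates into replicas.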

Can you share a sample of what is in the dataset? A link to a file, an image, a table, etc., are good examples of this.

A set of testimonies is available on YouTube and viewable by anyone: https://www.youtube.com/playlist?list=PLWIFgIFN2QqiDdkA-MXpsvZOSvTYkEGsL.

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Much of the data is already publicly viewable via the Visual History Archive (https://vhaonline.usc.edu/). Some of the data must remain private for a period of time at the request of the interviewees.

What is the expected retrieval frequency for this data?

This dataset is primarily for long-term archival purposes. Viewing and retrieval of this content should primarily happen through the Visual History Archive (https://vhaonline.usc.edu/).

For how long do you plan to keep this dataset stored on Filecoin? Will this be a permanent archival or a one-time storage deal?

Yes, permanent archival.

DataCap allocation plan

In which geographies do you plan on making storage deals?

We plan to make deals globally, in any geography where our content can legally be stored by storage providers.

What is your expected data onboarding rate? How many deals can you make in a day, in a week? How much DataCap do you plan on using per day, per week?

The current plan is to use offline data transfer mechanisms that will enable hundreds of terabytes of content to be stored weekly. We hope to receive at least 100 TiB of DataCap per week, doubling with each tranche per the usual allocation schedule.
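Under the doubling pattern described above, the tranches for a 5 PiB request would look roughly as follows. This is an illustrative sketch only; the function name, the 100 TiB starting tranche, and the final-tranche capping rule are assumptions, and actual Fil+ allocations are set by notaries:

```python
def tranche_schedule(first_tranche_tib=100, total_pib=5):
    """Sketch of a doubling allocation schedule (100, 200, 400, ... TiB)
    that stops once the cumulative allocation reaches the total request.
    The last tranche is capped so the total is never exceeded."""
    total_tib = total_pib * 1024  # 1 PiB = 1024 TiB
    tranches, allocated, size = [], 0, first_tranche_tib
    while allocated < total_tib:
        size = min(size, total_tib - allocated)  # cap the final tranche
        tranches.append(size)
        allocated += size
        size *= 2
    return tranches
```

With these assumptions, a 5 PiB (5120 TiB) request resolves in six tranches, which is why early weeks move far less data than later ones.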

How will you be distributing your data to miners? Is there an offline data transfer process?

Yes, there is going to be an offline data transfer process either through hosting files online where storage providers can download them or (where logistically feasible) through the distribution of content on physical drives.

How do you plan on choosing the miners with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We would like to work with several reputable large-scale storage provider operations to ensure geo-distribution and reliability of storage.

How will you be distributing data and DataCap across miners storing data?

This project aims to onboard 4+ PiB of original data, for which we’d like to store multiple replicas (2-5) with separate storage providers for each replica. Deals will be structured to be as close to sector size as possible for a storage provider. The first DataCap request for 5 PiB (https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/53) is not enough to cover multiple replicas of the dataset, which is why this second request is being submitted.
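The arithmetic behind the second request can be made explicit. A quick sketch (the function name is a hypothetical for illustration) of how 2-5 replicas of 4 PiB compare against the original 5 PiB grant:

```python
def datacap_needed_pib(original_pib=4, replica_range=(2, 5)):
    """Return the (minimum, maximum) DataCap in PiB needed to store
    `replica_range` full copies of the original dataset."""
    low, high = replica_range
    return original_pib * low, original_pib * high

# 4 PiB at 2-5 replicas requires 8-20 PiB of DataCap, so a single
# 5 PiB grant cannot cover even the minimum replication target.
```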
large-datacap-requests[bot] commented 2 years ago

Thanks for your request! :exclamation: We have found some problems in the information provided:

- We could not find your Name in the information provided
- We could not find your Filecoin address in the information provided
- We could not find the Datacap requested in the information provided
- We could not find any Web site or social media info in the information provided
- We could not find any Expected weekly DataCap usage rate in the information provided
- We could not find any Region in the information provided

Please, take a look at the request and edit the body of the issue providing all the required information.
Yvette516 commented 2 years ago

How do you prove that you have more than 55,000 videos? Do you have the relevant copyright certificates?

jamerduhgamer commented 2 years ago

If you go to https://vhaonline.usc.edu/login and register an account, the website tells you that the "USC Shoah Foundation's Visual History Archive allows users to search through and view more than 54,000 video testimonies of survivors and witnesses of genocide."

large-datacap-requests[bot] commented 2 years ago

Thanks for your request! :exclamation: We have found some problems in the information provided.

large-datacap-requests[bot] commented 2 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Yvette516 commented 2 years ago

@jamerduhgamer @Ray-PiKNiK You already had problems with the first application #53, so why did you apply a second time?

jamerduhgamer commented 2 years ago

Hi @Yvette516, what problems do you mean? The sealing for the dataset is proceeding smoothly.

The main reason why we applied the second time is mentioned in the last section of the datacap request.

How will you be distributing data and DataCap across miners storing data?

"This project aims to onboard 4+ PiB of original data, for which we’d like to store multiple replicas (2-5) with separate storage providers for each replica. Deals will be structured to be as close to sector size as possible for a storage provider. The first DataCap request for 5 PiB (https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/53) is not enough to cover multiple replicas of the dataset, which is why this second request is being submitted."

Sunnyiscoming commented 2 years ago

@Ray-PiKNiK

  1. According to https://filplus.d.interplanetary.one/clients/f01549256/breakdown, more than 60% of the DataCap is stored with one node, f01833311. Can you explain that?

How do you plan on choosing the miners with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We would like to work with several reputable large-scale storage provider operations to ensure geo-distribution and reliability of storage.

  2. Can you specify the storage providers you will cooperate with?
ghost commented 2 years ago
  1. f01833311 is PiKNiK's SP. As the data owners and facilitators, we are able to seal against our DataCap more quickly than the other SPs, but in the long run each of the many SPs we have partnered with will receive an even share of the DataCap.
  2. We are currently working with: Equity Labs, DLTX, DSS, Filswan, Holon, Linix, Seal Storage, Techgreedy, Telnyx, Lucky Storage, DCENT and Blocz IO.
ghost commented 2 years ago

Hi @galen-mcandrew, what are the next steps in this DataCap request? Thanks!

ghost commented 2 years ago

@jamerduhgamer had a discussion with the governance team about next steps and will be closing out this issue.