filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

Twinquasar / Folding at home Covid 19 #368

Closed s0nik42 closed 1 year ago

s0nik42 commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

At Twinquasar, we have been onboarding data through Slingshot since Slingshot 2.2. Our goal is to onboard useful data to the network in a reusable way.

What is the primary source of funding for this project?

Our own funds

What other projects/ecosystem stakeholders is this project associated with?

PL via the Slingshot challenge

Use-case details

Describe the data being stored onto Filecoin

The Folding@home COVID-19 dataset

Where was the data in this dataset sourced from?

Folding@home is a massively distributed computing project that uses biomolecular simulations to investigate the molecular origins of disease and accelerate the discovery of new therapies.

The dataset comes from the AWS public S3 repository: https://registry.opendata.aws/foldingathome-covid19/

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this.

A link to a sample file:

https://molssi-bioexcel-covid-19-structure-therapeutics-hub.s3.amazonaws.com/GumbartGroup/RBD-ACE2.tgz

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes

What is the expected retrieval frequency for this data?

Not much

For how long do you plan to keep this dataset stored on Filecoin?

18 months

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

US
APAC
EUROPE

How will you be distributing your data to storage providers? Is there an offline data transfer process?

Data will be transferred offline.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

We're selecting SPs that have a high reputation on Slack and already participate in Slingshot. They are able to store deals at a very fast pace and with a high success rate.

How will you be distributing deals across storage providers?

1 to 2 copies per company

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes

large-datacap-requests[bot] commented 2 years ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

large-datacap-requests[bot] commented 2 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and get back to you soon.

galen-mcandrew commented 2 years ago

Multisig Notary requested

Total DataCap requested

500TiB

Expected weekly DataCap usage rate

290TiB

large-datacap-requests[bot] commented 2 years ago

Multisig created and sent to RKH f01858390

large-datacap-requests[bot] commented 2 years ago

DataCap Allocation requested

Multisig Notary address

f01858390

Client address

f1x2wjpopqkrg6qtrhm2ieifcqwc5er7dy3usqkgi

DataCap allocation requested

25TiB

cryptowhizzard commented 2 years ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedqi45ugbqdjmlri5ygn4tyr5ybi5jstwfnm6r5g7b4tu3squ744k

Address

f1x2wjpopqkrg6qtrhm2ieifcqwc5er7dy3usqkgi

Datacap Allocated

25.00TiB

Signer Address

f1krmypm4uoxxf3g7okrwtrahlmpcph3y7rbqqgfa

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedqi45ugbqdjmlri5ygn4tyr5ybi5jstwfnm6r5g7b4tu3squ744k

dkkapur commented 2 years ago

Proposal went through AFAIK, @s0nik42 you just need an approval. Thanks!

dannyob commented 2 years ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceboyxv55jjuqxsoxipkl3xhhhis2faf3qeht5xn7rzutly2nrcgba

Address

f1x2wjpopqkrg6qtrhm2ieifcqwc5er7dy3usqkgi

Datacap Allocated

25.00TiB

Signer Address

f1k6wwevxvp466ybil7y2scqlhtnrz5atjkkyvm4a

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceboyxv55jjuqxsoxipkl3xhhhis2faf3qeht5xn7rzutly2nrcgba

filplus-checker commented 1 year ago

DataCap and CID Checker Report[^1]

If this is the first time a provider takes a verified deal, it will be marked as new.

For most DataCap applications, the restrictions below should apply.

⚠️ f01385207 has sealed 99.22% of total datacap.

⚠️ 50.00% of total deals sealed by f010479 are duplicate data.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01385207 | Lincoln, Nebraska, US | 8.00 TiB | 99.22% | 8.00 TiB | 0.00% |
| f010479 | Paris, Île-de-France, FR | 64.00 GiB | 0.78% | 32.00 GiB | 50.00% |
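
For readers who want to check the percentages in the provider table, here is a minimal Python sketch. The report does not state the checker's formulas, so the two used here are assumptions: sealed share as provider total over the sum of all provider totals, and duplicate share as (total minus unique) over total. The function name `provider_shares` and the dictionary layout are hypothetical. Under those assumptions, the sketch reproduces the figures shown.

```python
GIB_PER_TIB = 1024

# Figures copied from the provider table above, converted to GiB.
providers = {
    "f01385207": {"total": 8.00 * GIB_PER_TIB, "unique": 8.00 * GIB_PER_TIB},
    "f010479": {"total": 64.0, "unique": 32.0},
}


def provider_shares(providers):
    """Return {provider: (share_of_all_sealed_pct, duplicate_pct)}.

    Assumed formulas (not confirmed by the report):
      share     = provider total / sum of all provider totals
      duplicate = (provider total - provider unique) / provider total
    """
    grand_total = sum(p["total"] for p in providers.values())
    result = {}
    for name, p in providers.items():
        share = 100 * p["total"] / grand_total
        duplicate = 100 * (p["total"] - p["unique"]) / p["total"]
        result[name] = (share, duplicate)
    return result


shares = provider_shares(providers)
for name, (share, dup) in shares.items():
    print(f"{name}: {share:.2f}% of sealed deals, {dup:.2f}% duplicate")
```

With the table's inputs, this prints 99.22% and 0.00% for f01385207, and 0.78% and 50.00% for f010479.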

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

⚠️ 100.00% of deals are for data replicated across fewer than 4 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 7.97 TiB | 7.97 TiB | 1 | 98.84% |
| 32.00 GiB | 96.00 GiB | 2 | 1.16% |
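
The replication figures can be checked the same way. This is a rough sketch that assumes the deal percentage is each row's "Total Deals Made" divided by the sum of that column. It also assumes the warning counts every row with fewer than 4 providers. The helper names `deal_percentages` and `under_replicated_pct` are hypothetical, and the checker's real implementation may differ.

```python
GIB_PER_TIB = 1024

# Rows from the replication table above:
# (unique data in GiB, total deals made in GiB, number of providers)
rows = [
    (7.97 * GIB_PER_TIB, 7.97 * GIB_PER_TIB, 1),
    (32.0, 96.0, 2),
]


def deal_percentages(rows):
    """Share of all deal bytes contributed by each row (assumed formula)."""
    total = sum(deals for _, deals, _ in rows)
    return [100 * deals / total for _, deals, _ in rows]


def under_replicated_pct(rows, min_providers=4):
    """Share of deal bytes whose data sits on fewer than min_providers SPs."""
    total = sum(deals for _, deals, _ in rows)
    low = sum(deals for _, deals, n in rows if n < min_providers)
    return 100 * low / total


pcts = deal_percentages(rows)
print([f"{p:.2f}%" for p in pcts])
print(f"{under_replicated_pct(rows):.2f}% under-replicated")
```

Both rows have fewer than 4 providers, so the under-replicated share comes out at 100.00%, matching the warning above.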

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, so their data should not resolve to the same CID.

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

data-programs commented 1 year ago

KYC

This user’s identity has been verified through filplus.storage

Sunnyiscoming commented 1 year ago

Hello @s0nik42, per https://github.com/filecoin-project/notary-governance/issues/922 for Open, Public Dataset applicants, please complete the following Fil+ registration form to identify yourself as the applicant. Please also add the contact information for the SP entities you are working with to store copies of the data.

This information will be reviewed by Fil+ Governance team to confirm validity and then the application will be allowed to move forward for additional notary review.