filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Foldingathome COVID-19 Dataset #1024

Closed Megan008 closed 1 year ago

Megan008 commented 2 years ago

Large Dataset Notary Application

To apply for DataCap to onboard your dataset to Filecoin, please fill out the following.

Core Information

Please respond to the questions below by replacing the text saying "Please answer here". Include as much detail as you can in your answer.

Project details

Share a brief history of your project and organization.

I have participated in several projects and hackathons, so I have relevant experience.

What is the primary source of funding for this project?

Personal income.

What other projects/ecosystem stakeholders is this project associated with?

No.

Use-case details

Describe the data being stored onto Filecoin

[Folding@home](http://foldingathome.org/) is a massively distributed computing project that uses biomolecular simulations to investigate the [molecular origins of disease](https://foldingathome.org/diseases/) and accelerate the discovery of new therapies.

Where was the data in this dataset sourced from?

Simulations of SARS-CoV-2 and associated host proteins, with emphasis on discovering druggable cryptic pockets, documented at the [MolSSI COVID Hub](https://covid.molssi.org//simulations/#foldinghome-simulations-of-the-sars-cov-2-spike-protein-spike-spike-binding).

Can you share a sample of the data? A link to a file, an image, a table, etc., are good ways to do this. 

https://registry.opendata.aws/foldingathome-covid19/

Confirm that this is a public dataset that can be retrieved by anyone on the Network (i.e., no specific permissions or access rights are required to view the data).

Yes, it's a public dataset.

What is the expected retrieval frequency for this data?

Multiple times.

For how long do you plan to keep this dataset stored on Filecoin?

2 years.

DataCap allocation plan

In which geographies (countries, regions) do you plan on making storage deals?

North America; Korea; China.

How will you be distributing your data to storage providers? Is there an offline data transfer process?

75% of the data will be distributed by offline data transfer. The rest will be sent by online transfer to storage providers located close to me.

How do you plan on choosing the storage providers with whom you will be making deals? This should include a plan to ensure the data is retrievable in the future both by you and others.

I will work with 1 SP that has cooperated with me before on this deal. I am now talking with other SPs. f023495, f0508988

How will you be distributing deals across storage providers?

I have been in contact with 4 SPs. Initially, I will give 1/4 of the data to each SP. If I find more SPs, I will reduce each SP's percentage of deals to keep storage decentralized.
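The equal-split plan in this answer can be sketched as follows. This is only an illustration of the arithmetic; the function name is hypothetical, and the provider counts other than 4 are example values, not commitments from the application.

```python
def equal_share_percent(num_sps: int) -> float:
    """Percentage of the data each SP receives under an equal split."""
    if num_sps <= 0:
        raise ValueError("at least one storage provider is required")
    return 100 / num_sps


if __name__ == "__main__":
    # 4 SPs is the starting point named in the answer; 8 and 10 are
    # illustrative counts showing how the share shrinks as SPs are added.
    for count in (4, 8, 10):
        print(f"{count} SPs -> {equal_share_percent(count):.1f}% each")
```

With 4 SPs each one holds 25% of the data; every additional SP lowers the per-provider share.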

Do you have the resources/funding to start making deals as soon as you receive DataCap? What support from the community would help you onboard onto Filecoin?

Yes.

large-datacap-requests[bot] commented 2 years ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

raghavrmadya commented 2 years ago

Do you have permission from Folding@home?

raghavrmadya commented 2 years ago

Who are the SPs you plan to work with, and what exactly is your data transfer plan? The outlined plan is really unclear.

Megan008 commented 2 years ago

@raghavrmadya Thank you for your questions. The COVID-19 dataset is public and is not exclusive to a specific organization, so it is not necessary to get permission from Folding@home in advance to download and store it. It is similar to how programmers do not need permission from GitHub to use public code. The SPs we have worked with and talked to before include f01854755, f01823070 and f01878693, etc. After our application is approved, we plan to divide the data among 8-10 SPs via the BDE platform.

raghavrmadya commented 2 years ago

Thanks @Megan008. We have had cases before where clients needed the approval of the manager for public datasets. I also see that you have many applications open. Can you share more about yourself and any organization you are representing? Onboarding many PiBs of data through multiple applications requires a team effort. I'm tagging @Kernelogic, as they have dealt with such challenges with clients before as it relates to public datasets.

kernelogic commented 2 years ago

The Folding@home dataset is Creative Commons licensed, so license-wise it should be fine.

It consists of about 450TB of raw data from AWS S3:

| Bucket ARN | Region | Size |
| --- | --- | --- |
| arn:aws:s3:::fah-public-data-covid19-antibodies | us-east-2 | 8.6 TiB |
| arn:aws:s3:::fah-public-data-covid19-cryptic-pockets | us-east-2 | 71.0 TiB |
| arn:aws:s3:::fah-public-data-covid19-absolute-free-energy | us-east-2 | 369.5 TiB |
| arn:aws:s3:::fah-public-data-covid19-moonshot-dynamics | us-east-2 | 1.8 TiB |
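As a quick sanity check on the "about 450TB" figure, the four bucket sizes listed above can be summed. This is a minimal sketch: the sizes are copied from this comment, and the dictionary and function names are made up for illustration.

```python
# Bucket sizes in TiB, copied from the listing in this comment.
BUCKET_SIZES_TIB = {
    "fah-public-data-covid19-antibodies": 8.6,
    "fah-public-data-covid19-cryptic-pockets": 71.0,
    "fah-public-data-covid19-absolute-free-energy": 369.5,
    "fah-public-data-covid19-moonshot-dynamics": 1.8,
}


def total_tib(sizes: dict) -> float:
    """Sum the bucket sizes, rounded to one decimal place."""
    return round(sum(sizes.values()), 1)


if __name__ == "__main__":
    print(f"Total raw data: {total_tib(BUCKET_SIZES_TIB)} TiB")
```

The total comes to 450.9 TiB, which matches the rough figure quoted here and is well below the 5PiB requested later in the thread.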

However, I would have the following questions:

  1. This dataset was onboarded many times during Slingshot v2; it is also included in Slingshot v3.
  2. For open datasets I think it is important to provide ways to index / retrieve, not just back up. In Slingshot v2, for example, we were asked to provide websites and documents about how the data can be used. Do you have any plans in this regard?
  3. Downloading 450TB of raw data requires significant internet bandwidth. Where are you located, and do you have it?

Megan008 commented 2 years ago

@raghavrmadya I'm a community member. As I mentioned before, I'm going to contact more SPs to distribute data via the BDE platform next. The SPs I have worked with before will also continue working with me, so I think we can complete it.


Megan008 commented 2 years ago

@kernelogic Thank you for your points and concern! I am currently in Singapore, but I look forward to contacting SPs around the world. I am not participating in Slingshot, so I think I need to follow LDN's rules rather than Slingshot's.

simonkim0515 commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

100TiB

Client address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

DataCap allocation requested

50TiB

Id

fd3d1516-a183-467e-9ad7-01964bb49b11

cryptowhizzard commented 1 year ago

1062

1362

1013 -> Abuse (CID sharing)

TakiChain commented 1 year ago

The applicant contacted me via DM and went through our due diligence. Willing to support in the first round and will keep an eye on later allocations. Looking forward to seeing your next milestone.

TakiChain commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedq4hx2jsgyifcgazexuerxpsrkxlpvjl6f4e25667gzgaq7gexfc

Address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

Datacap Allocated

50.00TiB

Signer Address

f15impf3j2zcaex4lhyxndxswuuhv24vzstuqtxsi

Id

fd3d1516-a183-467e-9ad7-01964bb49b11

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedq4hx2jsgyifcgazexuerxpsrkxlpvjl6f4e25667gzgaq7gexfc

AthSmith commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecubgphhwp2m6spj7idtuvaaopy2ohlpawpkrkxpruz62lgcqxeks

Address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

Datacap Allocated

50.00TiB

Signer Address

f1vxbqrf7rfum3n6m5u6eb4re6xj7amvsaqnzu64y

Id

fd3d1516-a183-467e-9ad7-01964bb49b11

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecubgphhwp2m6spj7idtuvaaopy2ohlpawpkrkxpruz62lgcqxeks

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

DataCap allocation requested

100TiB

Id

bec108fc-6849-40c6-a516-a3362ca51c28

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

100TiB

Total DataCap granted for client so far

50TiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

4.95PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| null | null | 50TiB | null | 352GiB |

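The figures in this bot comment follow from simple arithmetic, sketched below. The bot's own implementation is not shown in this thread, so the function names and structure here are assumptions; the sketch only reproduces the numbers reported above (a 100TiB weekly rate, a 100% rule, 50TiB granted so far, and a 5PiB total request).

```python
TIB_PER_PIB = 1024  # binary units: 1 PiB = 1024 TiB


def tranche_tib(weekly_tib: float, rule_percent: float) -> float:
    """Size of the requested allocation as a percentage of the weekly rate."""
    return weekly_tib * rule_percent / 100


def remaining_pib(total_pib: float, granted_tib: float) -> float:
    """DataCap still to be granted before the total request is reached."""
    return (total_pib * TIB_PER_PIB - granted_tib) / TIB_PER_PIB


if __name__ == "__main__":
    # Values taken from the bot comment above.
    print(f"Requested tranche: {tranche_tib(100, 100):.0f} TiB")
    print(f"Remaining to grant: {remaining_pib(5, 50):.2f} PiB")
```

Running it gives a 100 TiB tranche and 4.95 PiB remaining, the same values the bot reports.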
BobbyChoii commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedemxymy7etyr37dk57soef4zczfyxvoqmafdthz4lvyjnjaaug76

Address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

Datacap Allocated

100.00TiB

Signer Address

f1irqs2gmctiv3jcdfwuch7oxvf4ixh3k4b2wc24i

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedemxymy7etyr37dk57soef4zczfyxvoqmafdthz4lvyjnjaaug76

Casey-PG commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebluals2lpiz7lbjkcuxivocn6drljn5umrfmlzghkvqaosn6b63s

Address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

Datacap Allocated

100.00TiB

Signer Address

f15impf3j2zcaex4lhyxndxswuuhv24vzstuqtxsi

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebluals2lpiz7lbjkcuxivocn6drljn5umrfmlzghkvqaosn6b63s

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f02049625

Client address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

DataCap allocation requested

200TiB

Id

cbbb6625-7682-4b65-880f-c0d6f5d5c06d

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

Rule to calculate the allocation request amount

200% of weekly dc amount requested

DataCap allocation requested

200TiB

Total DataCap granted for client so far

9094.9YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

-1.09B

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| null | null | 100TiB | null | 33.81TiB |

Bennyyangpu commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedxtl7kpxvlndxflemhl2cqlgu5w3gtxjatmdtnhv7fbsl3mqdqmi

Address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

Datacap Allocated

200.00TiB

Signer Address

f174fg3bqbln3zjnkxtyf6s54txqkr7yqkj6cig7y

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedxtl7kpxvlndxflemhl2cqlgu5w3gtxjatmdtnhv7fbsl3mqdqmi

cryptowhizzard commented 1 year ago

Reported for datacap abuse and violation of code of conduct.

@raghavrmadya @dkkapur

MEIYAN666 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

MEIYAN666 commented 1 year ago

The client followed the allocation plan as planned, and the report is good in all dimensions. Keep it up!

MEIYAN666 commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecmz55642hdg5pgzop7blkrc52pe3qkrp5ch2xx3sdofcl4ybwove

Address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

Datacap Allocated

200.00TiB

Signer Address

f1bwugfihrmn3iyunzyxst5nttql3dge4khwmurtq

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecmz55642hdg5pgzop7blkrc52pe3qkrp5ch2xx3sdofcl4ybwove

MEIYAN666 commented 1 year ago

I just saw that the applicant has some non-compliant usage records, though not on this application. I don't know the community's opinion on whether applicants with non-compliant usage records are allowed to apply for LDN again. I hope to have an answer soon.

Willing to abide by the consensus opinion of the community.

cryptowhizzard commented 1 year ago

@Meibuy

Yes, the opinion is that if someone harms the community, due diligence should be done so that it cannot happen again.

You signed an application that is disputed in Notion. I will add this to the dispute list.

Bennyyangpu commented 1 year ago

I noticed that the applicant had a violation record before signing. However, she repeatedly assured me in DMs that it was a cooperation issue of miners. There will be no such problem in the future. I think the applicant knows enough about Filecoin as a hackathon participant, so I decided to give her a chance to prove herself again. So far, the report of this application looks healthy.

We all know that GitHub is easy to sign up for, and applicants can use different accounts to submit LDN applications. After the filplus checker went live, many applications had problems with duplicate data. I would like to know the community's views on these applications and the applicants.

Bennyyangpu commented 1 year ago

If the community prohibits applicants involved in violations from applying again, I recommend that @Megan008 or a governance team member close this issue.

BobbyChoii commented 1 year ago

I agree with the above points. There are no explicit rules for clients who have been in dispute before. I will abide by community feedback and official recommendations on how to handle such applications.

cryptowhizzard commented 1 year ago

> I noticed that the applicant had a violation record before signing. However, she repeatedly assured me in DMs that it was a cooperation issue of miners.

How? @Megan008 is a data preparer. She sends her data from her wallet to an SP as a verified deal. What happened is that she did not pack the dataset as she said. She packed only a few files and shared that thousands of times with the people involved in that scheme. Miners can just accept a deal for a price. They don't have influence on what is stored on their drives so don't come here with those fairytales.

Again ... You guys decided to sign a disputed application. The rules for this are clear, you should not. Reasons why are also clear.

From my point of view the dispute is valid. Datacap should be revoked. If we were to get an upfront answer I would reconsider, but (A) we don't get any KYC and (B) we don't get any honest answer on what happened. Thirdly, it is beyond me that we get dozens of new GitHub handles per day (from the same people) and specifically this one needs attention to continue.

@dkkapur @raghavrmadya your call.

Megan008 commented 1 year ago

@cryptowhizzard Sorry for raising disputes. I understand that past mistakes may have reduced my trustworthiness. I'm deeply sorry about it and I wish I could get a chance to prove that I'm fixing it. Please tell me what I should do.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 4

Multisig Notary address

f02049625

Client address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

DataCap allocation requested

400TiB

Id

dc5d06c4-9f85-4bc9-8090-0060f99e192c

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

Rule to calculate the allocation request amount

400% of weekly dc amount requested

DataCap allocation requested

400TiB

Total DataCap granted for client so far

18189894035458574336.0YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

-2.19B

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| null | null | 200TiB | null | 47.15TiB |

Casey-PG commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceb3c3zbbm46n76nfl4rmju6esbwaznh6mlf3aptyp4qj4e42bic66

Address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

Datacap Allocated

400.00TiB

Signer Address

f1d4yb3wags3mtddzesxoo63jv7dmlec3bq4yteni

Id

dc5d06c4-9f85-4bc9-8090-0060f99e192c

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceb3c3zbbm46n76nfl4rmju6esbwaznh6mlf3aptyp4qj4e42bic66

TakiChain commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebwgiuxbzfcmuga4cpudzlu5hxwgghhkbnt2a54goaulifwscryms

Address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

Datacap Allocated

400.00TiB

Signer Address

f15impf3j2zcaex4lhyxndxswuuhv24vzstuqtxsi

Id

dc5d06c4-9f85-4bc9-8090-0060f99e192c

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebwgiuxbzfcmuga4cpudzlu5hxwgghhkbnt2a54goaulifwscryms

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 4

Multisig Notary address

f02049625

Client address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

DataCap allocation requested

400TiB

Id

b1b3a935-36cc-4800-90cb-7c5b98cd6d12

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

Rule to calculate the allocation request amount

400% of weekly dc amount requested

DataCap allocation requested

400TiB

Total DataCap granted for client so far

18189894035458574336.0YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

-2.19B

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| null | null | 200TiB | null | 3.78TiB |

Suyanj commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebktvx7k47limisrv73ioz6c7xgkbgjyc4mxtu2wq2bth4p4a5voa

Address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

Datacap Allocated

400.00TiB

Signer Address

f1ihv7gz3vn3xqvikpt4rwryecgisl7745lodx3yi

Id

b1b3a935-36cc-4800-90cb-7c5b98cd6d12

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebktvx7k47limisrv73ioz6c7xgkbgjyc4mxtu2wq2bth4p4a5voa

Suyanj commented 1 year ago

Public data with a clear allocation plan. We are willing to help onboard more valuable datasets.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 6

Multisig Notary address

f02049625

Client address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

DataCap allocation requested

400TiB

Id

ce15296d-34d7-4f78-a693-db943fcaeec6

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1au3nipqjprr5xp2mwsarr7obvpx2dwy4is6qn4y

Rule to calculate the allocation request amount

400% of weekly dc amount requested

DataCap allocation requested

400TiB

Total DataCap granted for client so far

3.6379788070917164e+49YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

3.6379788070917164e+49YiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 25124 | 13 | 400TiB | 13.61 | 100.09TiB |

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

MegaFil commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

Wengeding commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

cryptowhizzard commented 1 year ago

All these SPs are involved in CID sharing and do not support retrieval.

Akin, nothing works.

[Screenshot 2023-07-31 at 18 06 43]

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.