filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] PUBLIC DATA-COVID-19 Genome Sequence Dataset [2/2] #1881

Closed nora310 closed 10 months ago

nora310 commented 1 year ago

Data Owner Name

National Library of Medicine (NLM)

Data Owner Country/Region

United States

Data Owner Industry

Life Science / Healthcare

Website

https://www.ncbi.nlm.nih.gov/sra/docs/sra-aws-download/

Social Media

twitter-https://twitter.com/NIH_OSP
youtube-https://www.youtube.com/@nihofficeofsciencepolicy3005/videos

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

1PiB

On-chain address for first allocation

f1ynotqzocbg2mzsnhtie6ozkyzxtkkdj2n3ooily

Data Type of Application

None

Custom multisig

Identifier

No response

Share a brief history of your project and organization

A centralized sequence repository for all records containing sequences associated with the novel coronavirus (SARS-CoV-2) submitted to the National Center for Biotechnology Information (NCBI) Sequence Read Archive (SRA). Included are both the original sequences submitted by the principal investigator and SRA-processed sequences that require the SRA Toolkit for analysis. Additionally, submitter-provided metadata included in associated BioSample and BioProject records is available alongside NCBI-calculated data, such as k-mer-based taxonomy analysis results, contiguous assemblies (contigs) and associated statistics such as contig length, BLAST results for the assembled contigs, contig annotation, BLAST databases of contigs and their annotated peptides, and VCF files generated for each record relative to the SARS-CoV-2 RefSeq record. Finally, metadata is also made available in Parquet format to facilitate search and filtering using the AWS Athena service.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

Genomic sequence reads of SARS-CoV-2 and related Coronaviridae, organized by NCBI accession. Files in the sra-src folder are in FASTQ, BAM, or CRAM format (original submission); files in the run folder are in .sra format and require the SRA Toolkit. Metadata for sra-pub-sars-cov2 is provided in an Athena-queryable format.

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

IPFS, lotus, singularity

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://registry.opendata.aws/ncbi-covid-19/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Yearly

For how long do you plan to keep this dataset stored on Filecoin

1.5 to 2 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe

How will you be distributing your data to storage providers

IPFS, Shipping hard drives

How do you plan to choose storage providers

Slack, Big data exchange

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

No response

How do you plan to make deals to your storage providers

Lotus client, Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

1PiB

Client address

f1ynotqzocbg2mzsnhtie6ozkyzxtkkdj2n3ooily

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1ynotqzocbg2mzsnhtie6ozkyzxtkkdj2n3ooily

DataCap allocation requested

256TiB

Id

886161ee-9aef-41e1-9892-2cdf30cc301b

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1ynotqzocbg2mzsnhtie6ozkyzxtkkdj2n3ooily

DataCap allocation requested

256TiB

Id

3650dc52-7a7e-435b-b991-cff1c6a16fb3

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

No application info found for this issue on https://filplus.d.interplanetary.one/clients.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

cryptowhizzard commented 1 year ago

@OlivierMolenkamp

Can you let us (the community) know what due diligence has been done here for this client? What is the data onboarding plan for this client? Where is he/she going to store the data and who is she? What is their internet capacity? On what basis have you approved this application and why?

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to re-open if this is relevant, or start a new application for DataCap anytime. Thank you!

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

nora310 commented 1 year ago

Yes!

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

nora310 commented 1 year ago

Yes please!

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

nora310 commented 1 year ago

Yes please!

cryptowhizzard commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 3 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

nora310 commented 1 year ago

Yes please!

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

clriesco commented 1 year ago

Removed stale label and reopened issue :)

cryptowhizzard commented 1 year ago

Dear nora310,

As a notary, I am doing due diligence on your LDN. I could not get retrieval to work. Can you please upload the CAR file of CID baga6ea4seaqkbdulkm3b33kdqpukyhrkp62iofn6575yihph7vnpobkfewohyoi?

You can use our upload system at http://send.datasetcreators.com. Please select 7 days for the system to keep the file and post the link you received here so I (and other notaries) can download your content.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1ynotqzocbg2mzsnhtie6ozkyzxtkkdj2n3ooily

DataCap allocation requested

512TiB

Id

0bbcfae1-bef0-4eb9-8947-832c63c9e950

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1ynotqzocbg2mzsnhtie6ozkyzxtkkdj2n3ooily

Rule to calculate the allocation request amount

100% weekly > 0.5PiB, requesting 0.5PiB

DataCap allocation requested

512TiB

Total DataCap granted for client so far

256TiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

4.75PiB
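
The figures in this bot comment can be checked with a few lines of arithmetic. The sketch below is only an illustration, not the bot's actual code: the function names are hypothetical, and reading the rule "100% weekly > 0.5PiB, requesting 0.5PiB" as "take the smaller of the weekly rate and a 0.5PiB cap" is an assumption. The only fixed fact used is that 1 PiB equals 1024 TiB.

```python
# All sizes are handled in TiB; 1 PiB = 1024 TiB.
TIB_PER_PIB = 1024


def pib_to_tib(pib: float) -> float:
    """Convert PiB to TiB."""
    return pib * TIB_PER_PIB


def next_allocation_tib(weekly_tib: float, cap_tib: float) -> float:
    """Assumed reading of the rule: request the weekly rate, capped at cap_tib."""
    return min(weekly_tib, cap_tib)


def remaining_pib(total_pib: float, granted_tib: float) -> float:
    """DataCap still to be granted to reach the total, in PiB."""
    return (pib_to_tib(total_pib) - granted_tib) / TIB_PER_PIB


# Values taken from the application and the bot comment.
total_pib = 5      # total DataCap requested
weekly_pib = 1     # weekly allocation requested
granted_tib = 256  # total DataCap granted so far

request_tib = next_allocation_tib(pib_to_tib(weekly_pib), pib_to_tib(0.5))
print(request_tib)                            # 512.0
print(remaining_pib(total_pib, granted_tib))  # 4.75
```

Under these assumptions the result matches the 512TiB request and the 4.75PiB still outstanding.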

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 2063 | 3 | 256TiB | 44.98 | 127.06TiB |

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

nora310 commented 1 year ago

Can not open this link.

> Dear nora310,
>
> As a notary, I am doing due diligence on your LDN. I could not get retrieval to work. Can you please upload the CAR file of CID baga6ea4seaqkbdulkm3b33kdqpukyhrkp62iofn6575yihph7vnpobkfewohyoi?
>
> You can use our upload system at http://send.datasetcreators.com. Please select 7 days for the system to keep the file and post the link you received here so I (and other notaries) can download your content.

spaceT9 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

github-actions[bot] commented 11 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

nora310 commented 11 months ago

Yes

github-actions[bot] commented 11 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

nora310 commented 11 months ago

Keep open.

github-actions[bot] commented 11 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

nora310 commented 11 months ago

Keep open.

Sunnyiscoming commented 10 months ago

Hello @nora310, per https://github.com/filecoin-project/notary-governance/issues/922 for Open, Public Dataset applicants, please complete the following Fil+ registration form to identify yourself as the applicant, and please also add the contact information of the SP entities you are working with to store copies of the data.

This information will be reviewed by the Fil+ Governance team to confirm validity, and then the application will be allowed to move forward for additional notary review.

nora310 commented 10 months ago

Hello @Sunnyiscoming, I've finished.

Wengeding commented 10 months ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceaxsynsnej6xfyyvbmk62ipenb2mzoekmpqctjxelmwnol3zrbv3e

Address

f1ynotqzocbg2mzsnhtie6ozkyzxtkkdj2n3ooily

Datacap Allocated

512.00TiB

Signer Address

f1txfsjmix4vlzido4dkildrnbw26owtlbslexmpa

Id

0bbcfae1-bef0-4eb9-8947-832c63c9e950

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceaxsynsnej6xfyyvbmk62ipenb2mzoekmpqctjxelmwnol3zrbv3e

Holiday507 commented 10 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacecgkhpqrduuht4bu5dw3vqyefb7tv2dl7gexhcgoquaswdqclr6ie

Address

f1ynotqzocbg2mzsnhtie6ozkyzxtkkdj2n3ooily

Datacap Allocated

512.00TiB

Signer Address

f1sa3dp3a7fwirrsxjdthvzneo7rnjcrrfllsnjpq

Id

0bbcfae1-bef0-4eb9-8947-832c63c9e950

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecgkhpqrduuht4bu5dw3vqyefb7tv2dl7gexhcgoquaswdqclr6ie

large-datacap-requests[bot] commented 10 months ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1ynotqzocbg2mzsnhtie6ozkyzxtkkdj2n3ooily

DataCap allocation requested

512TiB

Id

ba54d45c-c604-4f29-acc0-682d2df76c1f

TakiChain commented 10 months ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaced66y3bwxu2s2jk3fnrgfquheppotiomrtqqazgryenprv5abio26

Address

f1ynotqzocbg2mzsnhtie6ozkyzxtkkdj2n3ooily

Datacap Allocated

512.00TiB

Signer Address

f15impf3j2zcaex4lhyxndxswuuhv24vzstuqtxsi

Id

ba54d45c-c604-4f29-acc0-682d2df76c1f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaced66y3bwxu2s2jk3fnrgfquheppotiomrtqqazgryenprv5abio26

Suyanj commented 10 months ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaced6qmjnybcqpivyhtef6prqcgog2ikflgub6l3gcxi6yzpmbimnc6

Address

f1ynotqzocbg2mzsnhtie6ozkyzxtkkdj2n3ooily

Datacap Allocated

512.00TiB

Signer Address

f1ihv7gz3vn3xqvikpt4rwryecgisl7745lodx3yi

Id

ba54d45c-c604-4f29-acc0-682d2df76c1f

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaced6qmjnybcqpivyhtef6prqcgog2ikflgub6l3gcxi6yzpmbimnc6

herrehesse commented 10 months ago

checker:manualTrigger

filplus-checker-app[bot] commented 10 months ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 51.72% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...
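
The replication warning in these reports can be read as a share of deals whose data sits with too few distinct storage providers. The following is a minimal sketch of that idea, not the checker's real implementation: the function name, the `(piece_cid, provider_id)` input shape, and counting by number of deals rather than by deal size are all assumptions, and the sample data is made up.

```python
from collections import defaultdict


def under_replicated_share(deals, min_providers=4):
    """Percentage of deals whose piece is held by fewer than min_providers providers.

    deals is a list of (piece_cid, provider_id) pairs (a hypothetical shape).
    """
    if not deals:
        return 0.0

    # Collect the distinct providers that store each piece.
    providers_by_piece = defaultdict(set)
    for piece_cid, provider_id in deals:
        providers_by_piece[piece_cid].add(provider_id)

    # Count the deals that belong to a piece with too few providers.
    flagged = sum(
        1
        for piece_cid, _ in deals
        if len(providers_by_piece[piece_cid]) < min_providers
    )
    return 100.0 * flagged / len(deals)


# Made-up sample: piece "A" is on four providers, piece "B" on two.
sample = [
    ("A", "f01"), ("A", "f02"), ("A", "f03"), ("A", "f04"),
    ("B", "f01"), ("B", "f02"),
]
share = under_replicated_share(sample)
print(f"{share:.2f}% of deals are below the threshold")  # 33.33% ...
```

With a threshold of 4, the two deals for piece "B" are flagged out of six deals in total. The real report may weight deals by size or apply other filters.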

herrehesse commented 10 months ago

Screenshot 2023-11-13 at 13 38 35

Of course! FULL VPN fake location time again.

@kevzak Can you close this LDN?

nora310 commented 10 months ago

@herrehesse Show your evidence.

> Screenshot 2023-11-13 at 13 38 35
>
> Of course! FULL VPN fake location time again.
>
> @kevzak Can you close this LDN?

herrehesse commented 10 months ago

Screenshot 2023-11-14 at 09 18 26

Because you asked so kindly. This is the most abusive application I have seen all of this month.

@kevzak can we close this immediately before abusive notaries sign again?

github-actions[bot] commented 10 months ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

nora310 commented 10 months ago

> Screenshot 2023-11-14 at 09 18 26
>
> Because you asked so kindly. This is the most abusive application I have seen all of this month.
>
> @kevzak can we close this immediately before abusive notaries sign again?

Ridiculous