filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Speedium - NIH NCBI Sequence Read Archive [6 / 27] #1873

Closed: cryptowhizzard closed this issue 1 year ago

cryptowhizzard commented 1 year ago

Data Owner Name

NIH - National Institutes of Health

Data Owner Country/Region

United States

Data Owner Industry

Life Science / Healthcare

Website

https://www.nih.gov/

Social Media

https://www.facebook.com/nih.gov/

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

500TiB
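As a rough check on the requested rate, here is a small Python sketch of the onboarding timeline. It assumes TiB and PiB are binary (IEC) units, so 1 PiB = 1024 TiB, and that the full 500TiB weekly rate is sustained. The helper name `weeks_to_onboard` is illustrative only.

```python
# Assumes binary (IEC) units: 1 TiB = 2**40 bytes, 1 PiB = 2**50 bytes = 1024 TiB.
TIB = 2**40
PIB = 2**50


def weeks_to_onboard(total_bytes: int, weekly_bytes: int) -> float:
    """Weeks needed to use `total_bytes` of DataCap at a steady `weekly_bytes` per week."""
    if weekly_bytes <= 0:
        raise ValueError("weekly rate must be positive")
    return total_bytes / weekly_bytes


# Figures from this application: 5PiB total, 500TiB per week.
print(f"{weeks_to_onboard(5 * PIB, 500 * TIB):.2f} weeks")
```

Under those assumptions, 5PiB is 5120TiB, which lasts about 10.24 weeks at 500TiB per week.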

On-chain address for first allocation

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Data Type of Application

None

Custom multisig

Identifier

No response

Share a brief history of your project and organization

Since its launch, the Filecoin network has become an important player in the decentralised storage space, offering a secure and transparent alternative to traditional data storage solutions.

We as Speedium / DCENT have been engaged with storing real and valuable datasets on the Filecoin network since Slingshot 2.6 and have been actively developing tools to improve the process. We are always on the lookout for new and useful client data to onboard.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

NIH NCBI Sequence Read Archive (SRA) on AWS
The Sequence Read Archive (SRA), produced by the [National Center for Biotechnology Information (NCBI)](https://www.ncbi.nlm.nih.gov/) at the [National Library of Medicine (NLM)](http://nlm.nih.gov/) at the [National Institutes of Health (NIH)](http://www.nih.gov/), stores raw DNA sequencing data and alignment information from high-throughput sequencing platforms.

Where was the data in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

IPFS, lotus, singularity

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://registry.opendata.aws/ncbi-sra/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Yearly

For how long do you plan to keep this dataset stored on Filecoin

1.5 to 2 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe, Australia (continent)

How will you be distributing your data to storage providers

Cloud storage (i.e. S3), HTTP or FTP server, IPFS, Lotus built-in data transfer

How do you plan to choose storage providers

Slack, Big data exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

| MinerID | City/State | Country/Region | Business/Entity |
| --- | --- | --- | --- |
| `f01944347` | Oregon | USA | Jenny, Dabai |
| `f01952350` | Oregon | USA | Jenny, Dabai |
| `f01972364` | Oregon | USA | Jenny, Dabai |
| `f01972376` | Oregon | USA | Jenny, Dabai |
| `f02000937` | Chengdu | CN | MTY |
| `f01915033` | Chengdu | CN | MTY |
| `f0120****` | Melbourne | AU | HOLON |
| `f0115****` | Melbourne | AU | HOLON |
| `f01199430` | Heerhugowaard | EU | DCENT |
| `f01786387` | Heerhugowaard | EU | DCENT |
| `f01201327` | Heerhugowaard | EU | DCENT |
| `f01937642` | Heerhugowaard | EU | DCENT |
| `f0198****` | Dallas | USA | GREATERHEAT |
| `f0188****` | Singapore | AS | GREATERHEAT |
| `f01091851` | Omaha | USA | DLTx |
| `f01736668` | Omaha | USA | DLTx |
| `f01820744` | Omaha | USA | DLTx |
| `f0855584` | Omaha | USA | DLTx |
| `f01794610` | Omaha | USA | DLTx |
| `f01838599` | Kansas City | USA | DLTx |
| `f01845552` | Kansas City | USA | DLTx |
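
To check the provider list against the geographies named earlier in the application, the table can be tallied by region and by entity. This is a hand transcription of the rows above into Python, with masked IDs kept exactly as listed; `providers_per_column` is a hypothetical helper, not part of any Fil+ tooling.

```python
from collections import Counter

# (miner ID, city/state, country/region, business/entity), transcribed from the table above.
SP_TABLE = [
    ("f01944347", "Oregon", "USA", "Jenny, Dabai"),
    ("f01952350", "Oregon", "USA", "Jenny, Dabai"),
    ("f01972364", "Oregon", "USA", "Jenny, Dabai"),
    ("f01972376", "Oregon", "USA", "Jenny, Dabai"),
    ("f02000937", "Chengdu", "CN", "MTY"),
    ("f01915033", "Chengdu", "CN", "MTY"),
    ("f0120****", "Melbourne", "AU", "HOLON"),
    ("f0115****", "Melbourne", "AU", "HOLON"),
    ("f01199430", "Heerhugowaard", "EU", "DCENT"),
    ("f01786387", "Heerhugowaard", "EU", "DCENT"),
    ("f01201327", "Heerhugowaard", "EU", "DCENT"),
    ("f01937642", "Heerhugowaard", "EU", "DCENT"),
    ("f0198****", "Dallas", "USA", "GREATERHEAT"),
    ("f0188****", "Singapore", "AS", "GREATERHEAT"),
    ("f01091851", "Omaha", "USA", "DLTx"),
    ("f01736668", "Omaha", "USA", "DLTx"),
    ("f01820744", "Omaha", "USA", "DLTx"),
    ("f0855584", "Omaha", "USA", "DLTx"),
    ("f01794610", "Omaha", "USA", "DLTx"),
    ("f01838599", "Kansas City", "USA", "DLTx"),
    ("f01845552", "Kansas City", "USA", "DLTx"),
]

REGION, ENTITY = 2, 3  # column indices within each row


def providers_per_column(rows, column):
    """Count how many listed storage providers share each value in `column`."""
    return Counter(row[column] for row in rows)


if __name__ == "__main__":
    for region, count in providers_per_column(SP_TABLE, REGION).most_common():
        print(f"{region}: {count}")
```

Under this transcription the list covers all five geographies named in the application, with 12 of the 21 entries in the USA.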

How do you plan to make deals to your storage providers

Boost client, Lotus client, Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!
large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and get back to you soon.

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

500TiB

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

250TiB

Id

5f565437-2b4c-48ef-a9c8-2b3c4a0ef4a2

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

250TiB

Id

dac4fe66-d84e-454c-9d45-232ae75730fc

large-datacap-requests[bot] commented 1 year ago

The issue has reached the total DataCap requested. This should be closed.

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Rule to calculate the allocation request amount

total dc reached

DataCap allocation requested

0

Total DataCap granted for client so far

InfinityYiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

-InfinityB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 135939 | 42 | 1.95PiB | 10.63 | 75.46TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage provider sealed too much duplicate data - f01208803: 20.79%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

TimGuo7 commented 1 year ago

@cryptowhizzard, can you explain the reason for the duplicate data and CID sharing flagged by the bot? The rest looks good and I'm going to sign it.

xinaxu commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceacrbybgczcg4eou2awkjns5ndiqlzvkzgdzg3sahaxsptd42s6em

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

250.00TiB

Signer Address

f1k3ysofkrrmqcot6fkx4wnezpczlltpirmrpsgui

Id

dac4fe66-d84e-454c-9d45-232ae75730fc

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceacrbybgczcg4eou2awkjns5ndiqlzvkzgdzg3sahaxsptd42s6em

simonkim0515 commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

250TiB

Id

dac4fe66-d84e-454c-9d45-232ae75730fd

BobbyChoii commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebhybdgzmlqsjda2evuacpotoj2it7euxwyhirapgmgoeeay3o4fk

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

250.00TiB

Signer Address

f1irqs2gmctiv3jcdfwuch7oxvf4ixh3k4b2wc24i

Id

dac4fe66-d84e-454c-9d45-232ae75730fd

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebhybdgzmlqsjda2evuacpotoj2it7euxwyhirapgmgoeeay3o4fk

large-datacap-requests[bot] commented 1 year ago

The issue has reached the total DataCap requested. This should be closed.

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Rule to calculate the allocation request amount

total dc reached

DataCap allocation requested

0

Total DataCap granted for client so far

InfinityYiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

-InfinityB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 184369 | 44 | 1.95PiB | 10.93 | 515.77TiB |

simonkim0515 commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

500TiB

Id

dac4fe66-d84e-454c-9d45-232ae75730ff

psh0691 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaced2mqu5vnkww26czy67fbpbmf4rp2zdw7oeoz2x4uh3v66apsobwc

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

500.00TiB

Signer Address

f1qdko4jg25vo35qmyvcrw4ak4fmuu3f5rif2kc7i

Id

dac4fe66-d84e-454c-9d45-232ae75730ff

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaced2mqu5vnkww26czy67fbpbmf4rp2zdw7oeoz2x4uh3v66apsobwc

kernelogic commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceb7tpkvjokuuk4txbp2lqdpruhf6liv2bwzib7ldniy4fvgx4zohs

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

500.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceb7tpkvjokuuk4txbp2lqdpruhf6liv2bwzib7ldniy4fvgx4zohs

kernelogic commented 1 year ago

I looked at the CID report for the whole series and it looks good.

simonkim0515 commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

1000TiB

Id

dac4fe66-d84e-454c-9d45-232ae75730fb

mjroddy commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacea4gark7ivhidchabzbl7iakb4eez3nxzmdr6w4kpipxmskfdlhio

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1000.00TiB

Signer Address

f1ystxl2ootvpirpa7ebgwl7vlhwkbx2r4zjxwe5i

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea4gark7ivhidchabzbl7iakb4eez3nxzmdr6w4kpipxmskfdlhio

mjroddy commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceccv5dg5eb27dsldskyknrt27knshszzi6ilgkqb5iarqao7tgbmw

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1000.00TiB

Signer Address

f1ystxl2ootvpirpa7ebgwl7vlhwkbx2r4zjxwe5i

Id

dac4fe66-d84e-454c-9d45-232ae75730fb

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceccv5dg5eb27dsldskyknrt27knshszzi6ilgkqb5iarqao7tgbmw

kernelogic commented 1 year ago

The CID checker result looks very diverse; willing to support.

xinaxu commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedvcywbayvktqb6c6jespsj24uvs2ayxxnfihc3wvv2tyz6mnk74i

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1000.00TiB

Signer Address

f1k3ysofkrrmqcot6fkx4wnezpczlltpirmrpsgui

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedvcywbayvktqb6c6jespsj24uvs2ayxxnfihc3wvv2tyz6mnk74i

simonkim0515 commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

2000TiB

Id

dac4fe66-d84e-454c-9d45-232ae75730fa

xinaxu commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebed6iosjgvluzerefj5icezksfh3py3x5ncec2oyy67inbfg4sua

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1.95PiB

Signer Address

f1k3ysofkrrmqcot6fkx4wnezpczlltpirmrpsgui

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebed6iosjgvluzerefj5icezksfh3py3x5ncec2oyy67inbfg4sua

kernelogic commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacea65efqo3thy7fjsuw5ndkfwz5a4rozjbh3cs2olx6fsa7vs7rije

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1.95PiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea65efqo3thy7fjsuw5ndkfwz5a4rozjbh3cs2olx6fsa7vs7rije
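The proposals list the 2000TiB request as 1.95PiB and the later 1250TiB request as 1.22PiB. That is consistent with the notary tooling displaying binary units (1 PiB = 1024 TiB) rounded to two decimals; the sketch below reproduces the figures under that assumption. `tib_to_pib` is an illustrative helper, not part of the notary tool.

```python
TIB_PER_PIB = 1024  # binary (IEC) units: 1 PiB = 2**50 bytes = 1024 TiB


def tib_to_pib(tib: float) -> float:
    """Convert a DataCap amount from TiB to PiB."""
    return tib / TIB_PER_PIB


# Tranche sizes requested in this thread, in TiB.
for tib in (250, 500, 1000, 2000, 1250):
    print(f"{tib}TiB = {tib_to_pib(tib):.2f}PiB")
```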

simonkim0515 commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

1250TiB

Id

dac4fe66-d84e-454c-9d45-232ae75730fm

Normalnoise commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage provider sealed too much duplicate data - f01208803: 20.79%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

Normalnoise commented 1 year ago

I did due diligence (DD) on Slack and got an explanation for the "too much duplicate data" warning. I think this dataset should be supported.

Here is the miner response (HOLON): https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/488#issuecomment-1410486103

Normalnoise commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecuib4qlg2ve53vx7vuxgrft5ko4qkjubdz6puvao22dfsivi622a

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1.22PiB

Signer Address

f1c5non5yf35avgcpsqvxu4yj54yyvxorwyjochqq

Id

dac4fe66-d84e-454c-9d45-232ae75730fm

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecuib4qlg2ve53vx7vuxgrft5ko4qkjubdz6puvao22dfsivi622a

NDLABS-Leo commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage provider sealed too much duplicate data - f01208803: 20.79%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

NDLABS-Leo commented 1 year ago

The checker report seems healthy. Willing to sign.

NDLABS-Leo commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceab75xdrme53uigkls5vqvkpcdxjpps4r53iihzfndbkn46hf56qo

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1.22PiB

Signer Address

f1yayfsv6whu3rheviucvventj3y6t542xfpb47ei

Id

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceab75xdrme53uigkls5vqvkpcdxjpps4r53iihzfndbkn46hf56qo

cryptowhizzard commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage provider sealed too much duplicate data - f01208803: 20.79%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

sgclouder commented 1 year ago

Hi, I'm trying to retrieve your data and found some problems. `query-ask` gets no response from these SPs:

- `./lotus client query-ask f01165159` (no response)
- `./lotus client query-ask f02101378` (no response)
- `./lotus client query-ask f01208632` (no response)
- `./lotus client query-ask f02169441` (no response)

Retrievals hang at `DealProposed (WaitForAcceptance)`:

- `./lotus client retrieve --miner f02169441 --from f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy bafybeie2hfdzph23zo2sftrzijqgfo3kweebqbvgzcruyuolag3vdnt4gu /tmp/test`
  `Recv 0 B, Paid 0 FIL, Open (New), 0s [1684114229623219642|0]`
  `Recv 0 B, Paid 0 FIL, DealProposed (WaitForAcceptance), 4ms [1684114229623219642|0]` (waited here for 1 hour)
- `./lotus client retrieve --miner f02132151 --from f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy bafybeieg2k5bmuwmss52hz64ioveosidjw5swdkf7mlesj5j6jqw2jcuxi /tmp/cuix`
  `Recv 0 B, Paid 0 FIL, Open (New), 0s [1684114229623219640|0]`
  `Recv 0 B, Paid 0 FIL, DealProposed (WaitForAcceptance), 3ms [1684114229623219640|0]` (waited here for 1 hour)

Could you tell me which miners support retrieval?
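When several SPs need checking, the manual steps above can be scripted. The sketch below is a hypothetical helper, not part of Lotus: it assumes a `lotus` binary at `./lotus` (as in the commands above), treats a timeout, missing binary, or non-zero exit from `lotus client query-ask` as "no response", and parses progress lines with a regex written only against the output pasted in this thread, not against a documented format.

```python
import re
import subprocess
from typing import Optional

# Regex for progress lines like the ones pasted above, e.g.
# "Recv 0 B, Paid 0 FIL, DealProposed (WaitForAcceptance), 4ms [1684114229623219642|0]"
# Written against this thread's output only; other Lotus versions may print differently.
STATUS_RE = re.compile(
    r"Recv (?P<recv>[\d.]+ \S*B), Paid (?P<paid>[\d.]+) FIL, "
    r"(?P<status>\w+) \((?P<substatus>\w+)\), (?P<elapsed>\S+) "
    r"\[(?P<deal_id>\d+)\|\d+\]"
)


def parse_status(line: str) -> Optional[dict]:
    """Extract the retrieval state from one progress line, or None if it does not match."""
    match = STATUS_RE.search(line)
    return match.groupdict() if match else None


def is_waiting_on_provider(line: str) -> bool:
    """True if the line shows the deal proposed but not yet accepted by the SP."""
    status = parse_status(line)
    return status is not None and status["substatus"] == "WaitForAcceptance"


def query_ask(miner: str, lotus: str = "./lotus", timeout_s: float = 30.0) -> bool:
    """Run `lotus client query-ask <miner>`; False on timeout, missing binary, or non-zero exit."""
    try:
        result = subprocess.run(
            [lotus, "client", "query-ask", miner],
            capture_output=True,
            text=True,
            timeout=timeout_s,
        )
    except (subprocess.TimeoutExpired, OSError):
        return False
    return result.returncode == 0
```

For example, looping `query_ask` over f01165159, f02101378, f01208632 and f02169441 would flag any SP that does not answer within the timeout, and `is_waiting_on_provider` marks a retrieval stuck at the acceptance step.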

herrehesse commented 1 year ago

Holon: f01165159, f02101378, f01208632; Antwerp: f02169441

I will ask the SPs in question about their status right away. Thank you for checking, @sgclouder!

jhookersyd commented 1 year ago

Let me take a look now, @herrehesse @sgclouder. Thanks for pointing it out. Jonathan

jhookersyd commented 1 year ago

This is fixed. This occurred because our senior security engineer has fat fingers and hates the world. Have a great day! Jonathan

sgclouder commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage provider sealed too much duplicate data - f01208803: 20.79%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

data-programs commented 1 year ago
KYC

This user’s identity has been verified through filplus.storage