filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Speedium - NIH NCBI Sequence Read Archive [2 / 27] #1554

cryptowhizzard closed 1 year ago

cryptowhizzard commented 1 year ago

Data Owner Name

NIH - National Institutes of Health

Data Owner Country/Region

United States

Data Owner Industry

Life Science / Healthcare

Website

https://www.nih.gov/

Social Media

https://www.facebook.com/nih.gov/

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

500TiB

On-chain address for first allocation

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Custom multisig

Identifier

No response

Share a brief history of your project and organization

Since its launch, the Filecoin network has become an important player in the decentralised storage space, offering a secure and transparent alternative to traditional data storage solutions.

We as Speedium / DCENT have been engaged with storing real and valuable datasets on the Filecoin network since Slingshot 2.6 and have been actively developing tools to improve the process. We are always on the lookout for new and useful client data to onboard.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

NIH NCBI Sequence Read Archive (SRA) on AWS
The Sequence Read Archive (SRA), produced by the [National Center for Biotechnology Information (NCBI)](https://www.ncbi.nlm.nih.gov/) at the [National Library of Medicine (NLM)](http://nlm.nih.gov/) at the [National Institutes of Health (NIH)](http://www.nih.gov/), stores raw DNA sequencing data and alignment information from high-throughput sequencing platforms.

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

IPFS, lotus, singularity, graphsplit

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://registry.opendata.aws/ncbi-sra/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Yearly

For how long do you plan to keep this dataset stored on Filecoin

1.5 to 2 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, South America, Europe, Australia (continent)

How will you be distributing your data to storage providers

HTTP or FTP server, IPFS, Shipping hard drives, Lotus built-in data transfer

How do you plan to choose storage providers

Slack, Big data exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

| MinerID | City | Continent | Business/Entity |
| --- | --- | --- | --- |
| f01944347 | Oregon | USA | Jenny, Dabai |
| f01952350 | Oregon | USA | Jenny, Dabai |
| f01972364 | Oregon | USA | Jenny, Dabai |
| f01972376 | Oregon | USA | Jenny, Dabai |
| f02000937 | Chengdu | CN | MTY |
| f01915033 | Chengdu | CN | MTY |
| f0120**** | Melbourne | AU | HOLON |
| f0115**** | Melbourne | AU | HOLON |
| f01199430 | Heerhugowaard | EU | DCENT |
| f01786387 | Heerhugowaard | EU | DCENT |
| f01201327 | Heerhugowaard | EU | DCENT |
| f01937642 | Heerhugowaard | EU | DCENT |
| f0198**** | Dallas | USA | GREATERHEAT |
| f0188**** | Singapore | AS | GREATERHEAT |
| f01091851 | Omaha | USA | DLTx |
| f01736668 | Omaha | USA | DLTx |
| f01820744 | Omaha | USA | DLTx |
| f0855584 | Omaha | USA | DLTx |
| f01794610 | Omaha | USA | DLTx |
| f01838599 | Kansas City | USA | DLTx |
| f01845552 | Kansas City | USA | DLTx |

How do you plan to make deals to your storage providers

Boost client, Lotus client, Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

herrehesse commented 1 year ago

Our projected finish time for the entire NIH dataset (15PiB) is the fourth quarter of 2023, because the speed of the slowest downloading service provider will determine the completion date. We expect to have completed the initial full copies by the end of the second quarter of 2023.

flyworker commented 1 year ago

Will watch closely in the next round to see whether it has improved.

flyworker commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacea6fqymtoflavuun5tox4wuybjmxbb2b4ngbosntqj7y3c3vtdois

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1.34PiB

Signer Address

f1hlubjsdkv4wmsdadihloxgwrz3j3ernf6i3cbpy

Id

d858ee79-a0dc-491a-8fc5-d644496bc544

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea6fqymtoflavuun5tox4wuybjmxbb2b4ngbosntqj7y3c3vtdois

xinaxu commented 1 year ago

The checker tool reports healthy stats. Willing to support

xinaxu commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacea2vawpga2nwzlc42twlijzjyg5ppzfqo4ztn36djxmtkj76f442m

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1.34PiB

Signer Address

f1k3ysofkrrmqcot6fkx4wnezpczlltpirmrpsgui

Id

d858ee79-a0dc-491a-8fc5-d644496bc544

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea2vawpga2nwzlc42twlijzjyg5ppzfqo4ztn36djxmtkj76f442m

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 6

Multisig Notary address

f02049625

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

1.03TiB

Id

facc3b4c-2935-4f3d-b37c-ece8056f82f6

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Last two approvers

xinaxu & flyworker

Rule to calculate the allocation request amount

800% of weekly dc amount requested

DataCap allocation requested

1.03TiB

Total DataCap granted for client so far

6.46PiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

-1646540652827115B
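
The negative figure above appears to be the 5PiB requested in this application minus the DataCap already granted to the client address. A minimal Python sketch of that arithmetic follows, assuming binary units (1 PiB = 2^50 bytes) and a simple "requested minus granted" rule; the function names are hypothetical and are not taken from the bot's code.

```python
PIB = 2**50  # bytes in one pebibyte (binary units are an assumption)

def remaining_bytes(requested_pib: float, granted_bytes: int) -> int:
    """Bytes still to be granted; negative when more than the request was granted."""
    return int(requested_pib * PIB) - granted_bytes

def granted_from_remaining(requested_pib: float, remaining: int) -> int:
    """Invert the rule: recover the granted total from the reported remainder."""
    return int(requested_pib * PIB) - remaining

# Value reported in the bot comment above.
reported_remaining = -1646540652827115
granted = granted_from_remaining(5, reported_remaining)
print(f"{granted / PIB:.2f} PiB granted")  # prints "6.46 PiB granted"
```

Under these assumptions the recovered total matches the 6.46PiB reported as granted so far, which is more than the 5PiB requested for this single application.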

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 207268 | 33 | 1.34PiB | 15.07 | 181.15TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

Storage Provider Distribution

The table below shows the distribution of storage providers that have stored data for this client.

If this is the first time a provider has taken a verified deal, it is marked as new.

For most DataCap applications, the restrictions below should apply.

⚠️ 28.25% of the total deals sealed by f01208189 are duplicate data.

⚠️ 23.03% of the total deals sealed by f01208803 are duplicate data.

| Provider | Location | Total Deals Sealed | Percentage | Unique Data | Duplicate Deals |
| --- | --- | --- | --- | --- | --- |
| f01157271 | Melbourne, Victoria, AU (Anycast Global Backbone) | 71.45 TiB | 1.16% | 69.67 TiB | 2.49% |
| f01208632 | Melbourne, Victoria, AU (Anycast Global Backbone) | 67.43 TiB | 1.09% | 66.12 TiB | 1.95% |
| f01208189 | Melbourne, Victoria, AU (Anycast Global Backbone) | 64.65 TiB | 1.05% | 46.38 TiB | 28.25% |
| f01156975 | Melbourne, Victoria, AU (Anycast Global Backbone) | 59.94 TiB | 0.97% | 49.66 TiB | 17.15% |
| f01156901 | Melbourne, Victoria, AU (Anycast Global Backbone) | 58.89 TiB | 0.95% | 52.11 TiB | 11.51% |
| f01208803 | Melbourne, Victoria, AU (Anycast Global Backbone) | 55.10 TiB | 0.89% | 42.41 TiB | 23.03% |
| f01157018 | Melbourne, Victoria, AU (Anycast Global Backbone) | 52.77 TiB | 0.85% | 51.20 TiB | 2.96% |
| f01157027 | Melbourne, Victoria, AU (Anycast Global Backbone) | 47.07 TiB | 0.76% | 45.35 TiB | 3.65% |
| f01157249 | Melbourne, Victoria, AU (Anycast Global Backbone) | 42.92 TiB | 0.69% | 42.05 TiB | 2.04% |
| f01208154 | Melbourne, Victoria, AU (Anycast Global Backbone) | 18.90 TiB | 0.31% | 18.84 TiB | 0.33% |
| f01156835 | Melbourne, Victoria, AU (Anycast Global Backbone) | 17.88 TiB | 0.29% | 17.26 TiB | 3.49% |
| f01156538 | Melbourne, Victoria, AU (Anycast Global Backbone) | 15.26 TiB | 0.25% | 15.26 TiB | 0.00% |
| f022352 | Oslo, Oslo, NO (Blix Solutions AS) | 77.31 TiB | 1.25% | 70.50 TiB | 8.81% |
| f02000937 | Chengdu, Sichuan, CN (China Mobile Communications Group Co., Ltd.) | 446.48 TiB | 7.22% | 446.48 TiB | 0.00% |
| f01915033 | Chengdu, Sichuan, CN (China Mobile Communications Group Co., Ltd.) | 94.50 TiB | 1.53% | 94.50 TiB | 0.00% |
| f02026193 (new) | Chengdu, Sichuan, CN (China Mobile Communications Group Co., Ltd.) | 82.00 TiB | 1.33% | 82.00 TiB | 0.00% |
| f01345523 | Antwerpen, Flanders, BE (Cogent Communications) | 9.71 TiB | 0.16% | 9.71 TiB | 0.00% |
| f01972376 (new) | Maywood Park, Oregon, US (Flexential Colorado Corp.) | 962.98 TiB | 15.58% | 962.32 TiB | 0.07% |
| f01972364 (new) | Maywood Park, Oregon, US (Flexential Colorado Corp.) | 926.84 TiB | 14.99% | 926.84 TiB | 0.00% |
| f01971600 | Dallas, Texas, US (Flexential Colorado Corp.) | 496.90 TiB | 8.04% | 496.90 TiB | 0.00% |
| f01992630 | Dallas, Texas, US (Flexential Colorado Corp.) | 484.57 TiB | 7.84% | 484.57 TiB | 0.00% |
| f01952350 | Maywood Park, Oregon, US (Flexential Colorado Corp.) | 238.13 TiB | 3.85% | 236.38 TiB | 0.73% |
| f01944347 | Maywood Park, Oregon, US (Flexential Colorado Corp.) | 226.63 TiB | 3.67% | 226.63 TiB | 0.00% |
| f02031042 (new) | Maywood Park, Oregon, US (Flexential Colorado Corp.) | 98.30 TiB | 1.59% | 98.30 TiB | 0.00% |
| f01392893 | Amsterdam, North Holland, NL (Fusix Networks B.V.) | 282.71 TiB | 4.57% | 282.71 TiB | 0.00% |
| f01907545 | Hong Kong, Central and Western, HK (HK Broadband Network Ltd.) | 86.88 TiB | 1.41% | 86.88 TiB | 0.00% |
| f01889910 | Phoenix, Arizona, US (Level 3 Parent, LLC) | 28.12 TiB | 0.45% | 28.12 TiB | 0.00% |
| f01847751 | Denver, Colorado, US (Level 3 Parent, LLC) | 11.05 TiB | 0.18% | 11.05 TiB | 0.00% |
| f01199430 | Heerhugowaard, North Holland, NL (Wijnand Schouten trading as Speedium) | 582.29 TiB | 9.42% | 575.54 TiB | 1.16% |
| f01786387 | Heerhugowaard, North Holland, NL (Wijnand Schouten trading as Speedium) | 197.76 TiB | 3.20% | 193.13 TiB | 2.34% |
| f01201327 | Heerhugowaard, North Holland, NL (Wijnand Schouten trading as Speedium) | 134.84 TiB | 2.18% | 134.84 TiB | 0.00% |
| f01771403 | Heerhugowaard, North Holland, NL (Wijnand Schouten trading as Speedium) | 77.05 TiB | 1.25% | 77.05 TiB | 0.00% |
| f01937642 | Heerhugowaard, North Holland, NL (Wijnand Schouten trading as Speedium) | 64.63 TiB | 1.05% | 61.50 TiB | 4.84% |
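
The duplicate percentages in the table appear to follow from the sealed and unique sizes as (total - unique) / total. The Python sketch below reproduces that for three Melbourne providers. The 20% cutoff is an assumption inferred from which rows carry a warning, not a documented checker setting, and the function name is hypothetical. Because the table values are already rounded, the recomputed percentages can differ from the reported ones in the last digit.

```python
def duplicate_pct(total_tib: float, unique_tib: float) -> float:
    """Share of sealed data that is not unique, as a percentage."""
    return (total_tib - unique_tib) / total_tib * 100

# (provider, total deals sealed in TiB, unique data in TiB), copied from the table
rows = [
    ("f01208189", 64.65, 46.38),
    ("f01156975", 59.94, 49.66),
    ("f01208803", 55.10, 42.41),
]

THRESHOLD_PCT = 20.0  # assumed cutoff; the checker's real threshold is not stated here

flagged = [
    provider
    for provider, total, unique in rows
    if duplicate_pct(total, unique) > THRESHOLD_PCT
]
print(flagged)  # prints ['f01208189', 'f01208803']
```

With this assumed cutoff, f01156975 (about 17.15%) stays below the line, which agrees with the two warnings shown in the report.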

Provider Distribution

Deal Data Replication

The table below shows how much unique data is replicated across storage providers.

⚠️ 42.93% of deals are for data replicated across fewer than 4 storage providers.

| Unique Data Size | Total Deals Made | Number of Providers | Deal Percentage |
| --- | --- | --- | --- |
| 1.39 PiB | 1.39 PiB | 1 | 22.99% |
| 257.08 TiB | 514.56 TiB | 2 | 8.32% |
| 239.33 TiB | 718.17 TiB | 3 | 11.62% |
| 209.59 TiB | 840.82 TiB | 4 | 13.60% |
| 262.19 TiB | 1.30 PiB | 5 | 21.59% |
| 119.94 TiB | 734.61 TiB | 6 | 11.88% |
| 41.16 TiB | 303.53 TiB | 7 | 4.91% |
| 25.03 TiB | 217.09 TiB | 8 | 3.51% |
| 9.75 TiB | 91.41 TiB | 9 | 1.48% |
| 544.00 GiB | 5.47 TiB | 10 | 0.09% |
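
The 42.93% warning can be recovered from the table by summing the deal percentages of the rows with fewer than 4 providers. A short Python sketch, with variable names chosen for illustration only:

```python
# (number of providers holding the data, deal percentage), copied from the table
rows = [
    (1, 22.99), (2, 8.32), (3, 11.62), (4, 13.60), (5, 21.59),
    (6, 11.88), (7, 4.91), (8, 3.51), (9, 1.48), (10, 0.09),
]

MIN_PROVIDERS = 4  # replication level named in the warning

# Sum the share of deals whose data sits with fewer than MIN_PROVIDERS providers.
under_replicated = sum(pct for providers, pct in rows if providers < MIN_PROVIDERS)
print(f"{under_replicated:.2f}%")  # prints "42.93%"
```

The three contributing rows are 22.99%, 8.32%, and 11.62%, so most of the shortfall comes from data held by a single provider.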

Replication Distribution

Deal Data Shared with other Clients

The table below shows how much unique data is shared with other clients. Usually, different applications own different data, which should not resolve to the same CID.

However, this is possible if all of the clients below use the same software to prepare the exact same dataset, or if they belong to a series of LDN applications for the same dataset.[^3]

⚠️ CID sharing has been observed.

| Other Client | Application | Total Deals Affected | Unique CIDs | Approvers |
| --- | --- | --- | --- | --- |
| f1z7jogzx4x42wtilzb4lu6iotlad5rptt2acbzpi | Speedium network | 44.17 TiB | 1,341 | 1 flyworker, 1 kernelogic, 4 MegTei, 2 psh0691, 3 Reiers, 3 s0nik42 |

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

C00kies77 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 2 storage providers sealed too much duplicate data - f01208189: 25.23%, f01208803: 21.07%

Deal Data Replication

⚠️ 39.87% of deals are for data replicated across fewer than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

kernelogic commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebaaufkyoezd3zscxb62cnnvu2ki4wswsbhgwfb5cwoyyq3xvqxqc

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1.03TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

facc3b4c-2935-4f3d-b37c-ece8056f82f6

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebaaufkyoezd3zscxb62cnnvu2ki4wswsbhgwfb5cwoyyq3xvqxqc

kernelogic commented 1 year ago

This LDN looks bugged

herrehesse commented 1 year ago

This LDN shares the same address as multiple other applications (2/27) and does indeed seem bugged. Can anyone assist and maybe resolve the bot issues? It would be extremely inconvenient to have delays because this application shares its wallet with several applications for the same dataset. @raghavrmadya @xinaxu @simonkim0515 @Sunnyiscoming

NiwanDao commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebjo64ev5h3efbzjtv32ejxhsymwmoviue4h37dsujdvtr5azrt2g

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1.03TiB

Signer Address

f1a2lia2cwwekeubwo4nppt4v4vebxs2frozarz3q

Id

facc3b4c-2935-4f3d-b37c-ece8056f82f6

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebjo64ev5h3efbzjtv32ejxhsymwmoviue4h37dsujdvtr5azrt2g

herrehesse commented 1 year ago

@xingjitansuo & @kernelogic thank you for signing. Let’s see what happens with this allocation, or maybe application 1553 unlocks now?

Tom-OriginStorage commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 2 storage providers sealed too much duplicate data - f01208189: 23.59%, f01208803: 20.81%

Deal Data Replication

⚠️ 39.50% of deals are for data replicated across fewer than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 7

Multisig Notary address

f02049625

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

10.24GiB

Id

9c94b5ff-3bed-4207-ae4f-f21452589213

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Last two approvers

xingjitansuo & kernelogic

Rule to calculate the allocation request amount

800% of weekly dc amount requested

DataCap allocation requested

10.24GiB

Total DataCap granted for client so far

12.55PiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

-8502116598289530B

Stats

Number of deals Number of storage providers Previous DC Allocated Top provider Remaining DC
| 393066 | 61 | 1.03TiB | 7.94 | 4.21GiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 2 storage providers sealed too much duplicate data - f01208189: 20.70%, f01208803: 20.66%

Deal Data Replication

⚠️ 35.15% of deals are for data replicated across fewer than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

data-programs commented 1 year ago

KYC

This user’s identity has been verified through filplus.storage