filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale
[DataCap Application] Speedium - NIH NCBI Sequence Read Archive [3 / 27] #1553

Closed cryptowhizzard closed 1 year ago

cryptowhizzard commented 1 year ago

Data Owner Name

NIH - National Institutes of Health

Data Owner Country/Region

United States

Data Owner Industry

Life Science / Healthcare

Website

https://www.nih.gov/

Social Media

https://www.facebook.com/nih.gov/

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

500TiB

On-chain address for first allocation

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Custom multisig

Identifier

No response

Share a brief history of your project and organization

Since its launch, the Filecoin network has become an important player in the decentralised storage space, offering a secure and transparent alternative to traditional data storage solutions.

We as Speedium / DCENT have been engaged with storing real and valuable datasets on the Filecoin network since Slingshot 2.6 and have been actively developing tools to improve the process. We are always on the lookout for new and useful client data to onboard.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

NIH NCBI Sequence Read Archive (SRA) on AWS
The Sequence Read Archive (SRA), produced by the [National Center for Biotechnology Information (NCBI)](https://www.ncbi.nlm.nih.gov/) at the [National Library of Medicine (NLM)](http://nlm.nih.gov/) at the [National Institutes of Health (NIH)](http://www.nih.gov/), stores raw DNA sequencing data and alignment information from high-throughput sequencing platforms.

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

IPFS, lotus, singularity, graphsplit

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://registry.opendata.aws/ncbi-sra/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Yearly

For how long do you plan to keep this dataset stored on Filecoin

1.5 to 2 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, South America, Europe, Australia (continent)

How will you be distributing your data to storage providers

HTTP or FTP server, IPFS, Shipping hard drives, Lotus built-in data transfer

How do you plan to choose storage providers

Slack, Big data exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

| MinerID | City | Continent | Business/Entity |
| --- | --- | --- | --- |
| f01944347 | Oregon | USA | Jenny, Dabai |
| f01952350 | Oregon | USA | Jenny, Dabai |
| f01972364 | Oregon | USA | Jenny, Dabai |
| f01972376 | Oregon | USA | Jenny, Dabai |
| f02000937 | Chengdu | CN | MTY |
| f01915033 | Chengdu | CN | MTY |
| f0120**** | Melbourne | AU | HOLON |
| f0115**** | Melbourne | AU | HOLON |
| f01199430 | Heerhugowaard | EU | DCENT |
| f01786387 | Heerhugowaard | EU | DCENT |
| f01201327 | Heerhugowaard | EU | DCENT |
| f01937642 | Heerhugowaard | EU | DCENT |
| f0198**** | Dallas | USA | GREATERHEAT |
| f0188**** | Singapore | AS | GREATERHEAT |
| f01091851 | Omaha | USA | DLTx |
| f01736668 | Omaha | USA | DLTx |
| f01820744 | Omaha | USA | DLTx |
| f0855584 | Omaha | USA | DLTx |
| f01794610 | Omaha | USA | DLTx |
| f01838599 | Kansas City | USA | DLTx |
| f01845552 | Kansas City | USA | DLTx |

How do you plan to make deals to your storage providers

Boost client, Lotus client, Singularity

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

cryptowhizzard commented 1 year ago

(Proposal https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1512 is broken; this is a re-application of that request.)

simonkim0515 commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

500TiB

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

250TiB

Id

7f32d3c6-37d6-4a39-94ab-886dffcaef51

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

There is no previous allocation for this issue.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

phantom-rabbit commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

There is no previous allocation for this issue.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

kernelogic commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacearqtoutyapoz7yeqdxiizcdtbi3wtz6n3j6ut26duyk7z7brpktq

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

250.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

7f32d3c6-37d6-4a39-94ab-886dffcaef51

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacearqtoutyapoz7yeqdxiizcdtbi3wtz6n3j6ut26duyk7z7brpktq

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

500TiB

Id

2e028525-ebb5-45a1-95ea-7127316077c3

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Last two approvers

kernelogic & not found

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

500TiB

Total DataCap granted for client so far

4.51PiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

499.28TiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 144091 | 25 | 250TiB | 21.67 | 55.54TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

There is no previous allocation for this issue.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

NiwanDao commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 2 storage providers sealed too much duplicate data - f01208189: 25.13%, f01208803: 20.82%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.
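For readers unfamiliar with the "duplicate data" warning, the following is a minimal sketch of how such a per-provider percentage could be computed. It is not the checker's actual implementation: the deal tuples, the provider names `f0_example_a` and `f0_example_b`, and the 20% flagging threshold are all assumptions made for illustration.

```python
from collections import defaultdict

# Assumed flagging threshold in percent; the real checker's value is not stated in this thread.
ASSUMED_THRESHOLD = 20.0


def duplicate_share(deals):
    """Return {provider: percent of sealed bytes that repeat a piece CID
    the same provider has already sealed}."""
    total = defaultdict(int)
    duplicate = defaultdict(int)
    seen = set()
    for provider, piece_cid, size in deals:
        total[provider] += size
        if (provider, piece_cid) in seen:
            # Second or later copy of the same piece at the same provider.
            duplicate[provider] += size
        else:
            seen.add((provider, piece_cid))
    return {p: 100.0 * duplicate[p] / total[p] for p in total}


# Hypothetical deals: (provider, piece CID, size in bytes).
GIB = 2**30
deals = [
    ("f0_example_a", "piece-1", 32 * GIB),
    ("f0_example_a", "piece-1", 32 * GIB),  # duplicate at the same provider
    ("f0_example_a", "piece-2", 32 * GIB),
    ("f0_example_a", "piece-3", 32 * GIB),
    ("f0_example_b", "piece-1", 32 * GIB),  # same piece, different provider
]

shares = duplicate_share(deals)
flagged = {p: s for p, s in shares.items() if s > ASSUMED_THRESHOLD}
print(shares)
print(flagged)
```

Note that a piece stored once at each of several providers counts as replication, not duplication; only repeats at the same provider add to the percentage in this sketch.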

NiwanDao commented 1 year ago

Do you allow one SP to store the same file twice?

herrehesse commented 1 year ago

@xingjitansuo good question! No, we never allow it. We promptly asked HOLON for an answer, and they responded professionally in the thread below:

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1554#issuecomment-1415715644

It has not occurred again since then. The CID sharing has also been answered by @cryptowhizzard on application #1554, and it is less than 1% of the data.

NiwanDao commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebvb4t5ilhcgbvhel4wjvq7zakslpywrkguoq275yxxcr37wtips2

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

500.00TiB

Signer Address

f1a2lia2cwwekeubwo4nppt4v4vebxs2frozarz3q

Id

2e028525-ebb5-45a1-95ea-7127316077c3

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebvb4t5ilhcgbvhel4wjvq7zakslpywrkguoq275yxxcr37wtips2

Destore2023 commented 1 year ago

The CID sharing is less than 1% of the dataset, more SPs are cooperating, and the data is more decentralized.

Sharding parts have also been answered by Mr. Wijnand Schouten on the application https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1554

Destore2023 commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacectlvdk2eqqw5h5k5chzpqgtwvbdlyetyymtngokgdkrfai44g5vs

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

500.00TiB

Signer Address

f1yh6q3nmsg7i2sys7f7dexcuajgoweudcqj2chfi

Id

2e028525-ebb5-45a1-95ea-7127316077c3

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacectlvdk2eqqw5h5k5chzpqgtwvbdlyetyymtngokgdkrfai44g5vs

herrehesse commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 2 storage providers sealed too much duplicate data - f01208189: 23.68%, f01208803: 20.81%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

psh0691 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 2 storage providers sealed too much duplicate data - f01208189: 23.59%, f01208803: 20.81%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

herrehesse commented 1 year ago

Any reason why the bot would not trigger? @simonkim0515 @Sunnyiscoming @raghavrmadya

dkkapur commented 1 year ago

@herrehesse the bot isn't triggered because the client address is currently used across a few different allocations, so according to its math, the current DataCap balance is >25% of the previous allocation, and the logic seems to be breaking. Generally, the current advice is to use different client addresses per application so we can track this more easily and ensure everyone is on track.

See this for details:

How are you accounting on your end in terms of "readiness" for additional DataCap? I see 61 TiB remaining overall, and that is definitely <25% of the last 500 TiB allocation, so AFAIK this should be triggered.

I can go ahead and trigger this manually right now. Ideally we do this with a different client address to minimize future confusion. @herrehesse is it possible to use a different address, or do you prefer using this one and we work around it as needed? In the future, we'd like to support >5 PiB applications so hopefully this problem goes away to some extent anyway.
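The threshold check described in this comment can be sketched as follows. This is an illustration of the rule as stated in the thread (a new tranche is due once the remaining balance drops below 25% of the previous allocation), not the bot's real code; the function name and the 200 TiB figure for another application's balance are hypothetical.

```python
def should_auto_trigger(remaining_tib, previous_allocation_tib, threshold=0.25):
    """True when the remaining DataCap is below the given share of the last allocation."""
    return remaining_tib < threshold * previous_allocation_tib


# Figures quoted in the comment: 61 TiB remaining after a 500 TiB allocation.
print(should_auto_trigger(61, 500))

# If the same address also holds DataCap from another application
# (200 TiB here is a made-up number), the pooled balance hides the shortfall.
print(should_auto_trigger(61 + 200, 500))
```

With one address per application the first call is the only one that applies; sharing an address makes the balance look healthier than it is for any single application, which matches the failure described above.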

herrehesse commented 1 year ago

Please trigger manually for now @dkkapur

dkkapur commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f02049625

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

1000TiB

Id

2e028525-ebb5-45a1-95ea-7127316077c4

xinaxu commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceawordedppax5sktqjwq5d36jnyq67uuzpzykftx2rrxkg4s3iagu

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1000.00TiB

Signer Address

f1k3ysofkrrmqcot6fkx4wnezpczlltpirmrpsgui

Id

2e028525-ebb5-45a1-95ea-7127316077c4

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceawordedppax5sktqjwq5d36jnyq67uuzpzykftx2rrxkg4s3iagu

psh0691 commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedbbnl7t6tguyyaqufrh23bwoswnujimytbdzrwhu5i2dlfun3kfg

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1000.00TiB

Signer Address

f1qdko4jg25vo35qmyvcrw4ak4fmuu3f5rif2kc7i

Id

2e028525-ebb5-45a1-95ea-7127316077c4

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedbbnl7t6tguyyaqufrh23bwoswnujimytbdzrwhu5i2dlfun3kfg

dkkapur commented 1 year ago

down to ~16% of allocation. flagged by client in Slack. triggering manually since subsequent allocation auto-trigger will not happen here as client is reusing the address across applications.

dkkapur commented 1 year ago

DataCap Allocation requested

Request number 4

Multisig Notary address

f02049625

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

2000TiB

Id

2e028525-ebb5-45a1-95ea-7127316077c5

mjroddy commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebqvrj6vzayggpwabnv5pglvxdrih7rd7t4pv2sqyfah5ynmf5khy

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1.95PiB

Signer Address

f1ystxl2ootvpirpa7ebgwl7vlhwkbx2r4zjxwe5i

Id

2e028525-ebb5-45a1-95ea-7127316077c5

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebqvrj6vzayggpwabnv5pglvxdrih7rd7t4pv2sqyfah5ynmf5khy

kernelogic commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacea7ugl6wz4vkolg4uzo7q5nez6gthx476r4oahqvty7lxvxfgbfh4

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1.95PiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

2e028525-ebb5-45a1-95ea-7127316077c5

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacea7ugl6wz4vkolg4uzo7q5nez6gthx476r4oahqvty7lxvxfgbfh4

stcloudlisa commented 1 year ago

It seems that this is not connectable

stcloudlisa commented 1 year ago

[Screenshot: WX20230227-164143@2x]

herrehesse commented 1 year ago

@stcloudlisa thank you for letting us know. I have contacted the corresponding SP (Holon) to take measures and resolve it. Please check again tomorrow; the miner should be reachable by then.

jhookersyd commented 1 year ago

We are doing some port/Telnet/firewall testing tonight on f01157271. It will be OK tomorrow. J

stcloudlisa commented 1 year ago

Ok thanks, looking forward to seeing your improvements

SBudo commented 1 year ago

I can reach that minerID from the "internet" and our node too:

[four screenshots]

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 5

Multisig Notary address

f02049625

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

1.33PiB

Id

b58f5325-9429-451e-ab0b-f174300b7826
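The five requests in this thread (250TiB, 500TiB, 1000TiB, 2000TiB, 1.33PiB) are consistent with a schedule that doubles each tranche and caps the last one at whatever is left of the 5PiB total. The sketch below reproduces that sequence. Only the "100%" and "800% of weekly dc amount requested" labels appear in the bot's reports; the 50%, 200%, and 400% steps and the cap are inferred from the amounts, so treat the whole function as an assumption rather than the bot's documented rule.

```python
TIB_PER_PIB = 1024


def tranche_schedule(total_tib, weekly_tib, percents=(50, 100, 200, 400, 800)):
    """Return tranche sizes in TiB: each is a percentage of the weekly rate,
    capped at the amount still needed to reach the total."""
    granted = 0
    tranches = []
    step = 0
    while granted < total_tib:
        # After the listed steps run out, keep using the last percentage.
        pct = percents[min(step, len(percents) - 1)]
        amount = min(weekly_tib * pct // 100, total_tib - granted)
        tranches.append(amount)
        granted += amount
        step += 1
    return tranches


# 5 PiB total at 500 TiB per week, as requested in this application.
schedule = tranche_schedule(5 * TIB_PER_PIB, 500)
print(schedule)  # [250, 500, 1000, 2000, 1370]
```

The final 1370 TiB is the 5120 TiB total minus the 3750 TiB already requested, which lines up with the 1.33PiB shown in request number 5.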

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01858410

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Last two approvers

kernelogic & megtei

Rule to calculate the allocation request amount

800% of weekly dc amount requested

DataCap allocation requested

1.33PiB

Total DataCap granted for client so far

11.22PiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

-7004669722188840B

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 357565 | 59 | 2000TiB | 8.73 | 464.40TiB |

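The negative "to be granted" value in the report above follows from simple subtraction: the address has been granted more than the 5PiB this application requested, which fits the earlier explanation that it is reused across applications. A short check of the arithmetic, assuming the bot uses binary units (1 PiB = 2^50 bytes); the function name is illustrative.

```python
PIB = 2**50
TIB = 2**40


def implied_granted_pib(requested_pib, remaining_bytes):
    """Granted total implied by 'requested minus granted = remaining'."""
    return (requested_pib * PIB - remaining_bytes) / PIB


# Later report: remaining is -7004669722188840B against a 5 PiB request.
print(round(implied_granted_pib(5, -7004669722188840), 2))  # 11.22

# Earlier report: 499.28 TiB remaining against the same request.
print(round(implied_granted_pib(5, 499.28 * TIB), 2))  # 4.51
```

Both results match the "Total DataCap granted for client so far" figures in the two reports (11.22PiB and 4.51PiB).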
filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 2 storage providers sealed too much duplicate data - f01208189: 21.63%, f01208803: 20.81%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

xinaxu commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecx3tzpw2nvfsbfzya24iy2jpnbjwgxwyh5optqhf5dopldg2xmri

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1.33PiB

Signer Address

f1k3ysofkrrmqcot6fkx4wnezpczlltpirmrpsgui

Id

b58f5325-9429-451e-ab0b-f174300b7826

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecx3tzpw2nvfsbfzya24iy2jpnbjwgxwyh5optqhf5dopldg2xmri

NDLABS-Leo commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 2 storage providers sealed too much duplicate data - f01208189: 21.63%, f01208803: 20.81%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

NDLABS-Leo commented 1 year ago

2 storage providers sealed too much duplicate data - f01208189: 21.63%, f01208803: 20.81%

-- I think this is normal as long as no node's share exceeds 25%: nodes store data at different speeds, so the proportions may still shift. But given Dcent's track record in the community, I believe you can do better.

NDLABS-Leo commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceahhefokfslzk6rrf5ox2kkar3gxqhq2f6mx3hyhoa5qsvsxks32m

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1.33PiB

Signer Address

f1yayfsv6whu3rheviucvventj3y6t542xfpb47ei

Id

b58f5325-9429-451e-ab0b-f174300b7826

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceahhefokfslzk6rrf5ox2kkar3gxqhq2f6mx3hyhoa5qsvsxks32m