filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Speedium - NIH NCBI Sequence Read Archive [FULL SET] #2008

Closed: cryptowhizzard closed this issue 11 months ago

cryptowhizzard commented 1 year ago

Data Owner Name

NIH - National Institutes of Health

What is your role related to the dataset

Data Preparer

Data Owner Country/Region

United States

Data Owner Industry

Life Science / Healthcare

Website

https://www.nih.gov/

Social Media

https://www.facebook.com/nih.gov/

Total amount of DataCap being requested

120 PiB

Expected size of single dataset (one copy)

15 PiB

Number of replicas to store

10

Weekly allocation of DataCap requested

1 PiB
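The sizing figures above can be cross-checked with simple arithmetic. The sketch below is a hypothetical helper, not part of the application form; it assumes every replica is a full copy of the single-dataset size and copies the numbers exactly as stated.

```python
"""Sanity-check the sizing figures stated in the application (all in PiB)."""

SINGLE_COPY_PIB = 15       # "Expected size of single dataset (one copy)"
REPLICAS = 10              # "Number of replicas to store"
TOTAL_REQUESTED_PIB = 120  # "Total amount of DataCap being requested"


def planned_storage_pib(single_copy: float, replicas: int) -> float:
    """DataCap needed to store `replicas` full copies of the dataset."""
    return single_copy * replicas


def full_copies_covered(total_requested: float, single_copy: float) -> float:
    """How many full copies the requested total can cover."""
    return total_requested / single_copy


if __name__ == "__main__":
    print(f"{SINGLE_COPY_PIB} PiB x {REPLICAS} replicas = "
          f"{planned_storage_pib(SINGLE_COPY_PIB, REPLICAS)} PiB")
    print(f"{TOTAL_REQUESTED_PIB} PiB requested covers "
          f"{full_copies_covered(TOTAL_REQUESTED_PIB, SINGLE_COPY_PIB)} full copies")
```

With the figures as stated, ten full copies come to 150 PiB, whereas the 120 PiB requested covers eight full copies.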

On-chain address for first allocation

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

Identifier

efil

Share a brief history of your project and organization

Since its launch, the Filecoin network has become an important player in the decentralised storage space, offering a secure and transparent alternative to traditional data storage solutions.

We at Speedium / DCENT have been storing real, valuable datasets on the Filecoin network since Slingshot 2.6 and have been actively developing tools to improve the process. We are always on the lookout for new and useful client data to onboard.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

NIH NCBI Sequence Read Archive (SRA) on AWS
The Sequence Read Archive (SRA), produced by the [National Center for Biotechnology Information (NCBI)](https://www.ncbi.nlm.nih.gov/) at the [National Library of Medicine (NLM)](http://nlm.nih.gov/) at the [National Institutes of Health (NIH)](http://www.nih.gov/), stores raw DNA sequencing data and alignment information from high-throughput sequencing platforms.

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

IPFS, lotus, singularity

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://registry.opendata.aws/ncbi-sra/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Monthly

For how long do you plan to keep this dataset stored on Filecoin

1 to 1.5 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe, Australia (continent)

How will you be distributing your data to storage providers

HTTP or FTP server, IPFS, Lotus built-in data transfer

How do you plan to choose storage providers

Slack, Big Data Exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

| MinerID | City | Continent | Business/Entity |
| --- | --- | --- | --- |
| `f01944347` | Oregon | USA | Jenny, Dabai |
| `f01952350` | Oregon | USA | Jenny, Dabai |
| `f01972364` | Oregon | USA | Jenny, Dabai |
| `f01972376` | Oregon | USA | Jenny, Dabai |
| `f02000937` | Chengdu | CN | MTY |
| `f01915033` | Chengdu | CN | MTY |
| `f0120****` | Melbourne | AU | HOLON |
| `f0115****` | Melbourne | AU | HOLON |
| `f01199430` | Heerhugowaard | EU | DCENT |
| `f01786387` | Heerhugowaard | EU | DCENT |
| `f01201327` | Heerhugowaard | EU | DCENT |
| `f01937642` | Heerhugowaard | EU | DCENT |
| `f0198****` | Dallas | USA | GREATERHEAT |
| `f0188****` | Singapore | AS | GREATERHEAT |
| `f01091851` | Omaha | USA | DLTx |
| `f01736668` | Omaha | USA | DLTx |
| `f01820744` | Omaha | USA | DLTx |
| `f0855584` | Omaha | USA | DLTx |
| `f01794610` | Omaha | USA | DLTx |
| `f01838599` | Kansas City | USA | DLTx |
| `f01845552` | Kansas City | USA | DLTx |

How do you plan to make deals to your storage providers

No response

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

dikemm commented 1 year ago

Still under investigation because of collusion between Dcent and fildrive.

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage provider sealed too much duplicate data - f01208803: 20.81%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.
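The report does not say how the duplicate-data percentage is computed. The sketch below is a hypothetical reading only: it assumes the figure is the share of a provider's deals whose piece CID repeats one that provider has already sealed, and it assumes a 20% threshold inferred from the 20.81% warning and later comments in this thread, not from the checker's source code. Provider IDs and CIDs in the example are made up.

```python
"""Hypothetical sketch of a per-provider duplicate-data check."""

# Assumption: a provider is flagged when more than 20% of its deals reuse a
# piece CID it has already sealed. The 20% figure is inferred from this
# thread, not taken from the checker's source code.
DUPLICATE_THRESHOLD = 0.20


def duplicate_ratio(piece_cids: list[str]) -> float:
    """Fraction of a provider's deals whose piece CID repeats an earlier deal."""
    if not piece_cids:
        return 0.0
    return (len(piece_cids) - len(set(piece_cids))) / len(piece_cids)


def flag_providers(deals_by_provider: dict[str, list[str]],
                   threshold: float = DUPLICATE_THRESHOLD) -> dict[str, float]:
    """Return {provider_id: ratio} for every provider strictly above `threshold`."""
    flagged: dict[str, float] = {}
    for provider, cids in deals_by_provider.items():
        ratio = duplicate_ratio(cids)
        if ratio > threshold:
            flagged[provider] = ratio
    return flagged


if __name__ == "__main__":
    # Made-up provider IDs and CIDs, for illustration only.
    sample = {
        "f0PROVIDER_A": ["cidA", "cidA", "cidB", "cidC"],          # 1 repeat of 4 deals
        "f0PROVIDER_B": ["cidD", "cidE", "cidF", "cidG", "cidH"],  # no repeats
    }
    print(flag_providers(sample))  # {'f0PROVIDER_A': 0.25}
```

Under this reading, a provider at 20.81% would be flagged, while one at exactly 20% would not, since the comparison is strict.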

kenzz45 commented 1 year ago

Tagging @jamerduhgamer for signing this abusive application despite warnings from community members!

1. Too much duplicate data
2. CID sharing across totally different datasets

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/339

herrehesse commented 1 year ago
[Screenshot from 2023-07-23, 19:01:54]

@kenzz45 Gaslighting is a serious tactic used by entities like yourself. For people not familiar with it here is a link: https://en.wikipedia.org/wiki/Gaslighting

hcgun commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage provider sealed too much duplicate data - f01208803: 20.81%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

zcfil commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage provider sealed too much duplicate data - f01208803: 20.81%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

XnMatrixSV commented 1 year ago

Hi, I would like to understand more about the duplicate data at node f01208803 (20.81%).

Otherwise, I'm happy to support, with one additional concern: some data from the previous batch was stored at one of our data centers in Portland:

| MinerID | City | Continent | Business/Entity |
| --- | --- | --- | --- |
| `f01944347` | Oregon | USA | Jenny, Dabai |
| `f01952350` | Oregon | USA | Jenny, Dabai |
| `f01972364` | Oregon | USA | Jenny, Dabai |
| `f01972376` | Oregon | USA | Jenny, Dabai |

Can we still support this in this case? @kevzak

jhookersyd commented 1 year ago

@XnMatrixSV The explanation is above in the thread. Thanks

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/2008#issuecomment-1567958183

herrehesse commented 1 year ago

@XnMatrixSV,

Our collaboration on NIH came to an end in early March, approximately 4 months ago. We highly value your transparency, and more notaries should follow your example. Let's seek @kevzak's approval to proceed further.

kevzak commented 1 year ago

Hello - The best guidelines I found available were here: https://github.com/filecoin-project/notary-governance/issues/825#issuecomment-1438606168

It states that as long as the active notary is not working with the client, there is no conflict. If, as mentioned above, XnMatrix's work with the client has ended and is not part of the current allocation, then I don't see a direct issue.

XnMatrixSV commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedzzgrrdgdv4c2djqfnzrqhdscrflsvrjc43qsmlhpmyqvtnxucmo

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

2.00PiB

Signer Address

f1bcvvwv3w6az7ivhdzory7anha54ocrlkxazm3yq

Id

b4b9980c-26c8-4ce7-8e99-ca0cb383bd39

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedzzgrrdgdv4c2djqfnzrqhdscrflsvrjc43qsmlhpmyqvtnxucmo

lyjmry commented 1 year ago

The LDN remains controversial.

herrehesse commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage provider sealed too much duplicate data - f01208803: 20.81%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

herrehesse commented 1 year ago

Thanks for reminding us of inactivity @bot. The application can remain open.

herrehesse commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage provider sealed too much duplicate data - f01208803: 20.81%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

herrehesse commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage provider sealed too much duplicate data - f01208803: 20.81%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

data-programs commented 1 year ago
KYC

This user’s identity has been verified through filplus.storage

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 41

Multisig Notary address

f01940930

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

2PiB

Id

48135b63-7e81-47be-bcd4-0c3aa43299f3

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01940930

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Rule to calculate the allocation request amount

400% weekly > 2PiB, requesting 2PiB
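The bot's one-line rule can be read as "request 400% of the weekly allocation, but no more than 2 PiB per tranche." The sketch below encodes that reading; the cap and multiplier are assumptions taken from the bot's wording, and the function name is hypothetical, not the bot's actual implementation.

```python
"""Assumed reading of "400% weekly > 2PiB, requesting 2PiB"."""

TRANCHE_CAP_PIB = 2.0  # assumption: per-request ceiling implied by the bot text
MULTIPLIER = 4.0       # "400% weekly"


def next_request_pib(weekly_pib: float,
                     multiplier: float = MULTIPLIER,
                     cap_pib: float = TRANCHE_CAP_PIB) -> float:
    """Request `multiplier` x the weekly allocation, but never more than the cap."""
    return min(weekly_pib * multiplier, cap_pib)


if __name__ == "__main__":
    weekly = 1.0  # "Weekly allocation of DataCap requested: 1PiB"
    print(f"400% of {weekly} PiB = {weekly * MULTIPLIER} PiB; "
          f"requesting {next_request_pib(weekly)} PiB")
```

With the 1 PiB weekly figure from the application, 400% is 4 PiB, which exceeds the cap, so the request is 2 PiB, matching the amount shown in this comment.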

DataCap allocation requested

2PiB

Total DataCap granted for client so far

InfinityYiB

Datacap to be granted to reach the total amount requested by the client (120 PiB)

InfinityYiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 222100 | 54 | 2PiB | 14.44 | 394.26TiB |

sxxfuture-official commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage provider sealed too much duplicate data - f01208803: 20.81%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

sxxfuture-official commented 1 year ago

@cryptowhizzard f01208803 sealed too much duplicate data; please pay attention to this and resolve it in later rounds. Other aspects look OK.

herrehesse commented 1 year ago

@sxxfuture-official Hey there! Thanks for getting back. It seems there might have been an oversight: the duplicate data has already been explained more than 10 times in this thread. No worries, it's easy to miss. For clarity, here's the detailed explainer: https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/2008#issuecomment-1567958183

Appreciate your understanding and support!

liyunzhi-666 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage provider sealed too much duplicate data - f01208803: 20.81%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

NiwanDao commented 1 year ago

Can you please explain the CID sharing? @cryptowhizzard

cryptowhizzard commented 1 year ago

Where?

herrehesse commented 1 year ago

@NiwanDao Deal Data Replication ✔️ Data replication looks healthy.

Joss-Hua commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage provider sealed too much duplicate data - f01208803: 20.81%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

liyunzhi-666 commented 1 year ago

checker:manualTrigger

liyunzhi-666 commented 1 year ago

checker:manualTrigger

herrehesse commented 1 year ago

@simonkim0515 @fabriziogianni7 Hi guys, can you check why the trigger is not working?

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

-- Commented by Stale Bot.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 14 days, so for now it is being closed. Please feel free to contact the Fil+ Gov team to re-open the application if it is still being processed. Thank you!

-- Commented by Stale Bot.

herrehesse commented 1 year ago

@simonkim0515 @kevzak How to proceed on this one?

herrehesse commented 1 year ago

checker:manualTrigger

herrehesse commented 1 year ago

@simonkim0515 Can you unlock the next 2 PiB tranche for us?

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

⚠️ 1 storage provider sealed too much duplicate data - f01208803: 20.81%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard.

simonkim0515 commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f01940930

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

2PiB

Id

d6ee8282-9ba0-4cc2-8bdc-1f7dea6db97d

herrehesse commented 1 year ago

Anyone able to sign?

@Fatman13 @flyworker @jamerduhgamer @laurarenpanda @liyunzhi-666 @newwebgroup @stcouldlisa @XnMatrixSV

jamerduhgamer commented 1 year ago

SP Distribution warning is barely above 20%. Healthy deal data replication. Low amount of CID sharing.

Willing to support this application.

jamerduhgamer commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceash3hgmk46m53szosrcgn5msxszqjcpmcggnsplmytpfg2swxj54

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

2.00PiB

Signer Address

f1ypuqpi4xn5q7zi5at3rmdltosozifhqmrt66vhq

Id

d6ee8282-9ba0-4cc2-8bdc-1f7dea6db97d

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceash3hgmk46m53szosrcgn5msxszqjcpmcggnsplmytpfg2swxj54

PluskitOfficial commented 1 year ago

checker:manualTrigger