filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Speedium - NIH NCBI Sequence Read Archive [FULL SET] #2008

Closed: cryptowhizzard closed this issue 9 months ago

cryptowhizzard commented 1 year ago

Data Owner Name

NIH - National Institutes of Health

What is your role related to the dataset

Data Preparer

Data Owner Country/Region

United States

Data Owner Industry

Life Science / Healthcare

Website

https://www.nih.gov/

Social Media

https://www.facebook.com/nih.gov/

Total amount of DataCap being requested

120 PiB

Expected size of single dataset (one copy)

15 PiB

Number of replicas to store

10

Weekly allocation of DataCap requested

1PiB
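
For readers checking the figures above, here is a minimal arithmetic sketch. It assumes the requested total is meant to cover whole copies of the dataset and that the weekly figure is a steady rate; the application states neither, and the function name `replication_summary` is only illustrative.

```python
def replication_summary(single_copy_pib, replicas, requested_pib, weekly_pib):
    """Relate the form's figures to each other (all values in PiB)."""
    return {
        # DataCap needed to store every replica of one full copy
        "full_replication_pib": single_copy_pib * replicas,
        # How many full copies the requested total would cover
        "copies_covered": requested_pib / single_copy_pib,
        # Weeks needed to receive the total at the weekly rate
        "weeks_at_weekly_rate": requested_pib / weekly_pib,
    }


# Figures taken from the application form above
summary = replication_summary(single_copy_pib=15, replicas=10,
                              requested_pib=120, weekly_pib=1)
print(summary)
```

Under these assumptions, 10 replicas of a 15 PiB copy would come to 150 PiB, while the requested 120 PiB corresponds to 8 full copies.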

On-chain address for first allocation

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

Identifier

efil

Share a brief history of your project and organization

Since its launch, the Filecoin network has become an important player in the decentralised storage space, offering a secure and transparent alternative to traditional data storage solutions.

We at Speedium / DCENT have been storing real and valuable datasets on the Filecoin network since Slingshot 2.6 and have been actively developing tools to improve the process. We are always on the lookout for new and useful client data to onboard.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

NIH NCBI Sequence Read Archive (SRA) on AWS
The Sequence Read Archive (SRA), produced by the [National Center for Biotechnology Information (NCBI)](https://www.ncbi.nlm.nih.gov/) at the [National Library of Medicine (NLM)](http://nlm.nih.gov/) at the [National Institutes of Health (NIH)](http://www.nih.gov/), stores raw DNA sequencing data and alignment information from high-throughput sequencing platforms.

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

IPFS, lotus, singularity

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://registry.opendata.aws/ncbi-sra/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Monthly

For how long do you plan to keep this dataset stored on Filecoin

1 to 1.5 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe, Australia (continent)

How will you be distributing your data to storage providers

HTTP or FTP server, IPFS, Lotus built-in data transfer

How do you plan to choose storage providers

Slack, Big Data Exchange, Partners

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

| MinerID | City | Continent | Business/Entity |
| --- | --- | --- | --- |
| `f01944347` | Oregon | USA | Jenny, Dabai |
| `f01952350` | Oregon | USA | Jenny, Dabai |
| `f01972364` | Oregon | USA | Jenny, Dabai |
| `f01972376` | Oregon | USA | Jenny, Dabai |
| `f02000937` | Chengdu | CN | MTY |
| `f01915033` | Chengdu | CN | MTY |
| `f0120****` | Melbourne | AU | HOLON |
| `f0115****` | Melbourne | AU | HOLON |
| `f01199430` | Heerhugowaard | EU | DCENT |
| `f01786387` | Heerhugowaard | EU | DCENT |
| `f01201327` | Heerhugowaard | EU | DCENT |
| `f01937642` | Heerhugowaard | EU | DCENT |
| `f0198****` | Dallas | USA | GREATERHEAT |
| `f0188****` | Singapore | AS | GREATERHEAT |
| `f01091851` | Omaha | USA | DLTx |
| `f01736668` | Omaha | USA | DLTx |
| `f01820744` | Omaha | USA | DLTx |
| `f0855584` | Omaha | USA | DLTx |
| `f01794610` | Omaha | USA | DLTx |
| `f01838599` | Kansas City | USA | DLTx |
| `f01845552` | Kansas City | USA | DLTx |

How do you plan to make deals to your storage providers

No response

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

⚠️ 1 storage providers sealed too much duplicate data - f01208803: 20.81%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

stcloudlisa commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacedmacne6gcumm2vckjbbtsvq3sjkv3qfcv55xh6kjwbyblvsqsj4u

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

1.95PiB

Signer Address

f1jvvltduw35u6inn5tr4nfualyd42bh3vjtylgci

Id

dac4fe66-d84e-454c-9d45-232ae75730fp

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedmacne6gcumm2vckjbbtsvq3sjkv3qfcv55xh6kjwbyblvsqsj4u

stcloudlisa commented 1 year ago

The client contacted me on Slack. I believe things will keep improving, so I am supporting this for now.

kevzak commented 1 year ago

Hello @cryptowhizzard - there is now one additional step as part of E-Fil+ application process: To validate your applicant GitHub ID, we ask you to complete the KYC check (a third party ID verification process).

Steps:

Also note:

Let me know if you have any issues or questions.

cryptowhizzard commented 1 year ago

@kevzak I have completed the full KYC verification and am now officially verified.

cryptowhizzard commented 1 year ago

checker:manualTrigger

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 38

Multisig Notary address

f01940930

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

2PiB

Id

29463449-07cd-4d98-a43a-d16799e59fc9

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01940930

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Rule to calculate the allocation request amount

400% weekly > 2PiB, requesting 2PiB
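
The rule line above reads as "take 400% of the weekly allocation, but never more than 2 PiB". Below is a minimal sketch of that reading, assuming the cap is a plain minimum; the function name and parameters are illustrative and not taken from the bot's code.

```python
PIB = 2**50  # bytes in one pebibyte


def next_allocation(weekly_bytes: int, percent: int, cap_bytes: int) -> int:
    """Return percent% of the weekly allocation, limited to cap_bytes."""
    proposed = weekly_bytes * percent // 100
    return min(proposed, cap_bytes)


# Weekly allocation from the application: 1 PiB
request = next_allocation(weekly_bytes=1 * PIB, percent=400, cap_bytes=2 * PIB)
print(request // PIB, "PiB")
```

With a weekly allocation of 1 PiB, 400% is 4 PiB, which exceeds the 2 PiB cap, so the request comes out at 2 PiB as shown in the comment.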

DataCap allocation requested

2PiB

Total DataCap granted for client so far

InfinityYiB

Datacap to be granted to reach the total amount requested by the client (120 PiB)

InfinityYiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 73003 | 36 | 1.95PiB | 20.92 | 479.59TiB |

herrehesse commented 1 year ago

checker:manualTrigger

liyunzhi-666 commented 1 year ago

checker:manualTrigger

liyunzhi-666 commented 1 year ago

Retrieval success rate looks good. CID checker also looks good.

liyunzhi-666 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacedvyu52ox6do4vpgqxfnd3ackx2yvulkgq4hktvvebdgzfzez5eqy

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

2.00PiB

Signer Address

f1pszcrsciyixyuxxukkvtazcokexbn54amf7gvoq

Id

29463449-07cd-4d98-a43a-d16799e59fc9

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacedvyu52ox6do4vpgqxfnd3ackx2yvulkgq4hktvvebdgzfzez5eqy

herrehesse commented 1 year ago

@liyunzhi-666 We are working on 100% retrievability on our own miners. Once the retrieval bot is upgraded to show 24H, 7D, and total statistics, we will collaborate with all miners who stored deals and help them reach 100% as well.

HTTP and Bitswap retrieval are our main goals.

laurarenpanda commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacec2xhbc64bvluaek7zsjf7tpvl53sogu5ueibnncaeevppgegfzes

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

2.00PiB

Signer Address

f1bp3tzp536edm7dodldceekzbsx7zcy7hdfg6uzq

Id

29463449-07cd-4d98-a43a-d16799e59fc9

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacec2xhbc64bvluaek7zsjf7tpvl53sogu5ueibnncaeevppgegfzes

kevzak commented 1 year ago

Hi @cryptowhizzard, sorry to bother you again, but your KYC check did not go through. Can you try again when you get a chance? We were testing over the past few weeks; you should now be able to complete it or, if not, see the error. Once it is complete, you'll see KYC verified on your filplus.storage account and on the applications. Thanks.

herrehesse commented 1 year ago

checker:manualTrigger

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 39

Multisig Notary address

f01940930

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

DataCap allocation requested

2PiB

Id

b4b9980c-26c8-4ce7-8e99-ca0cb383bd39

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f01940930

Client address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Rule to calculate the allocation request amount

400% weekly > 2PiB, requesting 2PiB

DataCap allocation requested

2PiB

Total DataCap granted for client so far

InfinityYiB

Datacap to be granted to reach the total amount requested by the client (120 PiB)

InfinityYiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 151322 | 42 | 2PiB | 21.21 | 497.32TiB |

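
As a rough check on the two stats rows posted in this thread, the sketch below computes how much of the previous allocation had been used when each request was opened. It assumes "Remaining DC" is the unspent part of "Previous DC Allocated" and that 1 PiB = 1024 TiB; neither is stated by the bot, and `used_fraction` is an illustrative name.

```python
TIB_PER_PIB = 1024


def used_fraction(allocated_tib: float, remaining_tib: float) -> float:
    """Share of the previous allocation that has already been spent."""
    return (allocated_tib - remaining_tib) / allocated_tib


# Request 38: 1.95 PiB previously allocated, 479.59 TiB remaining
first = used_fraction(1.95 * TIB_PER_PIB, 479.59)
# Request 39: 2 PiB previously allocated, 497.32 TiB remaining
second = used_fraction(2 * TIB_PER_PIB, 497.32)

print(f"{first:.2%} {second:.2%}")
```

Under these assumptions, roughly three quarters of the previous tranche had been consumed in both cases.
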
herrehesse commented 1 year ago

checker:manualTrigger

lyjmry commented 1 year ago

checker:manualTrigger

Patapon0702 commented 1 year ago

checker:manualTrigger

newwebgroup commented 1 year ago

Hi, the data replication in #2008 appears to be out of compliance. Close to 3 PiB+ of data is not replicated according to Fil+ rules. Please explain this. What improvement plans are there?

[image]

cryptowhizzard commented 1 year ago

Hi,

One organisation is ahead of the sealing plan. We will pause them so that things balance out over the course of the next tranche of DataCap.

dikemm commented 1 year ago

https://github.com/filecoin-project/notary-governance/issues/930

herrehesse commented 1 year ago

@dikemm, your accusations lack any basis and have already been addressed and explained more than seven times. Your attempts to gaslight and attack the entities investigating fraud are a waste of time.

Instead, it would be more productive for you to focus your efforts on uncovering actual fraudulent applications.

herrehesse commented 1 year ago

To provide transparency, I want to reiterate that your previous accusations have been addressed and answered more than seven times in the past six months. I am more than willing to provide you with links to these responses, as it seems you may have been unable to find them yourself:

Regarding duplicate data (0.04% of the application size), the explanation can be found here: [https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/2008#issuecomment-1567958183] and [https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/488#issuecomment-1410486103].

Regarding CID sharing (0.02% of the application size), it was a fault on our end six months ago, and it has not occurred since. Unique bytes: [https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/2008#issuecomment-1640145893]

I hope these resources will help clarify any misconceptions and promote a better understanding of the situation.

lyjmry commented 1 year ago

The two notaries ignored @Sunnyiscoming's question and went straight to the next round. I suspect they have something to do with the Dcent team!

[image]

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/2008#issuecomment-1585385157

herrehesse commented 1 year ago

@lyjmry Gaslighting attempt. Waste of time.

lyjmry commented 1 year ago

> @lyjmry Gaslighting attempt. Waste of time.

Don't you have to explain what happened? Why do you ignore @Sunnyiscoming?

herrehesse commented 1 year ago

We have repeatedly clarified that 0.02% of our application consisted of .car files from one of our other datasets. This incident occurred six months ago, and it has already been thoroughly discussed and resolved back then.

Your presence here seems to be solely aimed at generating unnecessary disturbance. We are well aware of the situation, and we kindly request that you cease these efforts.

lyjmry commented 1 year ago

[3 images]

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/2008#issuecomment-1567636958

https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/2008#issuecomment-1577778458 https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/2008#issuecomment-1614458700

@laurarenpanda This notary violated the Fil+ signing rules:

  1. Ignoring @Sunnyiscoming's question and signing directly.
  2. Signing twice in a row.
  3. Signing multiple times.

Do you think there is no problem with such an LDN? @raghavrmadya I request that this LDN be closed and that the notaries under suspicion (including the Dcent team) be disqualified.

herrehesse commented 1 year ago

@lyjmry Your presence here seems to be solely aimed at generating unnecessary disturbance. We are well aware of the situation, and we kindly request that you cease these efforts.

lyjmry commented 1 year ago

> @lyjmry Your presence here seems to be solely aimed at generating unnecessary disturbance. We are well aware of the situation, and we kindly request that you cease these efforts.

You keep asking me to stop asking questions about your failure to follow Fil+ rules. But no explanation?

herrehesse commented 1 year ago

"But no explanation?"

We have provided explanations on more than eight occasions. However, if you choose not to seek answers and instead resort to yelling, feel free to proceed with your approach.

lyjmry commented 1 year ago

> "But no explanation?"
>
> We have provided explanations on more than eight occasions. However, if you choose not to seek answers and instead resort to yelling, feel free to proceed with your approach.

OK. You only explained your repeated CID sharing problem. No explanation seems to have been given for the notary question.

lyjmry commented 1 year ago

> "But no explanation?"
>
> We have provided explanations on more than eight occasions. However, if you choose not to seek answers and instead resort to yelling, feel free to proceed with your approach.

[image]

Likewise, you can blame other notaries for their actions. But you ignore your own actions.

lyjmry commented 1 year ago

The signing notaries are concentrated among @laurarenpanda @liyunzhi-666 @Fatman13

emilytklee commented 1 year ago

https://github.com/filecoin-project/notary-governance/issues/930

herrehesse commented 1 year ago

checker:manualTrigger

emilytklee commented 1 year ago

image

This is a disputed application. Notaries, please do not sign for it.

jamerduhgamer commented 1 year ago

I am willing to support the next tranche of DataCap for this public dataset. See the comment left in the disputed application here.

@emilytklee, I have reviewed the disputed application here and have concluded that all outstanding concerns were addressed at the time.

I will not approve the next DataCap tranche if further valid disputes are raised and left unanswered.

jamerduhgamer commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacec2nb6sxq6nmweydcekztni52zbywlvxxszv67xgi6doeaea5p5ck

Address

f1mgnwoczfj25foxn4555wvwyak6rsynzy7z73azy

Datacap Allocated

2.00PiB

Signer Address

f1ypuqpi4xn5q7zi5at3rmdltosozifhqmrt66vhq

Id

b4b9980c-26c8-4ce7-8e99-ca0cb383bd39

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacec2nb6sxq6nmweydcekztni52zbywlvxxszv67xgi6doeaea5p5ck