filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] Distributed Archives for Neurophysiology Data Integration (DANDI)[1/4] #1556

Closed TijsStoker closed 1 year ago

TijsStoker commented 1 year ago

Data Owner Name

Massachusetts Institute Of Technology

Data Owner Country/Region

United States

Data Owner Industry

Life Science / Healthcare

Website

https://www.dandiarchive.org/

Social Media

https://github.com/dandi
https://twitter.com/dandiarchive

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

500TiB

On-chain address for first allocation

f1terdmbfjdrpwih67jxx5bnltyocvnf5qviww3sa

Custom multisig

Identifier

No response

Share a brief history of your project and organization

DANDI is a platform for publishing, sharing, and processing neurophysiology data funded by the BRAIN Initiative.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

DANDI is a public archive of neurophysiology datasets, including raw and processed data, and associated software containers. Datasets are shared under Creative Commons CC0 or CC-BY licenses. The data archive provides a broad range of cellular neurophysiology data. This includes electrode and optical recordings, and associated imaging data, stored using a set of community standards: NWB:N - NWB:Neurophysiology, BIDS - Brain Imaging Data Structure, and NIDM - Neuro Imaging Data Model.

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

lotus

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

https://registry.opendata.aws/dandiarchive/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Sporadic

For how long do you plan to keep this dataset stored on Filecoin

2 to 3 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, South America, Europe, Australia (continent)

How will you be distributing your data to storage providers

HTTP or FTP server, IPFS, Shipping hard drives

How do you plan to choose storage providers

Slack, Big data exchange

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

No response

How do you plan to make deals to your storage providers

Lotus client

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

simonkim0515 commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

500TiB

Client address

f1terdmbfjdrpwih67jxx5bnltyocvnf5qviww3sa

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1terdmbfjdrpwih67jxx5bnltyocvnf5qviww3sa

DataCap allocation requested

250TiB

Id

835cdf50-d0bc-4460-a84a-360cd7cfebe2

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

There is no previous allocation for this issue.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

cryptowhizzard commented 1 year ago

See #1559

Destore2023 commented 1 year ago

Hello Applicant, I noticed your DM on Slack. As you mentioned above, you will use the BDE platform to find SPs. How is that going? Also, who are you communicating with on the BDE platform?

TijsStoker commented 1 year ago

@swatchliu Thank you for your attention! We have been contacting eligible SPs both online and offline. Of course, we prefer to use BDE as the platform for contacting SPs. We have already discussed initial cooperation with Blue Cloud and Jetfil; at this stage they are not yet ready to show their nodes. Our main contact at BDE was Jack, but his role seems to have changed recently.

herrehesse commented 1 year ago

@swatchliu @TijsStoker This set is already stored on chain. No support here. https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1559

TijsStoker commented 1 year ago

@herrehesse Is it just because Speedium has already applied for DataCap for this dataset? I don't think that should be a reason for the community to reject other applicants.

Destore2023 commented 1 year ago

> @swatchliu @TijsStoker This set is already stored on chain. No support here. #1559

Thanks for your comments, Hidde. @herrehesse I think this is a positive client growth use case, whether or not this dataset is already stored on chain. We can encourage them to do so.

BTW, see the latest LDNs, 1637-1644: it's the same situation there. I will continue to support trustworthy DPs.

Destore2023 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacec3u3orjw6cqh3hkosfngoaycbyu3h6u4n2hxfa5rax76b5bfgxfm

Address

f1terdmbfjdrpwih67jxx5bnltyocvnf5qviww3sa

Datacap Allocated

250.00TiB

Signer Address

f1yh6q3nmsg7i2sys7f7dexcuajgoweudcqj2chfi

Id

835cdf50-d0bc-4460-a84a-360cd7cfebe2

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacec3u3orjw6cqh3hkosfngoaycbyu3h6u4n2hxfa5rax76b5bfgxfm

cryptowhizzard commented 1 year ago

Hi Eric,

I am interested in signing this application. Can you share the DD you have done here? Have you performed KYC and do you know @TijsStoker? The SPs are sourced through BDE, right? Have you explained the FIL+ rules about distribution / retrievability to @TijsStoker?

Thanks!

TijsStoker commented 1 year ago

@cryptowhizzard Yes, we've explained that only SPs who follow the rules will be given the opportunity to work together consistently.

Destore2023 commented 1 year ago

> Hi Eric,
>
> I am interested in signing this application. Can you share the DD you have done here? Have you performed KYC and do you know @TijsStoker? The SPs are sourced through BDE, right? Have you explained the FIL+ rules about distribution/retrievability to @TijsStoker?
>
> Thanks!

Hi Wijnand,

Yes, the FIL+ rules were clearly shared with the applicant. He can meet the requirement of 4 SPs for distribution, but it's hard for him to deliver data to 4 different continents. Still, I'd like to give him a thumbs up for the first allocation. As you mentioned on Slack, if he messes up his first 250TiB we won't give him another chance.

MetaWaveInfo commented 1 year ago

Sounds good, willing to support!

cryptowhizzard commented 1 year ago

Splendid!

Just for reference:

Did you read https://github.com/filecoin-project/notary-governance/issues/819?

Lack of diligence from notaries: Of all the recent conversations and arguments of potential bad acting, this is one that the Filecoin Plus program may have the most opportunity to address and correct. Each notary is required to submit their own set of guidelines, policies, and procedures for how they will perform diligence and award DataCap. What we are seeing in some cases is a total lack of diligence on large dataset applications, with subsequent allocations happening possibly just because “someone else already signed once.” The policy has always been: ask a client to provide qualitative up-front evidence to justify they are real, have valid data, and can be trusted to engage in good deal-making; then perform subsequent quantitative analysis on their on-chain deal-making data. The governance team has prioritized tooling to make this process easier, simpler, and more consistent for the notaries to perform their stated duties. It is not the role of the governance team to perform diligence on clients or storage providers. We intend to continue prioritizing this, and look forward to hearing suggestions on additional services or tools that can be built, whether it is by us or by the community.

"The policy has always been: ask a client to provide qualitative up-front evidence to justify they are real, have valid data, and can be trusted to engage in good deal-making; then perform subsequent quantitative analysis on their on-chain deal-making data. "

MetaWaveInfo commented 1 year ago

Looking forward to your first milestone. @TijsStoker

MetaWaveInfo commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceafbn2gqgkryzvgivl2fsyll2giemssa7hop7gwbhwehtl2wh2pzu

Address

f1terdmbfjdrpwih67jxx5bnltyocvnf5qviww3sa

Datacap Allocated

250.00TiB

Signer Address

f1ktlkcxnmzxcdaoqfsunrg3vocfbmgv4n3mrn74a

Id

835cdf50-d0bc-4460-a84a-360cd7cfebe2

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceafbn2gqgkryzvgivl2fsyll2giemssa7hop7gwbhwehtl2wh2pzu

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1terdmbfjdrpwih67jxx5bnltyocvnf5qviww3sa

DataCap allocation requested

500TiB

Id

c8fe06a4-2f42-40f9-a003-aacc171fd698

UnionLabs2020 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceawfy6x4gwfhh4hmwxex7zaygwsftfdut7dywqlxm2sjgwor2iace

Address

f1terdmbfjdrpwih67jxx5bnltyocvnf5qviww3sa

Datacap Allocated

500.00TiB

Signer Address

f17xdri3wunqgld7dm23e4f3eqsntjakwc47xjo6i

Id

c8fe06a4-2f42-40f9-a003-aacc171fd698

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceawfy6x4gwfhh4hmwxex7zaygwsftfdut7dywqlxm2sjgwor2iace

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

lvschouwen commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 3 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f02049625

Client address

f1terdmbfjdrpwih67jxx5bnltyocvnf5qviww3sa

DataCap allocation requested

1000.0TiB

Id

c5594208-849b-4c27-94c0-44fe69a20c0d

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 100.00% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

Casey-PG commented 1 year ago

Looks good. Could you help to explain why there are still flaws and how to improve it?

TijsStoker commented 1 year ago

Hello @PangodGroup, I have contacted our technicians about this. It happened while our technicians were doing retrieval tests in the community: they accidentally mixed files they had previously downloaded for testing into the data for this application. We will ask our technicians to examine our buckets and separate and sort this data so that it does not happen again.

Hope to get your support!

cryptowhizzard commented 1 year ago

Good to hear that your technician can retrieve things no one in the community can, from LDNs full of abuse and other garbage. Ohhhh wait......

Destore2023 commented 1 year ago

Apart from an extremely low proportion of defects, it's generally OK. Please continue to work hard; I hope to see better results in the next round.

Destore2023 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzaceb4qtgarn42k52yoc7x55ihkobq3gmdc2uagh3aga2rmjn67lu372

Address

f1terdmbfjdrpwih67jxx5bnltyocvnf5qviww3sa

Datacap Allocated

1000.00TiB

Signer Address

f1yh6q3nmsg7i2sys7f7dexcuajgoweudcqj2chfi

Id

c5594208-849b-4c27-94c0-44fe69a20c0d

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceb4qtgarn42k52yoc7x55ihkobq3gmdc2uagh3aga2rmjn67lu372

Casey-PG commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacebyxo7y32loebhzdjqbfxadtjpz4k2yz7lmbhnlwrv4g4j3cx3j6i

Address

f1terdmbfjdrpwih67jxx5bnltyocvnf5qviww3sa

Datacap Allocated

1000.00TiB

Signer Address

f1d4yb3wags3mtddzesxoo63jv7dmlec3bq4yteni

Id

c5594208-849b-4c27-94c0-44fe69a20c0d

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebyxo7y32loebhzdjqbfxadtjpz4k2yz7lmbhnlwrv4g4j3cx3j6i

TakiChain commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceahqrh5myyaoh5hc5ksrkm72ahkxcftuhsldsld66yzyvuzv2f3qi

Address

f1terdmbfjdrpwih67jxx5bnltyocvnf5qviww3sa

Datacap Allocated

1000.00TiB

Signer Address

f15impf3j2zcaex4lhyxndxswuuhv24vzstuqtxsi

Id

c5594208-849b-4c27-94c0-44fe69a20c0d

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceahqrh5myyaoh5hc5ksrkm72ahkxcftuhsldsld66yzyvuzv2f3qi

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 5

Multisig Notary address

f02049625

Client address

f1terdmbfjdrpwih67jxx5bnltyocvnf5qviww3sa

DataCap allocation requested

2.31PiB

Id

91bf525e-9ad7-4185-84ed-2ee5d1c042be

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1terdmbfjdrpwih67jxx5bnltyocvnf5qviww3sa

Rule to calculate the allocation request amount

800% of weekly dc amount requested

DataCap allocation requested

2.31PiB

Total DataCap granted for client so far

9.094947017729283e+36YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

-1.09B
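
The "granted so far" and "to be granted" figures above (9.09e+36YiB and -1.09B) do not look like plausible values for a 5PiB application. As a rough sanity check, the arithmetic can be redone from the tranche sizes that appear in this thread. The sketch below assumes that the three tranches shown so far (250TiB, 500TiB, 1000TiB) were all granted and that binary units apply (1PiB = 1024TiB); the function names are illustrative and are not the bot's actual code.

```python
TIB_PER_PIB = 1024  # assumption: binary units, 1 PiB = 1024 TiB


def to_tib(amount: str) -> float:
    """Parse strings such as '250TiB' or '5PiB' into a number of TiB."""
    amount = amount.strip()
    if amount.endswith("PiB"):
        return float(amount[:-3]) * TIB_PER_PIB
    if amount.endswith("TiB"):
        return float(amount[:-3])
    raise ValueError(f"unsupported unit in {amount!r}")


def remaining_tib(total: str, granted: list) -> float:
    """TiB still to be granted to reach the total requested amount."""
    return to_tib(total) - sum(to_tib(g) for g in granted)


# Assumption: every tranche requested earlier in this thread was granted.
granted = ["250TiB", "500TiB", "1000.0TiB"]
granted_tib = sum(to_tib(g) for g in granted)
left_tib = remaining_tib("5PiB", granted)

print(f"granted:   {granted_tib:.0f} TiB ({granted_tib / TIB_PER_PIB:.2f} PiB)")
print(f"remaining: {left_tib:.0f} TiB ({left_tib / TIB_PER_PIB:.2f} PiB)")
```

Under those assumptions the client would have received 1750TiB and would still be 3370TiB short of the 5PiB total.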

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 7785 | 12 | 1000.0TiB | 27.44 | 254.71TiB |

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

⚠️ 43.53% of deals are for data replicated across less than 4 storage providers.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.
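
The "Deal Data Replication" line reports the share of deals whose data sits with fewer than four storage providers. How the checker computes this is not shown in this thread; the following is a minimal sketch, assuming each deal is a (piece CID, provider ID) pair and that replication means the number of distinct providers per piece CID. The sample deals and identifiers are made up for illustration.

```python
from collections import defaultdict


def under_replicated_share(deals, min_providers=4):
    """Percentage of deals whose piece is held by fewer than min_providers distinct providers."""
    if not deals:
        return 0.0
    providers_by_piece = defaultdict(set)
    for piece_cid, provider in deals:
        providers_by_piece[piece_cid].add(provider)
    low = sum(
        1 for piece_cid, _ in deals
        if len(providers_by_piece[piece_cid]) < min_providers
    )
    return 100.0 * low / len(deals)


# Hypothetical data: pieceA is stored with four providers, pieceB with only two.
deals = [
    ("pieceA", "f01"), ("pieceA", "f02"), ("pieceA", "f03"), ("pieceA", "f04"),
    ("pieceB", "f01"), ("pieceB", "f02"),
]
print(f"{under_replicated_share(deals):.2f}% of deals are under-replicated")
```

A real report may weight deals by size rather than count them; this sketch counts deals only.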

herrehesse commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

⚠️ 1 storage providers sealed too much duplicate data - f02033556: 29.67%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.
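
The storage provider warning above flags a provider that sealed the same data more than once. One way such a ratio could be computed is sketched below, assuming "duplicate data" means the bytes of second and later deals for the same piece CID at the same provider, divided by that provider's total deal bytes. The checker's actual definition may differ, and the records are invented.

```python
from collections import defaultdict


def duplicate_percent(deals):
    """Return {provider: percent of deal bytes that repeat a piece it already holds}."""
    total = defaultdict(int)
    duplicate = defaultdict(int)
    seen = set()
    for provider, piece_cid, size in deals:
        total[provider] += size
        if (provider, piece_cid) in seen:
            duplicate[provider] += size  # same piece sealed again by the same provider
        else:
            seen.add((provider, piece_cid))
    return {p: 100.0 * duplicate[p] / total[p] for p in total if total[p] > 0}


# Hypothetical deals: (provider ID, piece CID, size in GiB)
deals = [
    ("f0100", "pieceA", 32), ("f0100", "pieceA", 32),
    ("f0100", "pieceB", 32), ("f0100", "pieceC", 32),
    ("f0200", "pieceA", 32),
]
result = duplicate_percent(deals)
for provider, pct in sorted(result.items()):
    print(f"{provider}: {pct:.2f}% duplicate")
```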

cryptowhizzard commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

⚠️ 1 storage providers sealed too much duplicate data - f02033556: 29.67%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

TijsStoker commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

⚠️ 1 storage providers sealed too much duplicate data - f02033556: 29.67%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

Chris00618 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

⚠️ 1 storage providers sealed too much duplicate data - f02033556: 29.67%

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

⚠️ CID sharing has been observed. (Top 3)

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

TijsStoker commented 1 year ago

Yes

cryptowhizzard commented 1 year ago

This client is actively stalling HTTP retrievals and blocking HTTP range requests with a reverse proxy to prevent its data from being investigated.

It works as follows:

One sets a bandwidth limit with NGINX on the HTTP retrieval. After a random amount of data has been transferred, the limit is set to zero, which makes the transfer time out. Because range retrieval is disabled in NGINX, one cannot resume where the transfer left off and needs to start all over again.

Log can be found at http://datasetcreators.com/downloadedcarfiles/logs/1556.log
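
The mechanism described above hinges on HTTP range requests: a client that is cut off mid-transfer would normally resume by sending a `Range` header and expect a `206 Partial Content` reply. A minimal sketch of that client-side logic follows, with hypothetical helper names. It only builds the header and interprets a response; it makes no network calls and says nothing about how this particular server is configured.

```python
def resume_header(bytes_received: int) -> dict:
    """Request header asking the server to continue from the given byte offset."""
    return {"Range": f"bytes={bytes_received}-"}


def next_action(status: int, headers: dict, bytes_received: int) -> str:
    """Decide how to continue a download from the reply to a ranged request."""
    if status == 206 and "Content-Range" in headers:
        # Server honored the range: append the body to what was already saved.
        return f"append from byte {bytes_received}"
    if status == 200:
        # Server ignored the Range header and is sending the whole file again.
        return "restart from byte 0"
    if status == 416:
        return "offset beyond end of file; nothing to fetch"
    return "unexpected response; retry later"


print(resume_header(1048576))
print(next_action(206, {"Content-Range": "bytes 1048576-2097151/2097152"}, 1048576))
print(next_action(200, {}, 1048576))
```

If every interrupted transfer ends in the "restart from byte 0" branch, a large CAR file can only be fetched in a single uninterrupted pass, which is the situation the comment describes.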

ghost commented 1 year ago

Hello @TijsStoker per the new guidelines https://github.com/filecoin-project/notary-governance/issues/922 for Open Dataset applicants, please complete the following Fil+ registration form to identify yourself as the applicant and also please add the contact information of the SP entities you are working with to store copies of the data.

This information will be reviewed by Fil+ Governance team to confirm validity toward the Fil+ guideline of a distributed storage plan and SPs posted in the comments here. Let us know if you have any questions.