filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] FileDrive Labs - Smithsonian Open Access #1688

Closed laurarenpanda closed 7 months ago

laurarenpanda commented 1 year ago

Data Owner Name

FileDrive Labs

Data Owner Country/Region

China

Data Owner Industry

Life Science / Healthcare

Website

https://filedrive.io/

Social Media

Twitter: https://twitter.com/FileDrive1
Medium: https://medium.com/@FileDrive1
WeChat Official Account: FileDrive

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

500TiB

On-chain address for first allocation

f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq

Custom multisig

Identifier

No response

Share a brief history of your project and organization

FileDrive Datasets Landing Plan is a project for onboarding more valuable public datasets onto the Filecoin network. Over several phases, we plan to bring 10 PiB of data to Filecoin and promote 100 PiB of storage power growth.

About FileDrive Datasets

FileDrive Datasets is a platform to effectively connect the huge storage market that Filecoin has built with publishers of public datasets.
The Filecoin network provides reliable, secure, and affordable decentralized storage services, and FileDrive Labs wants to deliver these benefits to end-users by building a public dataset platform.
It is challenging to attract traditional Cloud Storage and Object-based Storage users to the Filecoin network so that they can benefit from it. Developers in the Filecoin ecosystem, such as FileDrive Labs, need to face this challenge together.
As a member of the Filecoin ecosystem, FileDrive Labs is committed to developing useful tools that make it easier for users to store their data on the Filecoin network.

FileDrive Datasets has integrated a group of tools to provide a storage service that is compatible with both Cloud Storage and Object-based Storage and offers a better user experience to attract more users.
Ongoing projects behind it:
- Go-Graphsplit: https://github.com/filedrive-team/go-graphsplit
- DS-Cluster: https://github.com/filedrive-team/go-ds-cluster
- Filejoy: https://github.com/filedrive-team/filejoy

Article about FileDrive Datasets on Filecoin Blog:
- Large Datasets: FileDrive: https://filecoin.io/blog/posts/large-datasets-filedrive/

About FileDrive Labs

FileDrive Labs has always defined itself as a tool developer and infrastructure builder in the Filecoin ecosystem. Since 2019, we have focused on technical solutions and development based on the IPFS protocol and the Filecoin network, and we do our best to contribute to the community.
Over 80% of our team are qualified engineers, and half of them have more than 10 years of development experience in multiple industries, including Communication, the Internet, and blockchain.
Since 2020, we have participated in the Slingshot Competition, become one of the top teams, and stored over 5 PiB of useful data from public datasets on the Filecoin network.
To contribute to the Filecoin Community, we developed an open-source data prep tool, Graphsplit; a FIL+ project dashboard, filplus.info; and a storage provider discovery platform, filfind.info.
In addition, since March 2022 we have held a weekly online event named FileDrive Meetup, which gives community members a platform to follow the latest trends in the Filecoin network as well as our work and research.

Please check the following links for more details.
- GitHub: https://github.com/filedrive-team
- Twitter: https://twitter.com/FileDrive1
- Eventbrite: https://www.eventbrite.hk/o/filedrive-labs-42456337463
- YouTube Channel: https://www.youtube.com/channel/UCxcZC1dtBUlQvZY7DX13W1w
- Medium: https://medium.com/@FileDrive1

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

Smithsonian Open Access
- The Smithsonian’s mission is the "increase and diffusion of knowledge," and the institution has been collecting since 1846. The Smithsonian, through its efforts to digitize its multidisciplinary collections, has created millions of digital assets and related metadata describing the collection objects. On February 25th, 2020, the Smithsonian released over 2.8 million CC0 interdisciplinary 2-D and 3-D images, related metadata, and, additionally, research data from researchers across the Smithsonian. The 2.8 million "open access" collections are a subset of the Smithsonian’s 155 million objects, 2.1 million library volumes and 156,000 cubic feet of archival collections held in 19 museums, 9 research centers, libraries, archives and the National Zoo. Digitization of collections is ongoing.
- https://registry.opendata.aws/smithsonian-open-access/
- License: CC0
- Size: 621.2 TiB
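
As a rough sanity check on the figures in this application, the requested DataCap can be compared against the stated dataset size to see how many full copies it could cover. The sketch below is illustrative only: it assumes DataCap is consumed 1:1 against raw data size (ignoring any padding or packing overhead), and the function name `implied_copies` is hypothetical.

```python
TIB_PER_PIB = 1024  # binary units: 1 PiB = 1024 TiB


def implied_copies(datacap_pib: float, dataset_tib: float) -> float:
    """Return how many full copies of the dataset the DataCap could cover.

    Assumes DataCap is spent 1:1 against raw data size (no padding overhead).
    """
    if dataset_tib <= 0:
        raise ValueError("dataset size must be positive")
    return datacap_pib * TIB_PER_PIB / dataset_tib


# Figures taken from this application: 5 PiB requested, 621.2 TiB dataset.
copies = implied_copies(5, 621.2)
print(f"{copies:.2f} copies")
```

With these figures the result is a little over 8 copies.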

Where was the data currently stored in this dataset sourced from

My Own Storage Infra

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

IPFS, lotus, graphsplit

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

Original Source:
https://registry.opendata.aws/smithsonian-open-access/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Weekly

For how long do you plan to keep this dataset stored on Filecoin

2 to 3 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe, Australia (continent)

How will you be distributing your data to storage providers

IPFS, Shipping hard drives, Lotus built-in data transfer

How do you plan to choose storage providers

Slack, Filmine

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

Please check the Checker Reports of our previous LDN applications:
- https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1266

How do you plan to make deals to your storage providers

Lotus client

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

large-datacap-requests[bot] commented 1 year ago

Thanks for your request!

Heads up, you’re requesting more than the typical weekly onboarding rate of DataCap!

large-datacap-requests[bot] commented 1 year ago

Thanks for your request! Everything looks good. :ok_hand:

A Governance Team member will review the information provided and contact you back pretty soon.

Sunnyiscoming commented 1 year ago

You mentioned this dataset in 8 applications. How much data of this dataset has been stored? How many copies? https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1266 https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1267 https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1268

Because there is no consensus on whether the client should submit applications one by one, 5 applications have not been updated. https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1623 https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1624 https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1625 https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1626 https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1627

laurarenpanda commented 1 year ago

@Sunnyiscoming We have yet to store this dataset with the DC from #1266, #1267, and #1268 (2451.1TiB data, 15 PiB DC with 6-11 copies), so we moved this one into Landing Plan V2. Since you suggested that we submit applications dataset by dataset, I submitted this LDN afterwards.

Proposal 832 is still under discussion and has not yet reached consensus among the community and Notaries. So, I am still unsure what I should do at present.

Sunnyiscoming commented 1 year ago

Datacap Request Trigger

Total DataCap requested

5PiB

Expected weekly DataCap usage rate

500TiB

Client address

f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Multisig Notary address

f02049625

Client address

f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq

DataCap allocation requested

250TiB

Id

e30a5bb4-9378-4b93-a10a-d992b77021bb

Fatman13 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report[^1]

No application info found for this issue on https://filplus.d.interplanetary.one/clients.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

laurarenpanda commented 1 year ago

@Fatman13 Please check for the historical deal report: https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1266#issuecomment-1441136766

Fatman13 commented 1 year ago

What was the reason for CID sharing again? I remember seeing you explaining it somewhere but couldn't find it.

laurarenpanda commented 1 year ago

> What was the reason for CID sharing again? I remember seeing you explaining it somewhere but couldn't find it.

The same public datasets with the same preprocessing tool, like Go-Graphsplit, could lead to that result.

Fatman13 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecxngctrafbtevjmqfgpanrkzaxgkxsek3kong3t6hskx4uzie7ga

Address

f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq

Datacap Allocated

250.00TiB

Signer Address

f1j3u7crhjzwb2cj5mq7vodlt4o66yoyci7lhcauy

Id

e30a5bb4-9378-4b93-a10a-d992b77021bb

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecxngctrafbtevjmqfgpanrkzaxgkxsek3kong3t6hskx4uzie7ga

liyunzhi-666 commented 1 year ago

Based on the disclosure records and comment history, I would like to support this round.

liyunzhi-666 commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebb56cfyjxmba45m6dseb2zyo3zfk2rwnxl22wrwxvfoqj63jhceu

Address

f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq

Datacap Allocated

250.00TiB

Signer Address

f1pszcrsciyixyuxxukkvtazcokexbn54amf7gvoq

Id

e30a5bb4-9378-4b93-a10a-d992b77021bb

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebb56cfyjxmba45m6dseb2zyo3zfk2rwnxl22wrwxvfoqj63jhceu

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 2

Multisig Notary address

f02049625

Client address

f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq

DataCap allocation requested

500TiB

Id

ef70eb16-cee2-455a-9804-3451cfe85e1e

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq

Rule to calculate the allocation request amount

100% of weekly dc amount requested

DataCap allocation requested

500TiB

Total DataCap granted for client so far

250TiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

4.75PiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 5940 | 5 | 250TiB | 26.94 | 64.37TiB |

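
The allocation rule and the remaining-DataCap figure in this comment come down to simple arithmetic, sketched below. This is not the bot's actual code; the function names are hypothetical, and binary units (1 PiB = 1024 TiB) are assumed.

```python
TIB_PER_PIB = 1024  # binary units: 1 PiB = 1024 TiB


def tranche_tib(weekly_tib: float, percent_of_weekly: float) -> float:
    """Allocation size for one request, as a percentage of the weekly rate."""
    return weekly_tib * percent_of_weekly / 100


def remaining_tib(total_pib: float, granted_tib: float) -> float:
    """DataCap still to be granted before the total request is reached."""
    return max(total_pib * TIB_PER_PIB - granted_tib, 0.0)


# Figures from the comment above: 500 TiB weekly, 100% rule,
# 250 TiB granted so far, 5 PiB requested in total.
allocation = tranche_tib(500, 100)
left = remaining_tib(5, 250)
print(f"allocation: {allocation:.0f} TiB")
print(f"remaining: {left:.0f} TiB ({left / TIB_PER_PIB:.2f} PiB)")
```

Under these assumptions the allocation is 500 TiB and the remainder is 4870 TiB, or about 4.76 PiB. That is in line with the 4.75PiB reported above, though how the bot rounds the figure is not stated in this thread.
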
filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.
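
The trigger syntax described in the footnotes (`checker:manualTrigger`, optionally followed by related addresses) can be illustrated with a small parser. This only sketches the comment format and is not the checker bot's implementation; the function name and the whitespace-splitting behaviour are assumptions.

```python
from typing import List, Optional

TRIGGER = "checker:manualTrigger"


def parse_manual_trigger(comment: str) -> Optional[List[str]]:
    """Return extra addresses if the comment is a manual trigger, else None.

    An empty list means the trigger was given without related addresses.
    """
    tokens = comment.split()
    if not tokens or tokens[0] != TRIGGER:
        return None
    return tokens[1:]


print(parse_manual_trigger("checker:manualTrigger"))
print(parse_manual_trigger(
    "checker:manualTrigger f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq"
))
print(parse_manual_trigger("In support too"))
```

Returning `None` for ordinary comments and an empty list for a bare trigger keeps "not a trigger" distinct from "trigger with no extra addresses".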

Fatman13 commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecygocguw6vntkb47v5prag5hjfnfzohhhggdyah4k56tpx77qvfe

Address

f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq

Datacap Allocated

500.00TiB

Signer Address

f1j3u7crhjzwb2cj5mq7vodlt4o66yoyci7lhcauy

Id

ef70eb16-cee2-455a-9804-3451cfe85e1e

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecygocguw6vntkb47v5prag5hjfnfzohhhggdyah4k56tpx77qvfe

Fatman13 commented 1 year ago

The client reached out on Slack. CID checker and everything else looks good.

kernelogic commented 1 year ago

In support too

NDLABS-Leo commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the full report.

kernelogic commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzacebphzcbtq3lmodbijwhohgezhxbstao6imhnxpido2nuqzklxufrk

Address

f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq

Datacap Allocated

500.00TiB

Signer Address

f1yjhnsoga2ccnepb7t3p3ov5fzom3syhsuinxexa

Id

ef70eb16-cee2-455a-9804-3451cfe85e1e

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacebphzcbtq3lmodbijwhohgezhxbstao6imhnxpido2nuqzklxufrk

laurarenpanda commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

⚠️ All retrieval success ratios are below 1%.

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 3

Multisig Notary address

f02049625

Client address

f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq

DataCap allocation requested

1000.0TiB

Id

03f55812-bf25-4ee7-b7ba-36b69fe0d709

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq

Rule to calculate the allocation request amount

200% of weekly dc amount requested

DataCap allocation requested

1000.0TiB

Total DataCap granted for client so far

454747.4YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

454747.4YiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 18830 | 7 | 500TiB | 23.89 | 126.87TiB |

woshidama323 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

herrehesse commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

laurarenpanda commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

nj-steve commented 1 year ago

The retrieval rate has increased, so this looks fine.

nj-steve commented 1 year ago

Request Proposed

Your Datacap Allocation Request has been proposed by the Notary

Message sent to Filecoin Network

bafy2bzacecoc6ktpvzamn24ymqxdtyo5igdyxizdy72alzkugvko5o4kaqh5u

Address

f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq

Datacap Allocated

1000.00TiB

Signer Address

f1xx6555qijma7igpnjspyvdunc4vfxkawnpqy5ii

Id

03f55812-bf25-4ee7-b7ba-36b69fe0d709

You can check the status of the message here: https://filfox.info/en/message/bafy2bzacecoc6ktpvzamn24ymqxdtyo5igdyxizdy72alzkugvko5o4kaqh5u

a1991car commented 1 year ago

Request Approved

Your Datacap Allocation Request has been approved by the Notary

Message sent to Filecoin Network

bafy2bzaceb7qbwl3tbnc47zrst4x2o5kusm7awq7wkbnjbt6zb7c3otjtk5ay

Address

f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq

Datacap Allocated

1000.00TiB

Signer Address

f1qnumecdypgrbaebtkdfjnwt5ndacadcuas3deiq

Id

03f55812-bf25-4ee7-b7ba-36b69fe0d709

You can check the status of the message here: https://filfox.info/en/message/bafy2bzaceb7qbwl3tbnc47zrst4x2o5kusm7awq7wkbnjbt6zb7c3otjtk5ay

Aaronn85 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval report.

herrehesse commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

github-actions[bot] commented 1 year ago

This application has not seen any responses in the last 10 days. This issue will be marked with Stale label and will be closed in 4 days. Comment if you want to keep this application open.

laurarenpanda commented 1 year ago

Please keep this application open.

large-datacap-requests[bot] commented 1 year ago

DataCap Allocation requested

Request number 4

Multisig Notary address

f02049625

Client address

f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq

DataCap allocation requested

1.95PiB

Id

c2acc050-7a11-4ca7-bcd7-840e8c5a2608

large-datacap-requests[bot] commented 1 year ago

Stats & Info for DataCap Allocation

Multisig Notary address

f02049625

Client address

f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq

Rule to calculate the allocation request amount

400% of weekly dc amount requested

DataCap allocation requested

1.95PiB

Total DataCap granted for client so far

909494701772928712704.0YiB

Datacap to be granted to reach the total amount requested by the client (5PiB)

909494701772928712704.0YiB

Stats

| Number of deals | Number of storage providers | Previous DC Allocated | Top provider | Remaining DC |
| --- | --- | --- | --- | --- |
| 60296 | 13 | 1000.0TiB | 9.51 | 8.69TiB |

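
The YiB figures reported in this comment do not match the allocations approved earlier in the thread and may reflect a unit-formatting problem in the bot output. As a hedged cross-check, the sketch below sums the three approved allocations (250 TiB, 500 TiB, and 1000 TiB) and formats the totals in binary units. The helper `format_tib` is hypothetical and is not the bot's formatter.

```python
UNITS = ["TiB", "PiB", "EiB"]


def format_tib(amount_tib: float) -> str:
    """Format a TiB amount using the largest binary unit that keeps it >= 1."""
    value, unit = amount_tib, UNITS[0]
    for next_unit in UNITS[1:]:
        if value < 1024:
            break
        value, unit = value / 1024, next_unit
    return f"{value:.2f}{unit}"


approved_tib = [250, 500, 1000]   # allocations approved earlier in this thread
total_requested_tib = 5 * 1024    # 5 PiB total request
granted = sum(approved_tib)
print(format_tib(granted))
print(format_tib(total_requested_tib - granted))
```

Under these assumptions, about 1.71PiB has been granted so far and about 3.29PiB remains to reach the 5PiB total.
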
laurarenpanda commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

data-programs commented 1 year ago

KYC

This user’s identity has been verified through filplus.storage

woshidama323 commented 1 year ago

checker:manualTrigger

filplus-checker-app[bot] commented 1 year ago

DataCap and CID Checker Report Summary[^1]

Retrieval Statistics

Storage Provider Distribution

✔️ Storage provider distribution looks healthy.

Deal Data Replication

✔️ Data replication looks healthy.

Deal Data Shared with other Clients[^3]

✔️ No CID sharing has been observed.

[^1]: To manually trigger this report, add a comment with text checker:manualTrigger

[^2]: Deals from those addresses are combined into this report as they are specified with checker:manualTrigger

[^3]: To manually trigger this report with deals from other related addresses, add a comment with text checker:manualTrigger <other_address_1> <other_address_2> ...

Full report

Click here to view the CID Checker report. Click here to view the Retrieval Dashboard. Click here to view the Retrieval report.

woshidama323 commented 1 year ago

LGTM on the report results, and the retrieval success rate is also good. Will support.