filecoin-project / filecoin-plus-large-datasets

Hub for client applications for DataCap at a large scale

[DataCap Application] FileDrive Labs - Smithsonian Open Access #1688

Closed laurarenpanda closed 7 months ago

laurarenpanda commented 1 year ago

Data Owner Name

FileDrive Labs

Data Owner Country/Region

China

Data Owner Industry

Life Science / Healthcare

Website

https://filedrive.io/

Social Media

Twitter: https://twitter.com/FileDrive1
Medium: https://medium.com/@FileDrive1
WeChat Official Account: FileDrive

Total amount of DataCap being requested

5PiB

Weekly allocation of DataCap requested

500TiB

On-chain address for first allocation

f1udumyw3yjzxuu5co4rateaq6czubrwbyy2t4jiq

Custom multisig

Identifier

No response

Share a brief history of your project and organization

FileDrive Datasets Landing Plan is a project for onboarding more valuable public datasets onto the Filecoin network. Over several phases, we plan to bring 10 PiB of data to Filecoin and promote 100 PiB of storage power growth.

About FileDrive Datasets

FileDrive Datasets is a platform that connects the huge storage market Filecoin has built with publishers of public datasets.
The Filecoin network provides reliable, secure, and affordable decentralized storage services, and FileDrive Labs wants to deliver these benefits to end-users by building a public dataset platform.
It is challenging to attract traditional Cloud Storage and Object-based Storage users to the Filecoin network and let them benefit from it. Developers in the Filecoin ecosystem, such as FileDrive Labs, need to face this challenge together.
As a member of the Filecoin ecosystem, FileDrive Labs has been committed to developing useful tools that make it easier for users to store their data on the Filecoin network.

FileDrive Datasets has integrated a group of tools to provide a storage service that is compatible with both Cloud Storage and Object-based Storage and offers a better user experience to attract more users.
Ongoing projects behind it:
- Go-Graphsplit: https://github.com/filedrive-team/go-graphsplit
- DS-Cluster: https://github.com/filedrive-team/go-ds-cluster
- Filejoy: https://github.com/filedrive-team/filejoy

Article about FileDrive Datasets on Filecoin Blog:
- Large Datasets: FileDrive: https://filecoin.io/blog/posts/large-datasets-filedrive/

About FileDrive Labs

FileDrive Labs has always defined itself as a tool developer and infrastructure builder in the Filecoin ecosystem. Since 2019, we have focused continuously on technical solutions and development based on the IPFS protocol and the Filecoin network, and we do our best to contribute to the community.
Over 80% of our team are qualified engineers, and half of them have more than 10 years of development experience in multiple industries, including Communication, the Internet, and blockchain.
Since 2020, we have participated in the Slingshot Competition, become one of the top teams, and stored over 5 PiB of useful data from public datasets on the Filecoin network.
To contribute to the Filecoin Community, we developed the open-source data prep tool Graphsplit, the FIL+ project dashboard filplus.info, and the storage provider discovery platform filfind.info.
In addition, we have held weekly online events named FileDrive Meetup since March 2022, which aim to provide a platform for community members to follow the latest trends of the Filecoin network as well as our work and research.

Please check the following links for more details.
- GitHub: https://github.com/filedrive-team
- Twitter: https://twitter.com/FileDrive1
- Eventbrite: https://www.eventbrite.hk/o/filedrive-labs-42456337463
- YouTube Channel: https://www.youtube.com/channel/UCxcZC1dtBUlQvZY7DX13W1w
- Medium: https://medium.com/@FileDrive1

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders

No response

Describe the data being stored onto Filecoin

Smithsonian Open Access
- The Smithsonian’s mission is the "increase and diffusion of knowledge", and it has been collecting since 1846. The Smithsonian, through its efforts to digitize its multidisciplinary collections, has created millions of digital assets and related metadata describing the collection objects. On February 25th, 2020, the Smithsonian released over 2.8 million CC0 interdisciplinary 2-D and 3-D images, related metadata, and research data from researchers across the Smithsonian. The 2.8 million "open access" collections are a subset of the Smithsonian’s 155 million objects, 2.1 million library volumes and 156,000 cubic feet of archival collections held in 19 museums, 9 research centers, libraries, archives and the National Zoo. Digitization of collections is ongoing.
- https://registry.opendata.aws/smithsonian-open-access/
- License: CC0
- Size: 621.2 TiB
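
As a rough sanity check, the figures given in this application (5 PiB total DataCap, 500 TiB weekly allocation, 621.2 TiB dataset size) can be related with simple arithmetic. The sketch below is only illustrative: the function names are hypothetical, and reading the DataCap-to-dataset ratio as a number of full copies assumes all requested DataCap goes to this one dataset, which the application does not state.

```python
import math

TIB_PER_PIB = 1024  # binary units: 1 PiB = 1024 TiB


def pib_to_tib(pib: float) -> float:
    """Convert pebibytes to tebibytes."""
    return pib * TIB_PER_PIB


def weeks_to_allocate(total_tib: float, weekly_tib: float) -> int:
    """Whole weeks needed to hand out total_tib at weekly_tib per week."""
    return math.ceil(total_tib / weekly_tib)


def datacap_to_dataset_ratio(total_tib: float, dataset_tib: float) -> float:
    """How many times the dataset fits into the requested DataCap.

    Assumption: all DataCap is spent on this single dataset.
    """
    return total_tib / dataset_tib


# Values taken from the application form.
total_datacap_tib = pib_to_tib(5)   # 5 PiB requested in total
weekly_allocation_tib = 500         # 500 TiB requested per week
dataset_size_tib = 621.2            # stated dataset size

weeks = weeks_to_allocate(total_datacap_tib, weekly_allocation_tib)
ratio = datacap_to_dataset_ratio(total_datacap_tib, dataset_size_tib)

print(f"Total DataCap: {total_datacap_tib:.0f} TiB")
print(f"Weeks at the requested weekly rate: {weeks}")
print(f"DataCap / dataset size: {ratio:.2f}")
```

Under these assumptions, the requested total is a little over eight times the dataset size and would take about eleven weeks to allocate at the requested weekly rate.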

Where was the data currently stored in this dataset sourced from

My Own Storage Infra

If you answered "Other" in the previous question, enter the details here

No response

How do you plan to prepare the dataset

IPFS, lotus, graphsplit

If you answered "other/custom tool" in the previous question, enter the details here

No response

Please share a sample of the data

Original Source:
https://registry.opendata.aws/smithsonian-open-access/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

If you chose not to confirm, what was the reason

No response

What is the expected retrieval frequency for this data

Weekly

For how long do you plan to keep this dataset stored on Filecoin

2 to 3 years

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe, Australia (continent)

How will you be distributing your data to storage providers

IPFS, Shipping hard drives, Lotus built-in data transfer

How do you plan to choose storage providers

Slack, Filmine

If you answered "Others" in the previous question, what is the tool or platform you plan to use

No response

If you already have a list of storage providers to work with, fill out their names and provider IDs below

Please check the Checker Reports of our previous LDN applications:
- https://github.com/filecoin-project/filecoin-plus-large-datasets/issues/1266

How do you plan to make deals to your storage providers

Lotus client

If you answered "Others/custom tool" in the previous question, enter the details here

No response

Can you confirm that you will follow the Fil+ guideline

Yes

laurarenpanda commented 7 months ago

Please keep this application open.