filecoin-project / devgrants


# Open Grant Proposal: Onboarding and Indexing World’s Largest Open Datasets for Enhanced Filecoin Utility #1657

Closed herrehesse closed 10 months ago

herrehesse commented 11 months ago

Project Name: Onboarding and Indexing World’s Largest Open Datasets for Enhanced Filecoin Utility

Proposal Category: Applications

Individual or Entity Name: DCENT

Proposer: cryptowhizzard

(Optional) Filecoin ecosystem affiliations: DCENT

(Optional) Technical Sponsor: N/A

Do you agree to open source all work you do on behalf of this RFP under the MIT/Apache-2 dual-license?: Yes

Project Summary

We aim to flawlessly store all 191 datasets listed on OpenPanda, ensuring data quality, indexing, and retrieval while showcasing Filecoin's capability to preserve humanity's essential information. This project serves as a beacon amid debates, showcasing best practices in data validity and retrievability.

Impact

Data authenticity, retrieval, and utility are areas that can be improved within Filecoin and the broader data storage ecosystem. By onboarding the world’s largest open datasets, we can amplify Filecoin's value, attracting more users and potentially boosting token demand. This initiative helps us uphold trust, foster user engagement, and enhance network value. Success means a robust, reliable, and readily accessible collection of datasets on the Filecoin network, serving diverse compute projects.

Outcomes

Adoption, Reach, and Growth Strategies

Our target audience comprises developers, researchers, and organizations seeking reliable and accessible datasets. We are engaging with them through OpenPanda, Filecoin forums, and direct outreach. Our strategy involves tutorials, workshops, and demonstrations for initial user onboarding.

Datacap

A significant part of the process is applying for and using "datacap." Because storage providers favor sectors with datacap over regular deals, we expect all of the data to be stored with datacap.

In the initial phase, we intend to navigate through the basic structure of the current LDN system to secure our datacap requirements. However, in a parallel effort, we will engage the Filecoin Plus community with a proposal to recognize us as a potential allocator of datacap, branching out from the traditional LDN approach.

Such an arrangement serves dual purposes:

  1. Experimentation & Evolution: Operating as a datacap allocator would equip us with a unique vantage point. This allows us to actively experiment with diverse models of datacap distribution, and in turn, contribute insights that aid the community in refining and redefining allocation strategies.

  2. Community Development: As active participants, we aim to offer recommendations and insights into various dimensions including retrieval standards, bot automations, and the broader framework that the community might adopt in future allocation mechanisms.

By embedding ourselves in this process, we aim not only to secure our data storage needs but also to actively shape and refine the Filecoin ecosystem's approach to datacap management.

Development Roadmap

  1. Milestone 1: Setup & High-Utility Onboarding

    • Set up infrastructure, gather all dataset information, and onboard the most frequently used datasets (1 - 46) first, ensuring their availability and utility from the start.
    • Team: 3 (1 developer, 1 project manager, 1 data specialist)
    • Technical: Computing power, bandwidth, storage, and networking capabilities to support data transfers and operations throughout the project duration.
    • Funding: $50,000
    • Duration: 2 months
  2. Milestone 2: Intermediate Dataset Integration

    • Concentrate on integrating medium-utilized datasets (47 - 92), further expanding the platform's diversity and range.
    • Team: 3 (1 developer, 1 project manager, 1 data specialist)
    • Technical: Computing power, bandwidth, storage, and networking capabilities to support data transfers and operations throughout the project duration.
    • Funding: $50,000
    • Duration: 2 months
  3. Milestone 3: Intermediate Dataset Integration

    • Concentrate on integrating low-utilized datasets (93 - 138), further expanding the platform's diversity and range.
    • Team: 3 (1 developer, 1 project manager, 1 data specialist)
    • Technical: Computing power, bandwidth, storage, and networking capabilities to support data transfers and operations throughout the project duration.
    • Funding: $50,000
    • Duration: 2 months
  4. Milestone 4: Final Dataset Integration

    • Concentrate on integrating remaining datasets (139 - 191), completing the full range of 191 available sets on OpenPanda.
    • Team: 3 (1 developer, 1 project manager, 1 data specialist)
    • Technical: Computing power, bandwidth, storage, and networking capabilities to support data transfers and operations throughout the project duration.
    • Funding: $50,000
    • Duration: 2 months

*We will later include a detailed list of datasets along with their corresponding milestones.

Total Budget Requested

| Milestone # | Description | Deliverables | Completion Date | Funding |
| --- | --- | --- | --- | --- |
| 1 | Setup & High-Utility Onboarding | 25% Datasets Onboarded | Q2 24 | $50,000 |
| 2 | Intermediate Dataset Onboarding | 50% Datasets Onboarded | Q3 24 | $50,000 |
| 3 | Intermediate Dataset Onboarding | 75% Datasets Onboarded | Q4 24 | $50,000 |
| 4 | Final Dataset Onboarding | 100% Datasets Onboarded | Q4 24 | $50,000 |
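
As a rough cross-check of the figures above, the sketch below tallies the dataset ranges and funding given in the roadmap. The data structure and the function name `summarize` are illustrative assumptions, not part of any existing tooling. It suggests that the 25/50/75% deliverables are approximations (the first three ranges hold 46 datasets each, the last holds 53) and that the four milestones sum to $200,000.

```python
# Dataset index ranges and funding per milestone, copied from the roadmap.
# Tuple layout (milestone number, first dataset, last dataset, funding in USD)
# is an illustrative assumption.
MILESTONES = [
    (1, 1, 46, 50_000),
    (2, 47, 92, 50_000),
    (3, 93, 138, 50_000),
    (4, 139, 191, 50_000),
]
TOTAL_DATASETS = 191  # datasets listed on OpenPanda, per the proposal


def summarize(milestones, total_datasets):
    """Return per-milestone dataset counts and cumulative coverage."""
    rows = []
    cumulative = 0
    for number, first, last, funding in milestones:
        count = last - first + 1  # ranges are inclusive
        cumulative += count
        rows.append({
            "milestone": number,
            "datasets": count,
            "cumulative": cumulative,
            "cumulative_pct": round(100 * cumulative / total_datasets, 1),
            "funding_usd": funding,
        })
    return rows


rows = summarize(MILESTONES, TOTAL_DATASETS)
total_funding = sum(r["funding_usd"] for r in rows)

for r in rows:
    print(f"Milestone {r['milestone']}: {r['datasets']} datasets, "
          f"{r['cumulative_pct']}% cumulative, ${r['funding_usd']:,}")
print(f"Total requested: ${total_funding:,}")
```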

Maintenance and Upgrade Plans

Post-project, we plan to continually monitor data integrity, ensure data remains indexed and retrievable, and work with the Filecoin community for improvements. Maintenance will be sustained via community contributions and potential future grants.

Team

Team Members

Team Member LinkedIn Profiles

Team Website

www.dcent.nl

Relevant Experience

Our data preparation business, operating under the DCENT name, has already onboarded over 100 PiB globally. After the Slingshot 2.6 program, we moved into genuine data onboarding and refined our techniques. We have developed tools to automate processes and maintain a track record of performance, positioning us uniquely for this task.

Team code repositories

github.com/cryptowhizzard, OpenPanda GitHub repo

Additional Information

We learned about the Open Grants Program through our continued involvement in the Filecoin community and from the Filecoin Foundation's outreach.

orvn commented 11 months ago

If this proposal is accepted, happy to support getting the onboarded data accessible on Open Panda (which I was a core contributor to).

herrehesse commented 11 months ago

Hello @orvn,

Big thanks for your supportive comment!

Over the past 8 weeks since the initiation of this proposal, we have been exploring multiple pathways of getting the proposal approved. We kicked things off by talking with the "Data Program" teams to get feedback and share our goals. Later, we chatted with Deep, Mara, Porter, Stefaan, and Clara to fine-tune our ideas into a solid plan, with Porter being a huge help in drafting our proposal.

A week ago at the Iceland DEV Summit, @protocolin, @momack2, and I had a great talk about the cool things that could happen if this proposal takes off. We’re excited about the benefits but know that figuring out funding, especially with how the market is now, is a big hurdle.

We want this project to be something everyone supports and benefits from as we work towards our goal: making super important information easy for anyone to access and use on the Filecoin network.

We're fully committed to working through the challenges and keeping the communication clear and constructive. Everyone's support, feedback, and willingness to work together mean a lot as we push forward, aiming to make real progress.

DSS-AL commented 11 months ago

This is a fantastic initiative developed by one of the most active SPs in the community. DSS has resources to support this in a meaningful way upon implementation.

xmcai2016 commented 11 months ago

I support this idea. It'd be great to have Open Panda showcase Filecoin's data retrievability end to end.

ErinOCon commented 10 months ago

Hi @herrehesse, thank you for your patience with our review. Unfortunately, due to a shift in funding priorities in this current climate, we will not be moving forward with a grant at this time. If you have any questions for our team, please feel welcome to be in touch at grants@fil.org.

Wishing you the best with your building progress!

herrehesse commented 5 months ago

@ErinOCon @xmcai2016 @orvn I am trying to get this grant moving through: https://github.com/filecoin-project/community/discussions/695#discussioncomment-8953474