filecoin-project / devgrants

👟 Apply for a Filecoin devgrant. Help build the Filecoin ecosystem!

Vector Database Integration for Filecoin's JavaScript SDK #1608

Closed: TochKa21U closed this issue 11 months ago

TochKa21U commented 1 year ago

Open Grant Proposal: Vector Database Integration for Filecoin's JavaScript SDK

Project Name: Vector Database Integration for Filecoin's JavaScript SDK

Proposal Category: Developer and data tooling, Integrations

Individual or Entity Name: OneBrain

Proposer: https://github.com/TochKa21U

(Optional) Filecoin ecosystem affiliations: None.

(Optional) Technical Sponsor: None.

Do you agree to open source all work you do on behalf of this RFP under the MIT/Apache-2 dual-license?: Yes.

Project Summary

The project aims to create a direct integration between vector databases and Filecoin's JavaScript SDK. By enabling this integration, files stored in Filecoin can be converted into vector embeddings and directly inserted into vector databases.

At present, Filecoin's storage system is separate from the data processing that machine learning and neural network applications require. This project aims to bridge that gap by letting stored files be converted into vector embeddings and saved into a vector database in one step.
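The conversion step described above can be sketched in TypeScript, the language of the target SDK. This is a minimal sketch under stated assumptions: the file's text has already been retrieved from Filecoin by its CID (retrieval is out of scope here), and the names `Embedder`, `Upsert`, `VectorRecord`, `chunkText`, and `indexFile` are hypothetical, not part of any existing Filecoin SDK. The chunk size and overlap values are illustrative defaults.

```typescript
// Sketch only: every name here is hypothetical, not part of an existing Filecoin SDK.

// Turns one piece of text into an embedding vector (any provider could back this).
type Embedder = (text: string) => Promise<number[]>;

// One row to be written to a vector database.
interface VectorRecord {
  id: string; // `${cid}#${chunkIndex}`, so each chunk traces back to its file
  vector: number[];
  metadata: { cid: string; chunk: number; text: string };
}

// Writes a batch of records to whichever vector database is configured.
type Upsert = (records: VectorRecord[]) => Promise<void>;

// Splits text into fixed-size character windows that overlap slightly,
// so context near a boundary appears in both neighbouring chunks.
function chunkText(text: string, size = 500, overlap = 50): string[] {
  if (size <= 0 || overlap < 0 || overlap >= size) {
    throw new Error("require size > 0 and 0 <= overlap < size");
  }
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += size - overlap) {
    chunks.push(text.slice(start, start + size));
    if (start + size >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Embeds every chunk of a file's text and writes the vectors in one batch.
// Assumes `text` was already retrieved from Filecoin using `cid`.
async function indexFile(
  cid: string,
  text: string,
  embed: Embedder,
  upsert: Upsert,
): Promise<number> {
  const chunks = chunkText(text);
  const records: VectorRecord[] = [];
  for (let i = 0; i < chunks.length; i++) {
    records.push({
      id: `${cid}#${i}`,
      vector: await embed(chunks[i]),
      metadata: { cid, chunk: i, text: chunks[i] },
    });
  }
  if (records.length > 0) await upsert(records);
  return records.length;
}
```

Passing the embedder and the store writer in as functions keeps the pipeline independent of any particular embedding provider or vector database, which fits the proposal's goal of supporting several of each.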

The potential benefits from this integration are multifaceted:

  1. Data Accessibility: Converting files to vector embeddings represents complex data as fixed-length numeric vectors, which makes similarity-based retrieval possible.
  2. Machine Learning and AI Applications: Readily available data in the form of vector embeddings can be directly used in machine learning models and neural network applications.
  3. Reduced Processing Time: The direct conversion and transfer of data into vector databases save computational costs and time.
  4. Versatile Use Cases: The integration can be useful for various applications including recommendation systems, similarity searches, AI chatbots, natural language processing, image recognition, voice search, and biometric security.

In essence, this project focuses on expanding Filecoin's storage capabilities by integrating vector databases into its JavaScript SDK, thereby enabling more efficient data processing for machine learning and neural network applications.

Impact

The Vector Database Integration for Filecoin's JavaScript SDK project offers a significant upgrade for users working with machine learning, neural networks, and vector similarity search. By transforming stored files into vector embeddings and inserting them into the user's chosen vector database, it makes that data directly usable by these applications.

The process is expected to reduce computational load, saving time and resources. Data handling becomes more cost-effective and streamlined, which may encourage more complex data analysis.

The project has the potential to improve the workflows of the machine learning and AI communities. By providing a more efficient tool, it could improve the quality and speed of development in areas such as natural language processing, image recognition, voice search, and biometric security.

In a broader context, the project could reshape the way data is processed and stored on the Filecoin network, and integrating vector databases into Filecoin's JavaScript SDK could be a meaningful step forward for AI and machine learning applications.

Outcomes

The project will be developed in JavaScript and released as a module. Our team will also use this module in our SaaS service to integrate with Filecoin.

Once this project is complete, developers will have a Filecoin storage workflow that automatically generates vector embeddings for stored files and saves them into an integrated vector database.

We aim to integrate multiple vector databases, with the flexibility to choose between different embedding models (both open source models and platform services such as OpenAI), by the end of 2023. Users of this module will be able to index their files and search through them via a vector database. This is especially useful now that AI, and in particular LLMs and other generative AI solutions, are in wide use.
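One way to offer the choice of embedding models described above is a small provider interface plus a registry keyed by name. The sketch below uses hypothetical names (`EmbeddingProvider`, `EmbeddingRegistry`, and the toy `letterCounts` provider); a real OpenAI or open-source provider would implement the same `embed` method by calling that service or model, which is not shown here. The toy provider only counts letters and is not a real embedding model.

```typescript
// Sketch only: a hypothetical interface, not an API of any existing Filecoin or embedding SDK.
interface EmbeddingProvider {
  readonly name: string; // registry key, e.g. "letter-counts"
  readonly dimensions: number; // length of every vector the provider returns
  embed(texts: string[]): Promise<number[][]>;
}

// Keeps providers by name so callers can pick a model at runtime.
class EmbeddingRegistry {
  private readonly providers = new Map<string, EmbeddingProvider>();

  register(provider: EmbeddingProvider): void {
    if (this.providers.has(provider.name)) {
      throw new Error(`provider "${provider.name}" is already registered`);
    }
    this.providers.set(provider.name, provider);
  }

  get(name: string): EmbeddingProvider {
    const provider = this.providers.get(name);
    if (provider === undefined) {
      throw new Error(`unknown embedding provider "${name}"`);
    }
    return provider;
  }

  names(): string[] {
    return Array.from(this.providers.keys());
  }
}

// Toy provider for tests only: counts the letters a-z. NOT a real embedding model.
const letterCounts: EmbeddingProvider = {
  name: "letter-counts",
  dimensions: 26,
  async embed(texts: string[]): Promise<number[][]> {
    return texts.map((text) => {
      const counts: number[] = new Array(26).fill(0);
      const lower = text.toLowerCase();
      for (let i = 0; i < lower.length; i++) {
        const slot = lower.charCodeAt(i) - 97; // 'a' has char code 97
        if (slot >= 0 && slot < 26) counts[slot] += 1;
      }
      return counts;
    });
  },
};
```

A vector database adapter could follow the same pattern, so that adding a new model or database (as in Milestones X and Y) means registering one more implementation rather than changing the indexing code.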

Adoption, Reach, and Growth Strategies

To ensure maximum adoption, reach, and growth, the project will follow several key strategies:

  1. Open Source and Licensing: Releasing this project under the MIT/Apache-2 dual license is a significant strategy. These permissive licenses encourage widespread adoption by giving users the freedom to use, modify, and distribute the software. They also attract contributors who can help improve and maintain the project.

  2. Package Distribution: The project will be published to the npm registry, the largest JavaScript package registry, so it can be installed with either npm or yarn. This makes the project accessible to a large number of developers and users, promoting adoption.

  3. Community Engagement: Actively engaging with the community is crucial. This can be achieved through consistent communication on relevant forums, providing comprehensive documentation, and responding to issues and pull requests in a timely manner.

  4. Tutorials and Guides: Creating detailed tutorials and guides can lower the barrier to entry, making it easier for potential users to understand and start using the project. These resources can be published on popular developer platforms like Medium, Dev.to, and GitHub.

  5. Integration with Existing Tools: Where possible, the project will integrate with popular tools already used in the JavaScript and machine learning ecosystems, making it easier for developers to adopt the project into their existing workflows.

These strategies aim to foster a supportive and vibrant ecosystem around the project, enabling it to grow and thrive. By consistently delivering value to the users and listening to their feedback, the project can continuously evolve to better serve the community.

Development Roadmap

The core integration will take a maximum of 4 months and will be split over three milestones, optionally followed by the repeating Milestones X and Y:

Milestone 1
Duration: 4 weeks; complete 5 October 2023 (if approved 31 August 2023)
Budget: $5 000 USD
Number of FTE: two developers
Deliverables:

a. Integrating Filecoin storage with one vector database

b. Integrating vector embedding generation

Milestone 2
Duration: 8 weeks; complete 16 December 2023
Budget: $25 000 USD
Number of FTE: two developers
Deliverables:

c. Integrating an in-memory vector DB for development/test environments

d. Cross-functional vector DB

e. Support for a new vector embedding model

f. Fast initiation
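The in-memory vector DB in deliverable c could look roughly like the sketch below: a `Map` of vectors with exact top-k search ranked by cosine similarity. All names (`InMemoryVectorStore`, `StoredVector`, `QueryMatch`, `cosine`) are hypothetical, and cosine similarity is an assumed metric; the proposal does not specify one.

```typescript
// Sketch only: a hypothetical in-memory store for development and tests.

interface StoredVector {
  id: string;
  vector: number[];
  metadata?: Record<string, unknown>;
}

interface QueryMatch {
  id: string;
  score: number; // cosine similarity, from -1 (opposite) to 1 (same direction)
  metadata?: Record<string, unknown>;
}

// Cosine similarity: dot(a, b) / (|a| * |b|); returns 0 if either vector is all zeros.
function cosine(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Methods are async so the store can be swapped for a networked database client.
class InMemoryVectorStore {
  private readonly items = new Map<string, StoredVector>();

  // Inserts new records and overwrites existing ones with the same id.
  async upsert(records: StoredVector[]): Promise<void> {
    for (const record of records) this.items.set(record.id, record);
  }

  // Exact (brute-force) top-k search: scores every stored vector.
  async query(vector: number[], topK = 5): Promise<QueryMatch[]> {
    const matches: QueryMatch[] = [];
    this.items.forEach((item) => {
      matches.push({ id: item.id, score: cosine(vector, item.vector), metadata: item.metadata });
    });
    matches.sort((x, y) => y.score - x.score);
    return matches.slice(0, topK);
  }

  size(): number {
    return this.items.size;
  }
}
```

Brute-force search costs O(n·d) per query for n stored vectors of dimension d. That is acceptable for development and tests, while production data would typically go to a dedicated vector database with an approximate index.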

Milestone 3
Duration: 2 weeks; complete 5 January 2024
Budget: $5 000 USD
Number of FTE: two developers
Deliverables:

g. Final testing and changes.

Milestone X
Duration: 2 weeks per repeating cycle
Budget: $1 000 USD per cycle
Number of FTE: two developers
Deliverables:

x. Integrating a new embedding model.

Milestone Y
Duration: 2 weeks per repeating cycle
Budget: $1 000 USD per cycle
Number of FTE: two developers
Deliverables:

y. Integrating new vector databases

Total Budget Requested

The total budget of $35 000 USD (the sum of Milestones 1 to 3) will be used to support our two software engineers and an assistant through the integration process.

On request, we can continue with Milestones X and Y to integrate new features in a continuous cycle.

Maintenance and Upgrade Plans

Each milestone will ship as a release with a new feature, such as support for a new embedding model or a new vector database integration. We also plan to extend the SDK to Python (this will be a separate proposal), which is the foundation for many AI/ML developers and data scientists.

Team

Our team is a group of like-minded individuals with an interest in both blockchain and artificial intelligence. We are developing our own productivity application, and as a team we also want to contribute to open source projects in return for the many tools the open source community has made available, which make so many different applications possible.

Team Members

Serdar A, Riccardo B, Bedirhan H, Ethan C

Team Member LinkedIn Profiles

Serdar A: https://www.linkedin.com/in/arslaser/
Riccardo B: https://www.linkedin.com/in/riccardo-dal-pio-luogo-5a7b18192/
Bedirhan H: https://www.linkedin.com/in/bedirhan-hoskun-0b6682215/
Ethan C: https://www.linkedin.com/in/ethan-clime-93a42b89/

Team Website

https://onebrain.io/

Relevant Experience

Serdar A (Co-founder and CTO) is a software developer with broad experience in the field. He is a multidisciplinary engineer who has worked as a DevOps engineer, backend developer, AI developer, front-end developer, and level designer (Unreal Engine 4.27, 5.0).

Riccardo B (Advisor) is a security engineer and penetration tester with experience in backend development and DevOps. He has solid experience across many different blockchains.

Bedirhan H (Lead Software Engineer) is a skilled engineer with many years of experience in the field. Bedirhan is proficient in Python, Node.js/TypeScript, and Bash development. He also has expertise in DevOps and server administration.

Ethan C (Founder) is a product visionary with a wide range of skills. Through his startup, he was one of the first people to launch an ICO in the Central European region. He has created and contributed to many different startups.

Team code repositories

Serdar A: https://github.com/TochKa21U
Riccardo B: https://github.com/RiccardoBiosas
Bedirhan H: https://github.com/bhh37
Ethan C: Product vision. Other projects: https://onebrain.io/

Additional Information

We learned about this grant program while researching the Filecoin website.

Please contact us at sa@onebrain.io with any questions.

ErinOCon commented 11 months ago

Hi @TochKa21U, thank you for your proposal and for your patience with our review. Unfortunately, we will not be proceeding with a grant at this time.

Wishing you all the best as you continue to build!