Open Grant Proposal: Vector Database Integration for Filecoin's JavaScript SDK
Project Name: Vector Database Integration for Filecoin's JavaScript SDK
Proposal Category: Developer and data tooling, Integrations
Individual or Entity Name: OneBrain
Proposer: https://github.com/TochKa21U
(Optional) Filecoin ecosystem affiliations: None.
(Optional) Technical Sponsor: None.
Do you agree to open source all work you do on behalf of this RFP under the MIT/Apache-2 dual-license?: Yes.
Project Summary
The project aims to create a direct integration between vector databases and Filecoin's JavaScript SDK. By enabling this integration, files stored in Filecoin can be converted into vector embeddings and directly inserted into vector databases.
At present, there is a technological gap: Filecoin's storage system is separate from the data processing required by machine learning and neural network applications. This project aims to bridge that gap by letting files stored on Filecoin be converted into vector embeddings and stored in a vector database in a single, seamless step.
The potential benefits from this integration are multifaceted:
Data Accessibility: Conversion to vector embeddings simplifies the structure of complex data, aiding in improved data retrieval.
Machine Learning and AI Applications: Readily available data in the form of vector embeddings can be directly used in machine learning models and neural network applications.
Reduced Processing Time: Converting files and transferring the resulting embeddings into a vector database in one step removes the need for a separate processing pipeline, saving computational cost and time.
Versatile Use Cases: The integration can be useful for various applications including recommendation systems, similarity searches, AI chatbots, natural language processing, image recognition, voice search, and biometric security.
In essence, this project focuses on expanding Filecoin's storage capabilities by integrating vector databases into its JavaScript SDK, thereby enabling more efficient data processing for machine learning and neural network applications.
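The conversion step described in this summary could look roughly like the following sketch. It assumes the retrieved file is plain text and that the caller supplies an embedding function; `chunkText`, `toRecords`, and the record shape (`id`, `vector`, `metadata`) are hypothetical names chosen for illustration, not part of any existing Filecoin SDK.

```javascript
// Split text into fixed-size character windows with some overlap, so that
// text cut at a chunk boundary keeps some surrounding context in the next chunk.
function chunkText(text, chunkSize = 500, overlap = 50) {
  if (overlap >= chunkSize) {
    throw new RangeError("overlap must be smaller than chunkSize");
  }
  const chunks = [];
  const step = chunkSize - overlap;
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break; // this window reached the end
  }
  return chunks;
}

// Turn one stored file (identified by its CID) into vector DB records.
// `embed` is a hypothetical caller-supplied function: string -> number[].
function toRecords(cid, text, embed, options = {}) {
  return chunkText(text, options.chunkSize, options.overlap).map((chunk, i) => ({
    id: `${cid}#${i}`,
    vector: embed(chunk),
    metadata: { cid, chunkIndex: i, text: chunk },
  }));
}
```

Hosted embedding services such as OpenAI's are called asynchronously in practice; the synchronous `embed` signature here only keeps the sketch short.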
Impact
The Vector Database Integration for Filecoin's JavaScript SDK offers a significant upgrade for users working with machine learning, neural networks, and vector similarity search. By transforming stored files into vector embeddings and inserting them into the user's chosen vector database, it makes data stored on Filecoin directly usable by these applications.
Because files no longer have to pass through a separate download, transform, and indexing pipeline, we expect the process to reduce computational load, saving time and resources. Data handling becomes more cost-effective and streamlined, which may encourage more complex data analysis.
The project has the potential to improve the workflows of the machine learning and AI communities. By providing a more efficient tool, it could improve the quality and speed of development in areas such as natural language processing, image recognition, voice search, and biometric security.
More broadly, the project could change the way data stored on the Filecoin network is processed. Integrating vector databases into Filecoin's JavaScript SDK could be an important step toward using Filecoin as a storage layer for AI and machine learning applications.
Outcomes
The project will be developed in JavaScript and released as a module. Our team will also use this module in our SaaS service to integrate with Filecoin.
Once this project is complete, developers will have a Filecoin storage integration that automatically generates vector embeddings from stored files and saves them into the integrated vector databases.
We aim to integrate multiple vector databases, with the flexibility to choose between different embedding models (both open source models and platform services such as OpenAI), by the end of 2023. Users of this module will be able to index their files and search through them via a vector database. This is especially useful now that AI, and in particular LLMs and other generative AI solutions, are in high demand.
Adoption, Reach, and Growth Strategies
To ensure maximum adoption, reach, and growth, the project will follow several key strategies:
Open Source and Licensing: Releasing this project under the MIT/Apache-2 dual license is a significant strategy. Both licenses are permissive, which encourages widespread adoption by giving users the freedom to use, modify, and distribute the software, and attracts contributors who can help improve and maintain the project.
Package Distribution: The project will be published to the npm registry, the largest JavaScript package registry, so it can be installed with both npm and yarn. This makes the project accessible to a large number of developers and users, promoting adoption.
Community Engagement: Actively engaging with the community is crucial. This can be achieved through consistent communication on relevant forums, providing comprehensive documentation, and responding to issues and pull requests in a timely manner.
Tutorials and Guides: Creating detailed tutorials and guides can lower the barrier to entry, making it easier for potential users to understand and start using the project. These resources can be published on popular developer platforms like Medium, Dev.to, and GitHub.
Integration with Existing Tools: Where possible, the project will aim to integrate with popular tools already being used in the JavaScript and Machine Learning ecosystems. This will make it easier for developers to adopt the project into their existing workflows.
These strategies aim to foster a supportive and vibrant ecosystem around the project, enabling it to grow and thrive. By consistently delivering value to the users and listening to their feedback, the project can continuously evolve to better serve the community.
Development Roadmap
Development of the module will take a maximum of 4 months and will be split over three milestones, followed by optional repeating milestones (X and Y):
Milestone 1
Duration: 4 weeks; complete 5 October 2023 (if approved 31 August 2023)
Budget: $5 000 USD
Number of FTE: two developers
Deliverables:
a. Integrating Filecoin storage with one vector database
Functionality: once complete, developers will be able to connect their Filecoin storage to the vector database and start indexing
Measurement: once complete, the module can be sent to you or demonstrated
b. Integrating vector embedding generation
Functionality: once complete, developers will be able to index their files directly using the provided/supported embedding model
Measurement: once complete, we will update the module and demonstrate it
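Deliverables a and b together amount to a small indexing pipeline. The sketch below assumes the file is retrievable over an IPFS HTTP path gateway (`<gateway>/ipfs/<cid>`), which is only the case when the data stored on Filecoin is also served on IPFS. `indexFile`, the `store.upsert` method, and the injectable `fetchFn`/`embed` parameters are hypothetical, not an existing SDK API; the default `fetchFn` relies on the global `fetch` available in Node.js 18 and later.

```javascript
// Hypothetical Milestone 1 pipeline: retrieve a file by CID through an IPFS
// HTTP path gateway, embed its text, and upsert the record into a store.
async function indexFile(cid, { gatewayUrl, embed, store, fetchFn = fetch }) {
  const res = await fetchFn(`${gatewayUrl}/ipfs/${cid}`);
  if (!res.ok) {
    throw new Error(`retrieval of ${cid} failed with HTTP ${res.status}`);
  }
  const text = await res.text();

  // Chunking is omitted for brevity: the whole file becomes one record.
  // `embed` may call a local model or a hosted API, so it is awaited.
  const record = { id: cid, vector: await embed(text), metadata: { cid, chars: text.length } };
  await store.upsert([record]);
  return record;
}
```

Injecting `fetchFn` and `embed` keeps the pipeline testable without network access and lets the same code work with any gateway or embedding provider.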
Milestone 2
Duration: 8 weeks; complete 16 December 2023
Budget: $25 000 USD
Number of FTE: two developers
Deliverables:
c. Integrating an in-memory vector DB for development/test environments
Functionality: once complete, developers will not need to set up or prepare a vector DB themselves; they can start developing directly against the in-memory DB
Measurement: once complete, we will provide you with the code and a demonstration
d. Cross-functional vector DB
Functionality: once complete, developers will be able to switch between the in-memory database and the persistent (stored) database via an option, moving from development to deployment mode
Measurement: once complete, we will provide you with the code and a demonstration
e. New vector embeddings
Functionality: once complete, users will be able to choose from different embedding models
Measurement: once complete, we will provide you with the code
f. Fast initiation
Functionality: once complete, developers will be able to either customise the settings or use the defaults when initiating the integration between Filecoin storage and the vector DB
Measurement: once complete, we will provide you with the code
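Deliverables c, d, and f could share one small configuration surface. The following sketch assumes a hypothetical `createVectorStore` factory: in `development` mode it returns an in-memory store that performs brute-force cosine-similarity search, and in any other mode it expects the caller to pass an adapter for a persistent vector database. The class name, option keys, and default values are illustrative only.

```javascript
// Brute-force in-memory vector store for development and tests.
class InMemoryVectorStore {
  constructor() {
    this.records = new Map(); // id -> { id, vector, metadata }
  }

  async upsert(records) {
    for (const r of records) this.records.set(r.id, r);
  }

  // Return the k records most similar to `vector` by cosine similarity.
  async query(vector, k = 5) {
    const scored = [];
    for (const r of this.records.values()) {
      scored.push({ ...r, score: cosine(vector, r.vector) });
    }
    return scored.sort((a, b) => b.score - a.score).slice(0, k);
  }
}

function cosine(a, b) {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na && nb ? dot / Math.sqrt(na * nb) : 0; // zero vectors score 0
}

// Illustrative defaults used for "fast initiation" when no options are given.
const DEFAULTS = { mode: "development", topK: 5, chunkSize: 500 };

function createVectorStore(options = {}) {
  const config = { ...DEFAULTS, ...options }; // user options override defaults
  if (config.mode === "development") {
    return { store: new InMemoryVectorStore(), config };
  }
  if (!config.adapter) {
    throw new Error(`mode "${config.mode}" requires a persistent vector DB adapter`);
  }
  return { store: config.adapter, config };
}
```

Brute-force search costs one similarity computation per stored record on every query, which is acceptable for development data but is the reason deployment mode switches to a dedicated vector database.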
Milestone 3
Duration: 2 weeks; complete 5 January 2024
Budget: $5 000 USD
Number of FTE: two developers
Deliverables:
g. Final testing and changes
Functionality: once complete, all functions will have been integrated and the project will be complete.
Measurement: once complete, we will provide you with the code and a demonstration; after approval, the module will be published to npm (installable with npm or yarn)
Milestone X
Duration: 2 weeks per cycle (repeating)
Budget: $1 000 USD per cycle
Number of FTE: two developers
Deliverables:
x. Integrating a new embedding model
Functionality: once complete, developers will be able to use the new embedding model.
Measurement: once complete, we will provide you with the code
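Each Milestone X cycle could reduce to registering one more provider behind a common interface. A minimal registry sketch follows, assuming every provider exposes a fixed integer `dimensions` and an async `embed(texts)` method; the registry functions, the provider shape, and the `toy-hash` provider are hypothetical and stand in for real models.

```javascript
// Registry of embedding providers, keyed by model name.
const embeddingProviders = new Map();

function registerEmbeddingModel(name, provider) {
  if (typeof provider.embed !== "function" || !Number.isInteger(provider.dimensions)) {
    throw new TypeError(`provider "${name}" must define embed() and integer dimensions`);
  }
  embeddingProviders.set(name, provider);
}

// Resolve a model and check that every returned vector has the declared size,
// since a vector DB index is usually created with one fixed dimension.
async function embedWith(name, texts) {
  const provider = embeddingProviders.get(name);
  if (!provider) throw new Error(`unknown embedding model "${name}"`);
  const vectors = await provider.embed(texts);
  for (const v of vectors) {
    if (v.length !== provider.dimensions) {
      throw new Error(`"${name}" returned ${v.length} dims, expected ${provider.dimensions}`);
    }
  }
  return vectors;
}

// Toy provider for tests: counts characters into 8 buckets by code point.
registerEmbeddingModel("toy-hash", {
  dimensions: 8,
  async embed(texts) {
    return texts.map((t) => {
      const v = new Array(8).fill(0);
      for (const ch of t) v[ch.codePointAt(0) % 8] += 1;
      return v;
    });
  },
});
```

The dimension check matters when users switch models: embeddings from models with different output sizes cannot be stored in the same index.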
Milestone Y
Duration: 2 weeks per cycle (repeating)
Budget: $1 000 USD per cycle
Number of FTE: two developers
Deliverables:
y. Integrating a new vector database
Functionality: once complete, developers will be able to use the new vector database.
Measurement: once complete, we will provide you with the code
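Similarly, each Milestone Y cycle could be scoped as writing one adapter that satisfies a shared contract. The sketch assumes a hypothetical two-method contract, `upsert(records)` and `query(vector, k)`, plus a conformance check that any candidate adapter can be run against; a real adapter would wrap a specific database client, whose API is not assumed here.

```javascript
// Minimal conformance check a new vector DB adapter could be run against.
async function checkAdapter(adapter) {
  for (const method of ["upsert", "query"]) {
    if (typeof adapter[method] !== "function") {
      throw new TypeError(`adapter is missing ${method}()`);
    }
  }
  // Insert two orthogonal unit vectors and expect the closer one first.
  await adapter.upsert([
    { id: "conformance-x", vector: [1, 0, 0], metadata: {} },
    { id: "conformance-y", vector: [0, 1, 0], metadata: {} },
  ]);
  const hits = await adapter.query([0.9, 0.1, 0], 1);
  if (hits.length !== 1 || hits[0].id !== "conformance-x") {
    throw new Error("query() did not return the nearest record first");
  }
  return true;
}
```

Running the same check against every backend keeps the switchable development/deployment modes interchangeable as new databases are added.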
Total Budget Requested
The total budget of $35 000 USD for Milestones 1 to 3 will be used to support our two software engineers and an assistant through the integration process.
On request, we can continue with Milestones X and Y to integrate new features in a continuous cycle.
Maintenance and Upgrade Plans
Each milestone will deliver a new feature release, such as support for a new embedding model or a new vector database integration. We are also planning to extend the SDK to Python (this will be a separate proposal), since Python is the main language for many AI/ML developers and data scientists.
Team
Our team is a group of like-minded individuals interested in both blockchain and artificial intelligence. We are developing our own productivity application, and as a team we also want to contribute to open source projects in return for the many tools the open source community has provided, which make so many applications possible.
Serdar A (Co-founder and CTO) is a software developer with broad experience in the field. He is a multidisciplinary engineer who has worked in DevOps, backend development, AI development, front-end development, and level design (Unreal Engine 4.27, 5.0).
Riccardo B (Advisor) is a security engineer and penetration tester with experience in backend development and DevOps. He has solid experience with many different blockchains.
Bedirhan H (lead software engineer) is a skilled engineer with many years of experience in the field. Bedirhan is proficient in Python, NodeJS / TypeScript and Bash development. He also has expertise in DevOps and server administration.
Ethan C (Founder) is a product visionary with a wide range of skills. He was one of the first people in the Central European region to launch an ICO, through his startup, and has created and contributed to many different startups.
Team Members
Serdar A
Riccardo B
Bedirhan H
Ethan C
Team Member LinkedIn Profiles
Serdar A: https://www.linkedin.com/in/arslaser/
Riccardo B: https://www.linkedin.com/in/riccardo-dal-pio-luogo-5a7b18192/
Bedirhan H: https://www.linkedin.com/in/bedirhan-hoskun-0b6682215/
Ethan C: https://www.linkedin.com/in/ethan-clime-93a42b89/
Team Website
https://onebrain.io/
Team code repositories
Serdar A: https://github.com/TochKa21U
Riccardo B: https://github.com/RiccardoBiosas
Bedirhan H: https://github.com/bhh37
Ethan C: Product vision; other projects: https://onebrain.io/
Additional Information
We learned about this grant program through researching on the Filecoin website.
Please contact us at sa@onebrain.io with any questions.