filecoin-project / devgrants

👟 Apply for a Filecoin devgrant. Help build the Filecoin ecosystem!

Algovera: AI Workflows/Assistants for Web3 #1222

Closed richardblythman closed 1 year ago

richardblythman commented 1 year ago

Open Grant Proposal: Algovera: AI Workflows/Assistants for Web3

Name of Project: Algovera

Proposal Category: app-dev

Proposer: richardblythman

(Optional) Technical Sponsor: Dietrich Ayala

Do you agree to open source all work you do on behalf of this RFP and dual-license under MIT, APACHE2, or GPL licenses?: Yes

Project Description

AI holds promise for automating processes of organizations and communities. Traditionally, organizations needed to collect a dataset, build and train a model, deploy on cloud infrastructure and build a user interface. With large language models (LLMs) such as GPT-3, image models such as Stable Diffusion, and process models such as WebGPT, this process has become unbundled. A team can now develop workflows on top of a number of specialized model providers. Furthermore, the behaviour of general-purpose LLMs can be programmed on the fly through prompt engineering, reducing the need for fine-tuning on custom datasets.

Web3 projects typically operate in a more decentralized manner, which places more requirements on processes such as documentation, knowledge sharing and coordination. As a result, distributed teams can spend a lot of time on these tasks, working across a large number of tools such as Discord, Discourse, Notion etc. We believe that incorporating AI can help to increase the efficiency of contributors, improve the user experience and ultimately create more cohesive organizations. Nonetheless, few Web3 projects have integrated AI into their applications, processes and organizations, in our experience.

The goal of this proposal is to create a platform for building, deploying and monetizing AI workflows and assistants that interact with Web3 services such as IPFS, and with other apps such as Discord, Discourse and Notion. You can think of this as Zapier built specifically for the needs of AI-powered automation (and with better integration for Web3 services). An example of an AI workflow could be to take a Discourse post, generate a summary and an image, create a new Snapshot proposal with the summary, and post the summary and image to Twitter. Our goal is to progressively decentralize the infrastructure of the platform over time using technologies such as IPFS, bacalhau and Ocean Protocol.

We are focused on bringing in users and revenue. Our GTM is to build a few high-value workflows ourselves and sell them to online communities and organizations. We will have a simple SaaS-like monthly-subscription model. Over time, we will onboard the workflows created by 45+ teams in our community that have been funded by Algovera grants. Rewards will be directed to developers whenever their workflow or components bring in revenue.

Value

What are the benefits of getting this right?

We believe that our workflow platform can open up a tonne of opportunities for collaborations and integrations with other Web3 apps, including ones in the PL ecosystem. For example, consider AI workflows such as Writing Assistants + Skiff Pages/Skiff Mail. We are already discussing integration with bacalhau’s DAG/workflow orchestrator for data processing pipelines. The best case scenario is that our open source tools kickstart a wave of innovation on AI workflows in the Web3 space. We also have a chance to integrate Web3 with popular AI tools, which attracts AI developers to the Web3 space.

What are the risks if you don't get it right?

In our experience, AI developers are generally quite skeptical of crypto and Web3. Failing to get the UX of the platform right could further alienate this community. For example, how should we implement payments and rewards?

What are the risks that will make executing this project difficult?

Bootstrapping the supply side of a marketplace is known to be a tough problem. We hope to mitigate this by building a few high-value workflows ourselves to serve online communities and organizations. After that, we will onboard the workflows created by 45+ teams in our community that have been funded by Algovera grants.

As always, there is a risk that the PR to the open source library LangChain will not be accepted. For example, they may not be interested in integration with Web3 apps. We have worked to mitigate this by opening communication with owners early (they are very responsive). If it is still not accepted, we plan to fork and maintain the library ourselves.

Deliverables

Development Roadmap

Milestone 1 - Integrate IPFS services with a popular existing LLM library called LangChain

In this milestone, we will integrate IPFS services (local node, web3.storage, estuary) with LangChain, a popular Python library, and submit a PR. We will add functionality for reading from and writing to IPFS, which can be used for inputting data to LLM workflows and for saving input prompts, intermediate features or outputs. We will also integrate other popular Web3 services (e.g. Ocean Protocol, Snapshot) and apps that are popular in Web3 (e.g. Discourse, Twitter).
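
To make the read/write idea concrete, here is a minimal sketch of the kind of interface such an integration could expose. Everything in it is an assumption for illustration: `ContentStore`, `InMemoryStore`, `save_text` and `load_documents` are hypothetical names, not LangChain or IPFS APIs, and the in-memory store uses SHA-256 hex digests as keys rather than real IPFS CIDs. The `Document` class only mirrors the `page_content`/`metadata` shape that LangChain documents use.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Dict, List, Protocol


class ContentStore(Protocol):
    """Hypothetical interface a loader could target (local node, pinning service, etc.)."""

    def add(self, data: bytes) -> str: ...

    def cat(self, key: str) -> bytes: ...


class InMemoryStore:
    """Stand-in backend for illustration. Keys are SHA-256 hex digests, NOT real CIDs."""

    def __init__(self) -> None:
        self._blocks: Dict[str, bytes] = {}

    def add(self, data: bytes) -> str:
        # Content addressing: the key is derived from the bytes themselves.
        key = hashlib.sha256(data).hexdigest()
        self._blocks[key] = data
        return key

    def cat(self, key: str) -> bytes:
        return self._blocks[key]


@dataclass
class Document:
    """Mirrors the page_content/metadata shape of a LangChain document."""

    page_content: str
    metadata: Dict[str, str] = field(default_factory=dict)


def save_text(store: ContentStore, text: str) -> str:
    """Write a prompt, intermediate result or output and return its key."""
    return store.add(text.encode("utf-8"))


def load_documents(store: ContentStore, keys: List[str]) -> List[Document]:
    """Read stored content back as documents that a workflow can consume."""
    return [Document(store.cat(k).decode("utf-8"), {"source": k}) for k in keys]
```

Because the key depends only on the content, saving the same text twice yields the same key, which is the property that makes stored prompts and outputs easy to reference from later workflow steps.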

Estimated Time - 2.25 full time person months ($15,000) Dates: 12/1/2022 - 12/31/2022

Milestone 2 - Backend server with deployment of an LLM workflow that uses IPFS

The aim of this milestone is to deploy a proof of concept of an LLM workflow in a backend server. This LLM workflow will read some data from Discourse, perform text summarization using GPT-3, create an image using Stable Diffusion, and write the results to IPFS.
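
The workflow described here is a chain of four steps, which could be sketched roughly as follows. This is an illustration under stated assumptions, not the planned implementation: `run_workflow` and the step functions are hypothetical, and each step is a local placeholder (no Discourse, GPT-3, Stable Diffusion or IPFS call is made).

```python
from typing import Any, Callable, Dict, List

# A step reads the shared context and returns the new keys it produced.
Step = Callable[[Dict[str, Any]], Dict[str, Any]]


def run_workflow(steps: List[Step], context: Dict[str, Any]) -> Dict[str, Any]:
    """Run steps in order, merging each step's output into the context."""
    for step in steps:
        context = {**context, **step(context)}
    return context


def fetch_post(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder for reading a post from Discourse.
    return {"text": "IPFS is content addressed. It is peer to peer."}


def summarize(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder for an LLM summarization call: keep the first sentence.
    first = str(ctx["text"]).split(". ")[0].rstrip(".")
    return {"summary": first + "."}


def make_image(ctx: Dict[str, Any]) -> Dict[str, Any]:
    # Placeholder for an image model call: fake bytes derived from the summary.
    return {"image": b"fake-image:" + str(ctx["summary"]).encode("utf-8")}


def make_store_step(storage: Dict[str, bytes]) -> Step:
    # Placeholder for writing to IPFS: results go into a plain dict.
    def store_results(ctx: Dict[str, Any]) -> Dict[str, Any]:
        storage["summary.txt"] = str(ctx["summary"]).encode("utf-8")
        storage["image.bin"] = ctx["image"]
        return {"stored": sorted(storage)}

    return store_results
```

Keeping every step behind the same signature means a placeholder can later be swapped for a real service call without changing the runner.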

Estimated Time - 3 full time person months ($20,000) Dates: 12/1/2022 - 12/31/2022

Milestone 3 - Frontend for exploring LLM workflows

The aim of this milestone is to develop the frontend of an app to facilitate building, purchasing and running LLM workflows relevant to Web3 organizations.

Estimated Time - 3 full time person months ($20,000) Dates: 1/1/2023 - 1/31/2023

Milestone 4 - Backend and client library for onboarding LLM workflows

In this milestone, we will create a solution for storing and versioning LLM workflows and add functionality for user upload of LLM workflows. This will involve building a client library for uploading workflows.
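
One possible way to version uploaded workflows is to derive the version from the workflow definition itself. The sketch below assumes a workflow can be described as a JSON-serializable dict; `workflow_version` and `WorkflowRegistry` are hypothetical names used only for illustration, and the 12-character truncated SHA-256 identifier is an arbitrary choice.

```python
import hashlib
import json
from typing import Any, Dict, List


def workflow_version(definition: Dict[str, Any]) -> str:
    """Derive a short version id from the canonical JSON form of a definition."""
    canonical = json.dumps(definition, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


class WorkflowRegistry:
    """In-memory stand-in for a backend that stores workflow versions."""

    def __init__(self) -> None:
        self._history: Dict[str, List[str]] = {}
        self._definitions: Dict[str, Dict[str, Any]] = {}

    def upload(self, name: str, definition: Dict[str, Any]) -> str:
        version = workflow_version(definition)
        history = self._history.setdefault(name, [])
        if version not in history:
            # Only record a new version when the content actually changed.
            history.append(version)
            self._definitions[version] = definition
        return version

    def history(self, name: str) -> List[str]:
        return list(self._history.get(name, []))

    def latest(self, name: str) -> Dict[str, Any]:
        return self._definitions[self._history[name][-1]]
```

With this approach, re-uploading an unchanged workflow is a no-op, and the order of keys in the definition does not affect the version.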

We will also onboard all of the suitable AI workflows that have been developed by 45+ teams in our community (funded by Algovera grants). Through our integrations with LangChain we will also onboard a number of LLM workflows created by individuals using LangChain. We will write scripts to deploy their workflows and share the majority of revenue generated by their workflows (minus a transaction fee).

Estimated Time - 3 full time person months ($20,000) Dates: 2/1/2023 - 2/28/2023

Total Budget Requested

$75,000

Maintenance and Upgrade Plans

We plan to more deeply integrate the platform with decentralized infrastructure such as IPFS, bacalhau, Ocean Protocol and more over time.

Team

Team Members

Team Member LinkedIn Profiles

Team Website

www.algovera.ai

Relevant Experience

Dr. Richard Blythman is a machine learning R&D engineer with 5 years of experience in university, industry and startups. He is the founder of Algovera, a Web3 project and community advancing the development of the decentralised AI stack. Algovera has completed 9 successful grants with Ocean Protocol.

Hithesh Shaji is a fullstack engineer with 4 years of experience in startups. He is a co-founder at Algovera and leads the product team. He is currently on sabbatical from his Computer Science MSc at the University of Bath. He has 1 year of blockchain development experience and graduated from the Consensys Blockchain Developer bootcamp.

Mohamed Arshath is a machine learning engineer. He has experience building numerous machine-learning models. Apart from building models, he has experience building backend APIs and managing the backend on the cloud using docker and Kubernetes.

Jakub Smekal is an undergraduate student in maths and physics, and a core team member of Algovera. He is a recipient of an Opscientia fellowship. He has worked on Python libraries for simulating complex systems and integrating Jupyter with MetaMask. He has experience working as a machine learning engineer in computer vision.

Casey Clifton is a fullstack software developer and AI researcher. He has experience building and scaling numerous applications, from AI medical imaging tools to a blockchain-powered rendering engine. He has also published research papers and completed industry and government R&D grants in these areas.

Team code repositories

Algovera has previously worked on several IPFS integrations:

We’ve also worked with a variety of compute protocols and workflow orchestrators:

Additional Information

We are already discussing integration with bacalhau’s DAG/workflow orchestrator for data processing pipelines.

autonome commented 1 year ago

What's the risk that LangChain does not accept your PRs to add IPFS support?

What are some examples of the high value workflows you plan to sell?

richardblythman commented 1 year ago

Hey @autonome. Thanks for your questions!

On your first question, in the original proposal I mentioned "As always, there is a risk that the PR to the open source library LangChain will not be accepted. For example, they may not be interested in integration with Web3 apps. We have worked to mitigate this by opening communication with owners early (they are very responsive). If it is still not accepted, we plan to fork and maintain the library ourselves." I've since had a couple of meetings with the owner, and some projects in his community have applied for microgrants from us. They have also recently integrated traditional databases like SQLite. I haven't asked about whether he would be open to integrating IPFS, but I can do that if that helps. In the meantime, we've also started building our own LLM library that is geared more towards deploying LLM workflows than prototyping. We will support IPFS here for sure, although I still think it would be cool to submit a PR to LangChain. We are also in talks with other Web3 projects like Ocean, Ceramic about integrating their data infrastructure with LLM workflows.

On your second question, I actually just created some docs for common use cases today that we think are high value, such as community management and personal note management (with more coming soon). You can read about it here. These all use LLM workflows with two steps. The first step uses semantic search to retrieve relevant pages of the docs, which are then passed with the question to the LLM (technical overview). It would be super cool to be able to store and retrieve these docs/notes from IPFS. We're also working on GovernGPT with BanklessDAO to assist in writing governance proposals (should be available by the end of the month). Let me know if you have any other ideas or suggestions.
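
As a rough illustration of the two-step pattern described above (retrieve relevant pages, then pass them with the question to the LLM), here is a minimal sketch. It makes several assumptions: `embed` is a toy bag-of-words vector standing in for a real embedding model, the prompt wording is invented, and `retrieve` and `build_prompt` are hypothetical helper names. The final LLM call is left out.

```python
import math
from collections import Counter
from typing import Dict, List


def embed(text: str) -> Counter:
    """Toy bag-of-words vector standing in for a real embedding model."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def retrieve(question: str, pages: Dict[str, str], k: int = 2) -> List[str]:
    """Step 1: rank page names by similarity to the question, keep the top k."""
    q = embed(question)
    ranked = sorted(pages, key=lambda name: cosine(q, embed(pages[name])), reverse=True)
    return ranked[:k]


def build_prompt(question: str, pages: Dict[str, str], k: int = 2) -> str:
    """Step 2: assemble the retrieved pages and the question into one prompt."""
    context = "\n\n".join(pages[name] for name in retrieve(question, pages, k))
    return (
        "Answer using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\nAnswer:"
    )
```

The string returned by `build_prompt` is what would be sent to the LLM; pages that rank below the top `k` never reach the model.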

We've actually completed a lot of Milestone 3 and 4 since we wrote this proposal. If you would prefer, we can update these milestones to reflect future work that provides value to the IPFS and Filecoin ecosystems.

richardblythman commented 1 year ago

@autonome Harrison (creator of LangChain) is interested in adding IPFS support.

richardblythman commented 1 year ago

As I mentioned previously, we've completed a lot of Milestone 3 and 4 since submitting this proposal. I have suggested some alternative milestones below.

We believe that storing knowledge for organizations and communities is a great use case for IPFS. With recent advances in AI, the way that people interact with all knowledge is set to change. The milestones below will build on milestones 1 and 2 to create a PoC to demonstrate how communities can scrape, process and store community data on IPFS for retrieval by personalized LLM apps like ChatGPT. In future, we plan to move towards deploying "on prem" LLMs for communities/orgs, rather than closed APIs.


Milestone 3 - Data engineering work on IPFS data from Discord, Slack, docs for driving ChatGPT-like assistants for the IPFS community

This stage involves data engineering of the various IPFS data sources for the different workflows in Milestone 4. For question answering (q/a) tasks, a naive approach is to pass all of the Discord/Slack data to GPT-3. We’ll try this first although we’ve heard from other projects that it doesn’t work too well (with so much noise in the data). A better approach is to scrape and build a dataset of question and answer pairs from the IPFS Discord (e.g. related to community management and dev support). We’ll build and automate a workflow that uses GPT-3 itself to classify questions and answers from the raw data. For summarization tasks, we’ll build a dataset of input/output pairs for summarizing e.g. by the day, or by channel. The data will then be passed through the OpenAI embedding model (to facilitate semantic search), and stored in a vector database on IPFS. This work will likely require hundreds of dollars of OpenAI credits.
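
For the summarization dataset, the grouping "by the day, or by channel" could look roughly like the sketch below. The message fields (`timestamp`, `channel`, `author`, `text`) are an assumed export shape, not the actual Discord or Slack schema, and `group_messages` is a hypothetical helper. Timestamps are assumed to be ISO 8601 strings with an explicit UTC offset.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Dict, List, Tuple


def group_messages(messages: List[Dict[str, str]]) -> Dict[Tuple[str, str], str]:
    """Group chat messages into one text block per (UTC day, channel)."""
    # Parse first so that ordering is by actual time, not by string value.
    parsed = [
        (datetime.fromisoformat(m["timestamp"]).astimezone(timezone.utc), m)
        for m in messages
    ]
    parsed.sort(key=lambda pair: pair[0])

    groups: Dict[Tuple[str, str], List[str]] = defaultdict(list)
    for when, msg in parsed:
        key = (when.date().isoformat(), msg["channel"])
        groups[key].append(f'{msg["author"]}: {msg["text"]}')

    # Each value is a candidate input for a summarization prompt.
    return {key: "\n".join(lines) for key, lines in groups.items()}
```

Each resulting block can then be paired with a summary to form one input/output example.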

Estimated Time - 3 full time person months ($20,000)

Milestone 4 - Prototyping and deployment of 3 ChatGPT-like apps for the IPFS community that perform (i) question answering (q/a) and (ii) summarization tasks based on community data sources such as IPFS docs, Discord and Slack

We’ll build and deploy 3 prototype LLM workflows that we believe will provide value to the IPFS community:

  1. An LLM workflow for a q/a chat assistant for the IPFS docs with a user interface similar to https://chat.langchain.dev/
  2. An LLM workflow for a q/a chat assistant built on q/a pairs from IPFS Discord/Slack data (e.g. community management and dev support)
  3. An LLM workflow for daily summarization of activity on the IPFS Discord/Slack

These will be built using LLM workflows that perform (i) semantic search over the embedded docs to retrieve the closest N, (ii) refinement to fit the input prompt, and (iii) prompt engineering before passing to the LLM. These will be hosted within the Algovera Flow platform and at standalone URLs (if desired).
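
Step (ii), refinement to fit the input prompt, can be as simple as packing the ranked chunks until a size budget is reached. The sketch below is one possible approach under stated assumptions: it counts whitespace-separated words as a stand-in for model tokens, truncates the first chunk that does not fit, and `fit_to_budget` is a hypothetical helper name.

```python
from typing import List


def fit_to_budget(chunks: List[str], max_words: int) -> List[str]:
    """Keep ranked chunks in order until the word budget is used up.

    Words are a rough stand-in for tokens; a real system would count
    tokens with the model's own tokenizer.
    """
    kept: List[str] = []
    used = 0
    for chunk in chunks:
        words = chunk.split()
        if used + len(words) > max_words:
            # Truncate the chunk that overflows, then stop.
            remaining = max_words - used
            if remaining > 0:
                kept.append(" ".join(words[:remaining]))
            break
        kept.append(chunk)
        used += len(words)
    return kept
```

Because the chunks arrive ranked by similarity, anything dropped or truncated is the least relevant material.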

Estimated Time - 3 full time person months ($20,000)

ErinOCon commented 1 year ago

Hi @richardblythman, thank you again for your ongoing patience with our review. It's great to see projects that are taking IPFS and related tech outside of our ecosystem! With this in mind, we would like to move forward with the consideration of supporting the first two milestones of this project.

Considering this update and how much time has passed, are there any changes you would like to make to your proposal?

ErinOCon commented 1 year ago

Hi @richardblythman, I hope you are doing well! Would you like to expand further or provide progress updates on the first two milestones?

richardblythman commented 1 year ago

Hi @ErinOCon, apologies for the slow response. I've been traveling to conferences for the last two weeks.

That's great news! Thank you for your support. I've since had a bunch of new ideas for integrating IPFS with LLMs. Maybe it would be useful to incorporate these through some small updates to the first 2 milestones. What do you think?

I could share the updates with you next week?

richardblythman commented 1 year ago

Hi @ErinOCon, I'm waiting to get confirmation from you on making some minor updates. Let me know and I can have it done within an hour or two.

Also happy to go ahead with the milestones as currently defined.

ErinOCon commented 1 year ago

Hi @richardblythman, please feel welcome to provide updates to the first two milestones. If the changes are significant enough, you can also submit a new proposal. Should you choose to go the new proposal route, please make a note in the comments here so that the submission doesn't land in the back of our review queue.

richardblythman commented 1 year ago

Hi @ErinOCon, Here are the minor changes that we would like to make to Milestone 2. The main difference is that we will implement multiple workflows (rather than one). Does this look OK to you?


Milestone 2 - Backend server with deployment of multiple LLM workflows that use IPFS

The aim of this milestone is to deploy multiple proof of concept LLM workflows in a backend server.

  1. An LLM workflow for using ChatGPT with IPFS data
  2. An LLM workflow for question answering over text, code and tables across multiple data sources on IPFS
  3. An LLM workflow for summarization of data on IPFS

Estimated Time - 3 full time person months ($20,000)

richardblythman commented 1 year ago

Hey @ErinOCon. Any updates on this? How do the minor changes look?

richardblythman commented 1 year ago

Hi @ErinOCon, Just following up on this again. Any updates?

ErinOCon commented 1 year ago

Hi @richardblythman, thank you again for all of your patience. We would like to move your outlined work for milestone 1 and 2 to the next steps in our process! We will send an email with further details.

richardblythman commented 1 year ago

Hey @ErinOCon, That's great news. Looking forward to hearing from you in the email.