enricoros / big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.
https://big-agi.com
MIT License
5.49k stars · 1.26k forks

Create Plugins-Option // RFC #50

Closed · srysev closed this issue 1 year ago

srysev commented 1 year ago

Give users the ability to extend the chat functionality with their own plugins. For example, I would like to create a plugin to upload a PDF and then ask questions about it, just like in this OpenAI example: https://github.com/openai/openai-cookbook/tree/main/apps/file-q-and-a Or a plugin which visits a web page and then answers questions about its content, like here: https://github.com/openai/openai-cookbook/tree/main/apps/web-crawl-q-and-a

You wrote amazing software, truly made with love, keep it up!

enricoros commented 1 year ago

Thanks for opening this. Very interesting app, but we will do better ;)

Just a few hours ago, we landed support for PDF reading, although without chunking and embeddings. This means there are a few pros and cons:

Quick analysis of the OpenAI app, just for study and reference:

  1. --> Load PDF ("Attention Is All You Need")
  2. The PDF becomes 71 text chunks, each turned into embeddings via process-file (a mean embedding is also computed)
  3. --> Type the question "What are the Key Takeaways"
  4. The query is sent to search-file-chunks and 10 results, ranked by dot product with the query embedding, come back, each with its text and score
  5. The 10 results are then sent to get-answer-from-files, which uses GPT-3.5-Turbo with the prompt below

    const filesString = fileChunks
      .map((fileChunk) => `###\n\"${fileChunk.filename}\"\n${fileChunk.text}`)
      .join("\n")
      .slice(0, MAX_FILES_LENGTH);
    
    const prompt =
      `Given a question, try to answer it using the content of the file extracts below, and if you cannot answer, or find a relevant file, just output \"I couldn't find the answer to that question in your files.\".\n\n` +
      `If the answer is not contained in the files or if there are no file extracts, respond with \"I couldn't find the answer to that question in your files.\" If the question is not actually a question, respond with \"That's not a valid question.\"\n\n` +
      `In the cases where you can find the answer, first give the answer. Then explain how you found the answer from the source or sources, and use the exact filenames of the source files you mention. Do not make up the names of any other files other than those mentioned in the files context. Give the answer in markdown format.` +
      `Use the following format:\n\nQuestion: <question>\n\nFiles:\n<###\n\"filename 1\"\nfile text>\n<###\n\"filename 2\"\nfile text>...\n\nAnswer: <answer or "I couldn't find the answer to that question in your files" or "That's not a valid question.">\n\n` +
      `Question: ${question}\n\n` +
      `Files:\n${filesString}\n\n` +
      `Answer:`;
  6. Finally, the answer: "I couldn't find the answer to that question in your files."
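Step 4 of the pipeline above (ranking chunks by dot product against the query embedding) can be sketched in TypeScript. The `FileChunk` shape and the `searchFileChunks` name are illustrative assumptions, not the cookbook app's actual code:

```typescript
// Hypothetical chunk shape - the real app stores filename, text, and embedding per chunk
interface FileChunk {
  filename: string;
  text: string;
  embedding: number[]; // vector from the embeddings endpoint
}

// Dot product of two equal-length vectors
function dot(a: number[], b: number[]): number {
  return a.reduce((sum, ai, i) => sum + ai * b[i], 0);
}

// Rank all chunks by similarity to the query embedding and keep the top k
function searchFileChunks(
  queryEmbedding: number[],
  chunks: FileChunk[],
  k = 10,
): Array<FileChunk & { score: number }> {
  return chunks
    .map((chunk) => ({ ...chunk, score: dot(queryEmbedding, chunk.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, k);
}
```

Since OpenAI embeddings are normalized to unit length, the dot product here is equivalent to cosine similarity.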

> You wrote amazing software, truly made with love, keep it up!

Thanks for pointing to this app - great for study! Please give the new PDF importer a try (and hand-cut the PDFs), but in pure rockstar style, please ask the hard questions, especially using the scientist or executive profiles - it's jaw-dropping ;)

srysev commented 1 year ago

Thank you! :) OK, the PDF topic was just an example of a plugin. The aim of this RFC is a general way to plug additional functionality into your app. A plugin could have three parts: setup and two listeners, one before the request to OpenAI and one after (before the response is delivered to the user). It should be able to manipulate both the request and the response.

The plugin could be activated either in the app UI or, maybe better, directly in the chat message text, for example with an annotation: "Check @web-access:www.acme.com/terms_conditions.html and find out what is the notice period of a contract cancellation. Find out how to reduce the period to a minimum." Or: "I'll be home in two hours, @my_buttler: turn the heating on and make me a coffee." where "my_buttler" could be the name of a smart home interface :)
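The three-part plugin shape described above (setup plus a pre-request and a post-response hook), together with @-annotation activation in the message text, could look roughly like this in TypeScript. All names and shapes here are hypothetical sketches, not part of big-AGI's codebase:

```typescript
// Hypothetical simplified chat payload shapes
interface ChatRequest { messages: { role: string; content: string }[] }
interface ChatResponse { content: string }

// A plugin: one-time setup, plus listeners before the request and after the response
interface ChatPlugin {
  name: string; // matched against "@name" annotations, e.g. "web-access"
  setup?(): Promise<void> | void;
  beforeRequest?(request: ChatRequest): Promise<ChatRequest> | ChatRequest;
  afterResponse?(response: ChatResponse): Promise<ChatResponse> | ChatResponse;
}

// Find "@plugin-name:argument" annotations in a chat message;
// the argument part (after the colon) is optional
function parsePluginAnnotations(text: string): { name: string; arg?: string }[] {
  const matches = text.matchAll(/@([\w-]+)(?::(\S+))?/g);
  return [...matches].map((m) => ({ name: m[1], arg: m[2] }));
}
```

With this shape, the chat pipeline would run `beforeRequest` for every plugin whose name appears in the message, send the (possibly modified) request to OpenAI, then run `afterResponse` before rendering.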

enricoros commented 1 year ago

This is very interesting. In the case of the "in two hours" example, is the model really going to wake up (or maybe pause) for 2 hours and then execute the action? I love it, but it may require a server or some similar solution. I see the utility though - it's very large.

enricoros commented 1 year ago

Have you tried taking exactly your text, and asking to design a typescript interface to enable those kinds of plugins? The code of this app is mostly generating itself :)

srysev commented 1 year ago

> Have you tried taking exactly your text, and asking to design a typescript interface to enable those kinds of plugins? The code of this app is mostly generating itself :)

It's a good idea! :) Do you provide the existing code as context and ask GPT-4 to extend it with new functionality? Or do you provide a single prompt only and then build the generated code into the application manually?

enricoros commented 1 year ago

> Do you provide the existing code as context and ask GPT-4 to extend it with new functionality? Or do you provide a single prompt only and then build the generated code into the application manually?

The first option. I drop in existing files and ask for features or fixes. Then copy the whole files back to the app.

Tried the pdf import? Game changer.

Also, very open to a plugin interface, but getting it right requires brainstorming 🧠. Want to keep top quality. Any specific ideas on the plugin interface?

rossman22590 commented 1 year ago

How does the PDF feature work? Does it learn the PDF via vectors, or just paste it as a prompt?

enricoros commented 1 year ago

Just pasted as a prompt. But I was astonished at how insightful it is. As long as it fits the context window, vectors are no match for context. Vectors are "okay" (not great, as chunking loses semantics) for long content / retrieval. But if you want a summary of a PDF, and the PDF fits GPT-4-8K (and soon GPT-4-32K), then the context is 💯
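The paste-vs-vectors decision above comes down to a token-budget check before sending. A minimal sketch, assuming a rough 4-characters-per-token heuristic; these function names are illustrative, not big-AGI code:

```typescript
// Rough heuristic: ~4 characters per token for English text
// (a real implementation would use a proper tokenizer)
function estimateTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Decide whether the extracted PDF text fits the model's context window,
// reserving room for the question and the generated answer
function fitsContext(pdfText: string, contextTokens: number, reserveTokens = 1000): boolean {
  return estimateTokens(pdfText) <= contextTokens - reserveTokens;
}
```

If `fitsContext` returns true for an 8K (or 32K) model, pasting the whole PDF preserves full semantics; only when it returns false does chunking plus embedding retrieval become necessary.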