enricoros / big-AGI

Generative AI suite powered by state-of-the-art models and providing advanced AI/AGI functions. It features AI personas, AGI functions, multi-model chats, text-to-image, voice, response streaming, code highlighting and execution, PDF import, presets for developers, much more. Deploy on-prem or in the cloud.
https://big-agi.com
MIT License

[Roadmap] Integrate LLM output that contains code with third-party tools like repl.it #627

Open friederrr opened 2 weeks ago

friederrr commented 2 weeks ago

Why

Users should have the option to offload code the LLM generates to a third-party tool that can run it (e.g. repl.it) and feed its output back as suggested input. This would increase productivity a lot compared to copy-pasting the code the LLM generates, seeing that it doesn't work, copying the error back to the LLM, asking again, and so on.

Description

More specifically: 1) identify code blocks in an LLM output, 2) connect to an online service that can run code (e.g., Google Colab's REST API or potentially the Repl.it API) and send those code blocks to it, and 3) return the output/error messages directly to the chat window, so I can easily add any custom message I'd want to ask the LLM.
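Step 1 of that pipeline could be sketched roughly like this (a minimal, hypothetical helper; big-AGI's actual message format and parsing may differ):

```python
import re

# Matches markdown-style fenced code blocks in LLM output,
# capturing the optional language tag and the block body.
FENCE_RE = re.compile(r"```(\w+)?\n(.*?)```", re.DOTALL)

def extract_code_blocks(llm_output: str) -> list[tuple[str, str]]:
    """Return (language, code) pairs found in an LLM reply."""
    return [
        (lang or "unknown", code.strip())
        for lang, code in FENCE_RE.findall(llm_output)
    ]

reply = "Here you go:\n```python\nprint('hi')\n```\nDone."
print(extract_code_blocks(reply))  # [('python', "print('hi')")]
```

A block with no language tag is returned as "unknown"; as noted below, an extra LLM call might still be needed to classify those reliably.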

This way, one could prompt it with something like "Write me a Python function that computes x^23 + 2*x^5 for x = 25". Suppose the code, when run on Repl.it, generates an error; I'd have that error pasted directly in the chat window, to minimize clicks, so I can simply add a custom message to the LLM or, with a single click, forward the error to the LLM.

This would make code tool use much more pleasant.

enricoros commented 2 weeks ago

Thanks @friederrr , I couldn't find more info on the Google Colab API, any pointers?

friederrr commented 2 weeks ago

Turns out that only Google Docs actually has an API, and Replit seems to have retired theirs: https://blog.replit.com/api-docs. So much for the options I suggested (but I did manage to find an alternative, see below).

I searched for other CDEs (cloud development environments) where code can be run via API access; oddly, several CDE providers that had an API have since retired it, not sure why (e.g., https://docs.aws.amazon.com/cloud9/latest/APIReference/API_ListEnvironments.html).

The Google Cloud APIs might have what is needed (potentially Google Cloud Functions), but it is a massive ecosystem and I have no experience with it (https://cloud.google.com/apis). AWS similarly can probably spin up an instance in most languages, run the code, and report the output.

RunKit seems more friendly, https://www.apirefs.com/apps/runkit: "RunKit is a cloud-based development environment that allows developers to code, run, and share their projects directly from their browser. It provides a comprehensive suite of tools and features, including a built-in terminal, version control integration, and collaboration capabilities. RunKit eliminates the need for local setup and configuration, enabling developers to focus on their code without distractions. The platform supports multiple programming languages, including Python, JavaScript, Node.js, and Go, and offers a range of templates and examples to help users get started quickly. The RunKit API provides a programmatic interface to the platform's functionality. It includes extensive documentation that covers all aspects of the API, including authentication, resource management, and error handling. The API offers a RESTful interface with a comprehensive set of endpoints for creating, running, and managing projects, files, and executions."

Perhaps you could offer an integration with RunKit? As mentioned above, it would be awesome if we could enter our RunKit credentials (RunKit API key etc.) into the Big-AGI settings, ask an LLM to generate some code, and have a toggle somewhere to send any LLM code block to RunKit and receive the output of the run back.
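Since Replit's API is gone and RunKit's actual endpoints would need to be confirmed first, here is only a rough sketch of the settings-to-request plumbing; the `RunKitSettings` shape and the request field names are entirely hypothetical:

```python
from dataclasses import dataclass

@dataclass
class RunKitSettings:
    # Hypothetical settings entry a user would fill in under Big-AGI preferences.
    api_key: str
    enabled: bool = False

def build_run_request(settings: RunKitSettings, language: str, code: str) -> dict:
    """Assemble the body for a hypothetical 'create and run' endpoint."""
    if not settings.enabled:
        raise RuntimeError("code-execution integration is disabled in settings")
    return {
        "headers": {"Authorization": f"Bearer {settings.api_key}"},
        "json": {"language": language, "source": code},
    }

req = build_run_request(RunKitSettings(api_key="rk_test", enabled=True),
                        "python", "print(25**23 + 2*25**5)")
```

Keeping the toggle (`enabled`) in settings means code is never shipped to a third party unless the user has opted in.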

Behind the scenes that would mean:

1. identifying the code block within the LLM output (probably another LLM call is needed to identify the language/code block precisely),
2. using RunKit's API to create an instance in that language,
3. running the code, and
4. sending the RunKit output back.

Perhaps the best way would be to have a second "code input field" below the usual "text input field" (the field where you write text for the LLM to process), auto-populated with the RunKit feedback. The idea is that after asking the LLM something and receiving the run output in the code input field, you can add any hand-made remarks in the text input field while easily editing the RunKit feedback if necessary (e.g. deleting parts of a long error message); when "send" is clicked, both input fields are concatenated and sent to the model. Two input fields are especially helpful when the RunKit output is long: they avoid having to scroll around in a single input field just to add remarks at the beginning or end of the feedback.
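The run-and-feedback part of those steps could be sketched like this, using a local `subprocess` run as a stand-in for the remote RunKit call and a hypothetical `format_feedback` helper for pre-filling the proposed code input field:

```python
import subprocess
import sys

def run_python_block(code: str, timeout: float = 10.0) -> tuple[str, str]:
    """Run a Python code block and capture (stdout, stderr).
    Stand-in for the remote sandbox call; a real integration would
    POST the code to the execution service instead of running locally."""
    proc = subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True, text=True, timeout=timeout,
    )
    return proc.stdout, proc.stderr

def format_feedback(stdout: str, stderr: str) -> str:
    """Pre-fill for the 'code input field': the error text on failure
    (ready for the user to trim and forward), the output otherwise."""
    return f"The code failed with:\n{stderr}" if stderr else f"Output:\n{stdout}"

out, err = run_python_block("print(25**23 + 2*25**5)")
code_field = format_feedback(out, err)
```

On "send", the final prompt would simply be the user's text field concatenated with `code_field`.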