smuotoe opened this issue 1 month ago
Thanks for the report. This is an interesting case. There are a few problems to solve before this can really work.
The first and main problem is that the "Run code" button only runs one code block from the LLM's message, never multiple. So in this case it is just running the `pip install` command, and the Python code never actually runs. If you did generate a different LLM message that contained just the Python code, it would run, but it would do so within a different sandbox, which means all the pip packages installed by the `pip install` command would actually be absent.
The second problem is these `.cache/pip` files. These files are created by `pip`, and that is legitimately how `pip` caches the packages it downloads. It's not creating "excessive files" from its perspective; it's just that the sandbox limits make them count as excessive, because I wasn't expecting `pip install` to run there.
Perhaps the fix for this second problem is to ignore files under `.cache` in the output file detection and not count them against the per-execution file limit, since they wouldn't be copied out of the sandbox.
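Roughly, the output file detection could filter such paths like this (a sketch only; the helper names here are illustrative, not the function's actual internals):

```python
from pathlib import Path

# Directories whose contents should not be treated as execution output.
# ".cache" covers pip's download cache (e.g. ~/.cache/pip) inside the sandbox.
IGNORED_DIRS = {".cache"}

def is_ignored_output(path: Path, sandbox_root: Path) -> bool:
    """Return True if `path` lives under an ignored directory such as .cache."""
    relative = path.relative_to(sandbox_root)
    return any(part in IGNORED_DIRS for part in relative.parts)

def collect_output_files(sandbox_root: Path, max_files: int) -> list[Path]:
    """Gather candidate output files, skipping cache files before enforcing the limit."""
    outputs = [
        p for p in sandbox_root.rglob("*")
        if p.is_file() and not is_ignored_output(p, sandbox_root)
    ]
    if len(outputs) > max_files:
        raise RuntimeError(f"Too many output files: {len(outputs)} > {max_files}")
    return outputs
```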
This doesn't solve the first problem though. Perhaps the function should be updated to sequentially run all code blocks of a message, rather than just one. The problem with this though is that some LLMs will generate other unrelated code blocks, such as example output or some example of how to run the script after having written it. It would be bad if those code blocks were to run. Let me know if you can think of a better way to select which blocks to run from a message.
(Then there's a third problem, which is that even if all the above worked well, there's no support for showing images yet (only text files), so the stock chart would still not show up. That one should be relatively easy to fix though.)
Thanks for your detailed response.
Your suggestion to ignore `.cache` files is reasonable and should fix the "excess files" issue. Code execution is indeed a complex issue; a solution that comes to mind would be to have "Run code" buttons on each generated code block to allow the user to decide which code block to execute. In addition, every message within a thread should share a sandbox.
While the implementation may be non-trivial, I think these fixes are worth pursuing since most (Python) use cases would likely involve package installation.
Maybe an additional "run with pre-reqs" button, where the user can provide the pip packages to be installed in the sandbox before running the code?
A few more thoughts on this issue.
> Maybe an additional "run with pre-reqs" button, where the user can provide the pip packages to be installed in the sandbox before running the code?
This solution doesn't seem great to me, because Open WebUI (at least currently) has no good way of displaying multiple buttons with different appearance. If there were two buttons, they would just show up as two identical-looking sparkle buttons under each message. The only way to tell the difference would be to hover the text of each button to find out what they do. Perhaps a future Open WebUI enhancement will solve this issue, but even so it doesn't seem like a great user experience.
> a solution that comes to mind would be to have "Run code" buttons on each generated code block to allow the user to decide which code block to execute. In addition, every message within a thread should share a sandbox.
The other annoying Open WebUI limitation is that there is no way for the user to specify which code blocks within a single message they want to run; that's simply not something Open WebUI transmits to functions. The buttons only apply to whole messages.
I believe it should be possible to improve the user experience without adding UI complexity by improving the heuristics used to determine which code blocks to run. Specifically, it should be fairly easy to detect the case where there is one Bash (or shell-like) code block that contains the substring "pip install", followed by one Python code block. I believe this is a common enough case that adding special support for it seems worthwhile. In that case, the sandbox can run both code blocks in sequence.
Another problem users may run into when doing this is the time it takes to download and install pip packages which may run against the default code execution duration limit of 30 seconds. Given the prevalence of this use-case, I also think that adding a customizable valve containing a comma-separated list of Python packages to pre-install would be useful. Such packages could be pre-installed in a per-user Python environment (initialized only once, not once per message), and would run under a separate sandbox with a much longer timeout (~30 minutes?) before any message-specific code runs.
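As a rough sketch, such a valve could look like the following (the `Valves`/pydantic pattern follows Open WebUI's function convention, but the field names, helper, and 30-minute default are assumptions; in practice the installation would happen inside the per-user sandbox rather than via a host `subprocess` call):

```python
import subprocess
import sys

from pydantic import BaseModel, Field

class Valves(BaseModel):
    # Comma-separated list of Python packages to pre-install in the
    # per-user environment before any message-specific code runs.
    preinstalled_packages: str = Field(
        default="",
        description="Comma-separated pip packages to pre-install, e.g. 'yfinance,matplotlib'.",
    )
    # Pre-installation gets a much longer budget than regular code execution.
    preinstall_timeout_seconds: int = Field(default=30 * 60)

def preinstall_packages(valves: Valves, python_executable: str = sys.executable) -> None:
    """Install the configured packages once per user environment (hypothetical helper)."""
    packages = [p.strip() for p in valves.preinstalled_packages.split(",") if p.strip()]
    if not packages:
        return
    subprocess.run(
        [python_executable, "-m", "pip", "install", *packages],
        check=True,
        timeout=valves.preinstall_timeout_seconds,
    )
```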
My sense is that this solution would work for 80% of the "I want to run multiple code blocks" use-cases while not requiring any Open WebUI change and not adding UI complexity. As the remaining 20% of cases emerge, we could probably add similar heuristics to cover them.
An update on this issue: two large commits (linked above) have now been submitted. Neither of them fully solves this problem, but they are prerequisites towards solving it.
Description
When attempting to plot stock prices using the code execution tool, it only runs the bash script for package installation but fails to execute the subsequent Python code. Additionally, it creates an excessive number of files (over 1000) during this process.
General information
v0.3.23
0.6.0
Linux 5.15.153.1-microsoft-standard-WSL2 #1 SMP Fri Mar 29 23:14:13 UTC 2024 x86_64 x86_64 x86_64 GNU/Linux
Docker
Docker version 26.1.4, build 5650f9b
`docker run` command: `docker run --privileged=true --cgroupns=host -d -p 3000:8080 --add-host=host.docker.internal:host-gateway -v open-webui:/app/backend/data --name open-webui --restart always ghcr.io/open-webui/open-webui:main`
Steps to Reproduce
Expected Behavior
The tool should:
- Run the bash code block to install the required packages
- Run the subsequent Python code block in the same sandbox
- Display the resulting stock price plot
Actual Behavior
The tool:
- Only runs the bash script for package installation and never executes the Python code
- Creates an excessive number of files (over 1000) during this process