David-Kunz / gen.nvim

Neovim plugin to generate text using LLMs with customizable prompts
The Unlicense
1.11k stars 79 forks

Long context breaks the curl command #108

Open Chewt opened 1 month ago

Chewt commented 1 month ago

If a sufficiently long prompt is given, for example a long enough file, the curl command fails due to shell character limits. To be clear, this isn't a context window error; it's the shell running the curl command that fails, because the command contains too many characters. I've been researching ways to fix this on my own for the past couple of days, but I thought I would open an issue to get some more eyes on it. I believe this is what was happening in #95 as well, but it looks like that issue was closed after going a few weeks without a response.

Some options I have come across:

Writing the prompt to a temp file, then running the curl command with
curl ..... -d @/temp/file/path
instead of
curl ..... -d $body
as in the default. I have tried this by replacing the default command option with my own custom command that implements it. However, reading from a file seems to disable the --no-buffer option set in curl (or at least that is what I think is happening), so the response looks like this (taken from :messages in nvim, with the debug flag in gen.options set to true):

curl --no-buffer --silent -X POST http://localhost:11434/api/chat -d @/tmp/tempfile
Response data:
{ '{"model":"codegeex4","created_at":"2024-08-01T18:56:02.532312234Z","message":{"role":"assistant","content":"
It"},"done":false}', '{"model":"codegeex4","created_at":"2024-08-01T18:56:02.532332457Z","message":{"role":"ass
istant","content":" seems"},"done":false}', "" }
Response data:
{ '{"model":"codegeex4","created_at":"2024-08-01T18:56:02.552571158Z","message":{"role":"assistant","content":"
 like"},"done":false}', "" }
Response data:
{ '{"model":"codegeex4","created_at":"2024-08-01T18:56:02.573474824Z","message":{"role":"assistant","content":"
 you"},"done":false}', "" }
Response data:
{ "{\"model\":\"codegeex4\",\"created_at\":\"2024-08-01T18:56:02.59440118Z\",\"message\":{\"role\":\"assistant\
",\"content\":\"'ve\"},\"done\":false}", "" }
Response data:
{ '{"model":"codegeex4","created_at":"2024-08-01T18:56:02.615319299Z","message":{"role":"assistant","content":"
 provided"},"done":false}', "" }
Response data:
{ '{"model":"codegeex4","created_at":"2024-08-01T18:56:02.636319313Z","message":{"role":"assistant","content":"
 a"},"done":false}', "" }
Response data:
{ '{"model":"codegeex4","created_at":"2024-08-01T18:56:02.657351916Z","message":{"role":"assistant","content":"
 snippet"},"done":false}', "" }
Response data:
{ '{"model":"codegeex4","created_at":"2024-08-01T18:56:02.678430316Z","message":{"role":"assistant","content":"
 of"},"done":false}', "" }
Response data:
{ '{"model":"codegeex4","created_at":"2024-08-01T18:56:02.699504204Z","message":{"role":"assistant","content":"
 text"},"done":false}', "" }
...
...

As you can see, each response is prefixed with its own Response data: line, rather than appearing as one long response as it does when running :messages with the default config:

curl --silent --no-buffer -X POST http://hayden-desktop:11434/api/chat -d '{"messages": [{"role": "user", "cont
ent": "Regarding the following text, hi:\n"}], "model": "codegeex4", "stream": true}'
Response data:
{ '{"model":"codegeex4","created_at":"2024-08-01T18:46:52.81905225Z","message":{"role":"assistant","content":"I
"},"done":false}', '{"model":"codegeex4","created_at":"2024-08-01T18:46:52.836396064Z","message":{"role":"assis
tant","content":" apologize"},"done":false}', '{"model":"codegeex4","created_at":"2024-08-01T18:46:52.857321199
Z","message":{"role":"assistant","content":" for"},"done":false}', '{"model":"codegeex4","created_at":"2024-08-
01T18:46:52.87830513Z","message":{"role":"assistant","content":" any"},"done":false}', '{"model":"codegeex4","c
reated_at":"2024-08-01T18:46:52.899270582Z","message":{"role":"assistant","content":" confusion"},"done":false}
', '{"model":"codegeex4","created_at":"2024-08-01T18:46:52.92027365Z","message":{"role":"assistant","content":"
 earlier"},"done":false}', '{"model":"codegeex4","created_at":"2024-08-01T18:46:52.941308606Z","message":{"role
":"assistant","content":"."},"done":false}', '{"model":"codegeex4","created_at":"2024-08-01T18:46:52.962381635Z
","message":{"role":"assistant","content":" Let"},"done":false}', '{"model":"codegeex4","created_at":"2024-08-0
1T18:46:52.983356092Z","message":{"role":"assistant","content":" me"},"done":false}', '{"model":"codegeex4","cr
eated_at":"2024-08-01T18:46:53.004456053Z","message":{"role":"assistant","content":" clarify"},"done":false}',
'{"model":"codegeex4","created_at":"2024-08-01T18:46:53.025473757Z","message":{"role":"assistant","content":" t
he"},"done":false}', '{"model":"codegeex4","created_at":"2024-08-01T18:46:53.046593654Z","message":{"role":"ass
istant","content":" steps"},"done":false}', '{"model":"codegeex4","created_at":"2024-08-01T18:46:53.067649195Z"
,"message":{"role":"assistant","content":" to"},"done":false}', '{"model":"codegeex4","created_at":"2024-08-01T
18:46:53.088719773Z","message":{"role":"assistant","content":" create"},"done":false}', '{"model":"codegeex4","
created_at":"2024-08-01T18:46:53.109869961Z","message":{"role":"assistant","content":" a"},"done":false}', '{"m
odel":"codegeex4","created_at":"2024-08-01T18:46:53.13100698Z","message":{"role":"assistant","content":" custom
"},"done":false}', '{"model":"codegeex4","created_at":"2024-08-01T18:46:53.152117724Z","message":{"role":"assis
tant","content":" font"},"done":false}', '{"model":"codegeex4","created_at":"2024-08-01T18:46:53.173297407Z","m
essage":{"role":"assistant","content":" in"},"done":false}', '{"model":"codegeex4","created_at":"2024-08-01T18:
46:53.194268233Z","message":{"role":"assistant","content":" X"},"done":false}', '{"model":"codegeex4","created_
at":"2024-08-01T18:46:53.215292434Z","message":{"role":"assistant","content":"code"},"done":false}', '{"model":
"codegeex4","created_at":"2024-08-01T18:46:53.236418387Z","message":{"role":"assistant","content":" and"},....

Another way to potentially fix this is to use an external plugin that implements curl in Lua, such as plenary. It is a common dependency of many other nvim plugins, such as ollama.nvim. I don't know for sure whether its implementation overcomes this issue with too many characters in the shell command, but I think it is worth testing.

Chewt commented 1 month ago

In case anyone else wants to test this, here is the custom command function I made:

local custom_command = function(options)
    local context = require('gen.init').context

    local body = vim.tbl_extend("force",
        {model = options.model, stream = true},
        options.body)
    local messages = {}
    if context then messages = context end
    -- Add new prompt to the context
    table.insert(messages, {role = "user", content = options.prompt})
    body.messages = messages
    if options.model_options ~= nil then -- override model options from gen command (if exist)
        body = vim.tbl_extend("force", body, options.model_options)
    end

    local json = options.json(body)
    json = json:match("^'(.*)'$") or json  -- remove surrounding quotes

    -- Write json to tmp file
    local fh, err = io.open("/tmp/tempfile", "w")
    if not fh then
        return nil, err
    end
    fh:write(json)
    fh:close()

    -- return curl command
    return "curl --no-buffer --silent -X POST http://" .. options.host .. ":" .. options.port .. "/api/chat -d @/tmp/tempfile"
end

Just add command = custom_command to your require('gen').setup() config.
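For anyone who wants to try the temp-file idea from a terminal before touching the plugin config, here is a minimal shell sketch. The function names write_body_file and post_body_file are made up for illustration and are not part of gen.nvim. It assumes mktemp is available and uses it instead of the fixed /tmp/tempfile so that two requests running at once don't overwrite each other's body. It also uses --data-binary rather than -d, since curl's -d strips newlines and carriage returns when it reads from a file.

```shell
#!/usr/bin/env bash
# Sketch: send a large JSON body from a file instead of the command line.

# Write the body to a fresh temp file and print that file's path.
write_body_file() {
    local body="$1" file
    file="$(mktemp)" || return 1
    # printf is a shell builtin, so the body is not passed as an
    # argument to a newly started process here.
    printf '%s' "$body" > "$file"
    printf '%s\n' "$file"
}

# POST the file's contents to the chat endpoint.
# Host and port default to the values seen in the log earlier in this thread.
post_body_file() {
    local file="$1" host="${2:-localhost}" port="${3:-11434}"
    curl --no-buffer --silent -X POST "http://${host}:${port}/api/chat" \
        --data-binary "@${file}"
}

# Example use (needs a server listening on the given host and port):
#   file="$(write_body_file "$json")"
#   post_body_file "$file"
#   rm -f "$file"
```

Only the short file path ends up on curl's command line, so the size of the body no longer matters for the command itself.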

David-Kunz commented 1 month ago

Thanks for opening this issue and doing the research, @Chewt!

Would it also be possible to increase the character limit for your shell commands?

Which OS/shell do you use?

Thank you and best regards, David

Chewt commented 1 month ago

My OS is Linux (EndeavourOS specifically), and my shell is Zsh.

However, I have discovered that my assumption that the shell was the cause was wrong.

It turns out that Linux has a hardcoded limit on the maximum length of a single command-line argument: 131072 bytes (this number comes from 32 * PAGESIZE in the Linux kernel, where the page size is 4 KiB).
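The arithmetic behind that number can be checked with a short sketch. The helper name max_arg_strlen is made up here; it simply multiplies a page size by 32, and it assumes getconf PAGESIZE is available when no page size is passed in.

```shell
#!/usr/bin/env bash
# Sketch: per-argument limit computed as 32 * page size.
max_arg_strlen() {
    # Use the page size given as $1, else ask the system for its own.
    local page_size="${1:-$(getconf PAGESIZE)}"
    echo $(( 32 * page_size ))
}

max_arg_strlen 4096   # prints 131072

# With no argument, the result depends on this machine's page size:
#   max_arg_strlen
```

On a machine with a different page size, the result changes accordingly.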

Since the curl command is curl --no-buffer --silent -X POST http://localhost:11434/api/chat -d $body and $body is replaced by a JSON object serialized as a string, the whole JSON object is treated as a single argument.

Depending on the model, this puts a hard limit on context size: the whole request body has to fit within a single string argument on the command line.
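A quick way to see whether a given body would run into this limit is to count its bytes before building the command. This is only a sketch: fits_in_one_arg is a hypothetical helper, the default of 131072 is the figure quoted above, and a body of exactly that size is treated as too long to stay on the safe side.

```shell
#!/usr/bin/env bash
# Sketch: report whether a string fits in a single command-line argument.
# Pass a different limit as $2 for systems with another page size.
fits_in_one_arg() {
    local body="$1" limit="${2:-131072}" bytes
    # printf is a builtin, so measuring the body does not itself
    # pass it as an argument to a new process.
    bytes=$(( $(printf '%s' "$body" | wc -c) ))
    [ "$bytes" -lt "$limit" ]
}

# Example: a 200000-byte body is over the default limit.
big="$(head -c 200000 /dev/zero | tr '\0' 'x')"
fits_in_one_arg "$big" || echo "body too long for -d \$body"
```

A check like this could decide when to fall back to reading the body from a file instead of passing it inline.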