ahyatt / llm

A package abstracting llm capabilities for emacs.
GNU General Public License v3.0

fix: OpenAI API keys passed as multibyte strings #44

Closed hraban closed 5 months ago

hraban commented 5 months ago

Emacs has two types of strings: multibyte and unibyte. The request library is essentially a giant ‘concat’ call, and ‘concat’ returns a multibyte string if any single argument is multibyte, headers included. So even if you encode the body, a single multibyte header string undoes that encoding. This happens whether or not the header actually contains non-ASCII characters: an Emacs string literal containing only ASCII characters is unibyte, but an API key fetched from an external source is often multibyte, e.g. when it comes from ‘shell-command-to-string’.

Example:

(dolist (x (list
            "x"
            (shell-command-to-string "printf x")
            (encode-coding-string (shell-command-to-string "printf x") 'utf-8)))
  (let ((s (concat x (encode-coding-string "é" 'utf-8))))
    (message
     "%S: %s(%s) %s, %s"
     s
     (multibyte-string-p s)
     (multibyte-string-p x)
     (string-bytes s)
     (length s))))

Output:

"x\303\251": nil(nil) 3, 3
"x\303\251": t(t) 5, 3
"x\303\251": nil(nil) 3, 3

And:

(multibyte-string-p "foo") ; => nil
(multibyte-string-p "fôo") ; => t
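
One way to guard against this, sketched here under assumptions rather than taken from the actual patch: encode any multibyte header value to unibyte before it reaches ‘concat’. The helper name `my-llm-unibyte-header` is hypothetical and not part of the llm package, and `string-to-multibyte` stands in for a key read from an external source.

```elisp
;; Hypothetical helper for illustration; not part of the llm package.
(defun my-llm-unibyte-header (value)
  "Return VALUE as a unibyte string, encoding it as UTF-8 if it is multibyte."
  (if (multibyte-string-p value)
      (encode-coding-string value 'utf-8)
    value))

;; Simulate an API key that arrived as a multibyte string,
;; as `shell-command-to-string' would return it.
(let* ((api-key (string-to-multibyte "x"))
       (body (encode-coding-string "é" 'utf-8))
       ;; Without the helper, the multibyte key makes the whole result multibyte.
       (raw (concat api-key body))
       ;; With the helper, every component is unibyte, so the result is too.
       (fixed (concat (my-llm-unibyte-header api-key) body)))
  (message "raw: %s, %d bytes; fixed: %s, %d bytes"
           (multibyte-string-p raw) (string-bytes raw)
           (multibyte-string-p fixed) (string-bytes fixed)))
```

The unencoded concatenation should report the same 5-byte multibyte string as the second line of the output above, while the encoded one stays unibyte at 3 bytes.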

hraban commented 5 months ago

For context: calls are broken when:

ahyatt commented 5 months ago

Thank you for the fix! I had to fix a similar bug a while ago - I should definitely add tests for multibyte strings so we don't run into this again.