emacs-lsp / lsp-mode

Emacs client/library for the Language Server Protocol
https://emacs-lsp.github.io/lsp-mode
GNU General Public License v3.0
4.79k stars 888 forks source link

Strange character in file causes invalid JSON to be sent to LSP server #1246

Open rlivings39 opened 4 years ago

rlivings39 commented 4 years ago

Describe the bug When a document contains the character with codepoint #x3fff85 the LSP server receives invalid JSON or errors occur in Emacs.

Note that using the same C++ file with clangd in VSCode doesn't encounter any issues.

To Reproduce with clangd Make a small test C++ file in the lsp-docker docker:

// Comment with funny character:                                                                                                                                             
#include <vector>
int main() {

  return 0;
}

Then start lsp: M-x lsp with clangd as the language server (it's part of the docker image). Notice that you can get completions:

// Comment with funny character:                                                                                                                                             
#include <vector>
int main() {
  std::<TAB>
  return 0;
}

shows completions brought in by #include <vector>. Now move your cursor right after the : in the comment on the first line and do: M-x insert-char RET #x3fff85 RET to insert the character we happened to have in our files.

Then try to save the cpp file. When doing so I see errors like:

rror on `lsp--text-document-did-save: (wrong-type-argument utf-8-string-p "// Comment with funny character: \205
#include <vector>
int main() {

  return 0;
}

")'
LSP :: onCodeAction called for non-added file [3 times]
LSP :: trying to get AST for non-added document [3 times]
LSP :: onCodeAction called for non-added file [2 times]
LSP :: trying to get preamble for non-added document
Mark set
Saving file /test.cpp...
Wrote /test.cpp
Error on `lsp--text-document-did-save: (wrong-type-argument utf-8-string-p "// Comment with funny character: \205
#include <vector>
int main() {

  return 0;
}

")'
LSP :: onCodeAction called for non-added file [2 times]
Saving file /test.cpp...
Wrote /test.cpp
Error on `lsp--text-document-did-save: (wrong-type-argument utf-8-string-p "// Comment with funny character: \205
#include <vector>
int main() {

  return 0;
}

")'

in the *Messages* buffer.

To Reproduce with lsp-sample LSP server When using an LSP server derived from the Microsoft LSP framework, this results in a JSON parse error:

undefined:2
C
^

SyntaxError: Unexpected token C in JSON at position 166
    at JSON.parse (<anonymous>)
    at StreamMessageReader.onData (/local-ssd/rlivings/vscode-extension-samples/lsp-sample/server/node_modules/vscode-jsonrpc/lib/messageReader.js:182:29)
    at Socket.readable.on (/local-ssd/rlivings/vscode-extension-samples/lsp-sample/server/node_modules/vscode-jsonrpc/lib/messageReader.js:148:18)
    at Socket.emit (events.js:198:13)
    at addChunk (_stream_readable.js:287:12)
    at readableAddChunk (_stream_readable.js:268:11)
    at Socket.Readable.push (_stream_readable.js:223:10)
    at Pipe.onStreamRead [as onread] (internal/stream_base_commons.js:94:17)

Process lsp-sample stderr finished

To reproduce that:

git clone git@github.com:rlivings39/vscode-extension-samples.git
cd vscode-extension-samples/lsp-sample
npm install
npm run compile

Edit lsp-sample.el to point to server/out/server.js in your clone. Then

  1. Open Emacs
  2. M-x load-file /path/to/lsp-sample.el
  3. Open any .txt file, insert the bad character as above M-x insert-char RET #x3fff85 RET, and start lsp-mode
  4. Notice the error in *lsp-sample::stderr*

Expected behavior No errors with strange characters.

Which Language Server did you use clangd with C++

OS Debian 9

Error callstack

Debugger entered--Lisp error: (wrong-type-argument utf-8-string-p "// Comment with funny character: \205  \n#include <vec...")
  json-serialize((:jsonrpc "2.0" :method "textDocument/didSave" :params (:textDocument (:uri "file:///test.cpp" :version 169) :text "// Comment with funny character: \205  \n#include <vec...")) :null-object nil$
  (let ((body (json-serialize params :null-object nil :false-object :json-false))) (concat "Content-Length: " (number-to-string (1+ (string-bytes body))) "\15\n\15\n" body "\n"))
  lsp--make-message((:jsonrpc "2.0" :method "textDocument/didSave" :params (:textDocument (:uri "file:///test.cpp" :version 169) :text "// Comment with funny character: \205  \n#include <vec...")))
  (lsp--send-no-wait (lsp--make-message body) (progn (or (and (memq (type-of lsp--cur-workspace) cl-struct-lsp--workspace-tags) t) (signal 'wrong-type-argument (list 'lsp--workspace lsp--cur-workspace))) (aref $
  (let ((lsp--cur-workspace it)) (if lsp-print-io (progn (lsp--log-entry-new (lsp--make-log-entry (plist-get body :method) nil (plist-get body :params) 'outgoing-notif) lsp--cur-workspace))) (lsp--send-no-wait $
  (closure ((body :jsonrpc "2.0" :method "textDocument/didSave" :params (:textDocument (:uri "file:///test.cpp" :version 169) :text "// Comment with funny character: \205  \n#include <vec...")) cl-struct-lsp--l$
  mapcar((closure ((body :jsonrpc "2.0" :method "textDocument/didSave" :params (:textDocument (:uri "file:///test.cpp" :version 169) :text "// Comment with funny character: \205  \n#include <vec...")) cl-struct$
  lsp--send-notification((:jsonrpc "2.0" :method "textDocument/didSave" :params (:textDocument (:uri "file:///test.cpp" :version 169) :text "// Comment with funny character: \205  \n#include <vec...")))
  lsp-notify("textDocument/didSave" (:textDocument (:uri "file:///test.cpp" :version 169) :text "// Comment with funny character: \205  \n#include <vec..."))
  (condition-case err (lsp-notify "textDocument/didSave" (list ':textDocument (lsp--versioned-text-document-identifier) ':text (if (lsp--save-include-text-p) (save-excursion (widen) (buffer-substring-no-propert$
  (progn (condition-case err (lsp-notify "textDocument/didSave" (list ':textDocument (lsp--versioned-text-document-identifier) ':text (if (lsp--save-include-text-p) (save-excursion (widen) (buffer-substring-no-$
  (if (lsp--send-did-save-p) (progn (condition-case err (lsp-notify "textDocument/didSave" (list ':textDocument (lsp--versioned-text-document-identifier) ':text (if (lsp--save-include-text-p) (save-excursion (w$
  lsp-on-save()
  run-hooks(after-save-hook)
  basic-save-buffer(t)
  save-buffer(1)
  funcall-interactively(save-buffer 1)
  call-interactively(save-buffer nil nil)
  command-execute(save-buffer)

lsp-log.txt lsp-log-clangd.txt lsp-log-clangd-stderr.txt

yyoncho commented 4 years ago

Can you point me to lsp-sample.el ?

rlivings39 commented 4 years ago

It should be under vscode-extension-samples/lsp-sample/ in the repo you cloned git@github.com:rlivings39/vscode-extension-samples.git

https://github.com/rlivings39/vscode-extension-samples/blob/master/lsp-sample/lsp-sample.el

yyoncho commented 4 years ago

Thank you. There seem to be two issues:

  1. json-serialize cannot handle that char.
(json-serialize ["�"]) ;; fails 
(json-encode ["�"])  ;; works fine

The first one fails and the second does not.

@eli-zaretskii can you comment whether this is something that has to be fixed in json-serialize?

  1. The sample lsp server does not work with codepoint - I tested locally with elisp json parser/serializer and it works fine with jdt-ls and clangd but fails with that sample server.

@rlivings39 can you change the definition of lsp--json-serialize and verify that the scenario works fine when using clangd(note that this is a macro and you have to evaluate the whole lsp-mode buffer).

(defmacro lsp--json-serialize (params)
  `(let ((json-false :json-false))
     (json-encode ,params)))
Eli-Zaretskii commented 4 years ago

(json-serialize ["�"]) ;; fails

Do you mean the wrong-type-argument error? AFAIU, that's expected, because #x3fff85 is not a Unicode codepoint. The author of json.c had very strong views regarding signaling an error in these cases, and you see that in action.

How did that character end up in JSON data?

Eli-Zaretskii commented 4 years ago

Btw, we have the capability to automatically convert such invalid characters to U+FFFD, which I believe is TRT in these cases anyway, since the underlying JSON library will be unable to handle any non-Unicode characters anyway. But I need the agreement of json.c author to make such a change, so if you think this is desired, please report an Emacs bug, and we will discuss this with the json.c author.

Eli-Zaretskii commented 4 years ago

we have the capability to automatically convert such invalid characters to U+FFFD

Alternatively, we can skip them. It's all a matter of a decision.

What do other similar processors do with such characters?

rlivings39 commented 4 years ago

@yyoncho clangd seems to work with that change. I still see the issue with lsp-sample

yyoncho commented 4 years ago

we have the capability to automatically convert such invalid characters to U+FFFD

Alternatively, we can skip them. It's all a matter of a decision.

What do other similar processors do with such characters?

@MaskRay can you advise? I don't have expertise on the matter.

rlivings39 commented 4 years ago

@yyoncho I had an exchange on the node language server library repo regarding the same issue:

https://github.com/microsoft/vscode-languageserver-node/issues/643

in case any of that information may help here.