clojure-lsp / clojure-lsp

Clojure & ClojureScript Language Server (LSP) implementation
https://clojure-lsp.io
MIT License
1.17k stars 153 forks source link

clojure-lsp stuck (or crashed) in "starting" state in emacs #1197

Closed maddenp closed 2 years ago

maddenp commented 2 years ago

Describe the bug When I open a .clj file, clojure-lsp tries to start but continually says e.g. LSP[clojure-lsp:27231/starting] in the modeline, and *clojure-lsp::stderr* shows

WARNING: Issue found in NotificationMessage: Message could not be parsed.
com.google.gson.JsonSyntaxException: java.lang.IllegalStateException: Expected an int but was BEGIN_OBJECT at line 1 column 63 path $.params.processId
    at com.google.gson.Gson.fromJson(Gson.java:1003)
    at org.eclipse.lsp4j.adapters.InitializeParamsTypeAdapter.readProcessId(InitializeParamsTypeAdapter.java:139)
    at org.eclipse.lsp4j.adapters.InitializeParamsTypeAdapter.read(InitializeParamsTypeAdapter.java:97)
    at org.eclipse.lsp4j.adapters.InitializeParamsTypeAdapter.read(InitializeParamsTypeAdapter.java:34)
    at com.google.gson.TypeAdapter$1.read(TypeAdapter.java:199)
    at com.google.gson.Gson.fromJson(Gson.java:991)
    at org.eclipse.lsp4j.jsonrpc.json.adapters.MessageTypeAdapter.fromJson(MessageTypeAdapter.java:329)
    at org.eclipse.lsp4j.jsonrpc.json.adapters.MessageTypeAdapter.parseParams(MessageTypeAdapter.java:249)
    at org.eclipse.lsp4j.jsonrpc.json.adapters.MessageTypeAdapter.read(MessageTypeAdapter.java:119)
    at org.eclipse.lsp4j.jsonrpc.json.adapters.MessageTypeAdapter.read(MessageTypeAdapter.java:55)
    at com.google.gson.Gson.fromJson(Gson.java:991)
    at org.eclipse.lsp4j.jsonrpc.json.MessageJsonHandler.parseMessage(MessageJsonHandler.java:119)
    at org.eclipse.lsp4j.jsonrpc.json.MessageJsonHandler.parseMessage(MessageJsonHandler.java:114)
    at org.eclipse.lsp4j.jsonrpc.json.StreamMessageProducer.handleMessage(StreamMessageProducer.java:193)
    at org.eclipse.lsp4j.jsonrpc.json.StreamMessageProducer.listen(StreamMessageProducer.java:94)
    at org.eclipse.lsp4j.jsonrpc.json.ConcurrentMessageProcessor.run(ConcurrentMessageProcessor.java:113)
    at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:515)
    at java.util.concurrent.FutureTask.run(FutureTask.java:264)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
    at java.lang.Thread.run(Thread.java:829)
    at com.oracle.svm.core.thread.JavaThreads.threadStartRoutine(JavaThreads.java:597)
    at com.oracle.svm.core.posix.thread.PosixJavaThreads.pthreadStartRoutine(PosixJavaThreads.java:194)
Caused by: java.lang.IllegalStateException: Expected an int but was BEGIN_OBJECT at line 1 column 63 path $.params.processId
    at com.google.gson.stream.JsonReader.nextInt(JsonReader.java:1197)
    at com.google.gson.internal.bind.TypeAdapters$7.read(TypeAdapters.java:240)
    at com.google.gson.internal.bind.TypeAdapters$7.read(TypeAdapters.java:232)
    at com.google.gson.Gson.fromJson(Gson.java:991)
    ... 22 more

To Reproduce I'm not sure. I have three systems, two on which this issue reliably occurs, and one on which I cannot make it happen at all. All are using emacs 28.1 that I have built myself, all with identical init.el files, all using clojure-lsp executables with identical versions (downloaded automatically by emacs, not separately installed -- though I've also tried pointing to a manually-downloaded clojure-lsp, to no avail, on the problematic systems).

I made some progress using lsp-start-plain.el from https://emacs-lsp.github.io/lsp-mode/page/troubleshooting/: clojure-lsp started correctly then, so I started with a blank init.el and started building my init.el back up, adding a little more, restarting emacs and confirming that clojure-lsp still worked correctly, adding more, testing, etc., until my init.el had everything back that I wanted. But then I wanted to be sure I was on solid ground, so removed everything in my ~/.emacs.d except for init.el, started emacs, let all the packages download/compile, loaded a .clj file, and the stuck-at-starting issue was back. I wondered if different (and maybe incompatible) packages were being installed when emacs started as a blank slate vs when I added packages a couple/few at a time, but the list of installed packages shown by package-list-packages is identical between my problematic system and the one that works fine.

Maybe this isn't even a bug, or not a clojure-lsp bug, bug I'm at a loss as to what to look at next, or the meaning of the stacktrace shown above, and would be grateful for any advice. It was so nice for the day that I had clojure-lsp working! I'm happy to provide any needed follow-up information.

Expected behavior clojure-lsp starts correctly.

Log - client <-> server
Params: {
  "processId": null,
  "rootPath": "/home/pmadden/git/gencfn",
  "clientInfo": {
    "name": "emacs",
    "version": "GNU Emacs 28.1 (build 1, x86_64-pc-linux-gnu, X toolkit, Xaw scroll bars)\n of 2022-08-09"
  },
  "rootUri": "file:///home/pmadden/git/gencfn",
  "capabilities": {
    "workspace": {
      "workspaceEdit": {
        "documentChanges": true,
        "resourceOperations": [
          "create",
          "rename",
          "delete"
        ]
      },
      "applyEdit": true,
      "symbol": {
        "symbolKind": {
          "valueSet": [
            1,
            2,
            3,
            4,
            5,
            6,
            7,
            8,
            9,
            10,
            11,
            12,
            13,
            14,
            15,
            16,
            17,
            18,
            19,
            20,
            21,
            22,
            23,
            24,
            25,
            26
          ]
        },
        "hierarchicalDocumentSymbolSupport": true
      },
      "formatting": {
        "dynamicRegistration": true
      },
      "rangeFormatting": {
        "dynamicRegistration": true
      },
      "rename": {
        "dynamicRegistration": true,
        "prepareSupport": true
      },
      "codeAction": {
        "dynamicRegistration": true,
        "isPreferredSupport": true,
        "codeActionLiteralSupport": {
          "codeActionKind": {
            "valueSet": [
              "",
              "quickfix",
              "refactor",
              "refactor.extract",
              "refactor.inline",
              "refactor.rewrite",
              "source",
              "source.organizeImports"
            ]
          }
        },
        "resolveSupport": {
          "properties": [
            "edit",
            "command"
          ]
        },
        "dataSupport": true
      },
      "completion": {
        "completionItem": {
          "snippetSupport": false,
          "documentationFormat": [
            "markdown",
            "plaintext"
          ],
          "resolveAdditionalTextEditsSupport": true,
          "insertReplaceSupport": true,
          "deprecatedSupport": true,
          "resolveSupport": {
            "properties": [
              "documentation",
              "detail",
              "additionalTextEdits",
              "command"
            ]
          },
          "insertTextModeSupport": {
            "valueSet": [
              1,
              2
            ]
          }
        },
        "contextSupport": true
      },
      "signatureHelp": {
        "signatureInformation": {
          "parameterInformation": {
            "labelOffsetSupport": true
          }
        }
      },
      "documentLink": {
        "dynamicRegistration": true,
        "tooltipSupport": true
      },
      "hover": {
        "contentFormat": [
          "markdown",
          "plaintext"
        ]
      },
      "foldingRange": {
        "dynamicRegistration": true
      },
      "callHierarchy": {
        "dynamicRegistration": false
      },
      "publishDiagnostics": {
        "relatedInformation": true,
        "tagSupport": {
          "valueSet": [
            1,
            2
          ]
        },
        "versionSupport": true
      },
      "linkedEditingRange": {
        "dynamicRegistration": true
      }
    },
    "window": {
      "workDoneProgress": true,
      "showDocument": {
        "support": true
      }
    },
    "experimental": {
      "testTree": true
    }
  },
  "initializationOptions": {
    "dependency-scheme": "jar",
    "show-docs-arity-on-same-line?": true
  },
  "workDoneToken": "1"
}
Log - clojure-lsp
Found the following clients for /home/pmadden/git/gencfn/src/gencfn/stacks/srosmon.clj: (server-id clojure-lsp, priority 0)
The following clients were selected based on priority: (server-id clojure-lsp, priority 0)

User details (please complete the following information):

ericdallo commented 2 years ago

@maddenp thanks for the report, what is happening is that the client (lsp-mode) should send the initialize request with processId as a int or null, from your client logs, it seems it's null which is ok, but from server side logs, it's trying to parse a object ({}) which is weird.

I'm not sure if we should spend that much time since we removed lsp4j integration on master and it should be available on next release, so it'd be nice if you could test with a nightly clojure-lsp build, from here, WDYT?

c/c @mainej in case we have the same issue on lsp4clj even with lsp4j removal

maddenp commented 2 years ago

Hi @ericdallo, thanks for the fast response! I can confirm replacing the emacs-downloaded clojure-lsp executable with the one from https://github.com/clojure-lsp/clojure-lsp-dev-builds/releases/tag/2022.08.16-19.32.20-nightly immediately solved the problem for me.

maddenp commented 2 years ago

Hmm, a few seconds after starting, though, I do get e.g.

Server clojure-lsp:28782 exited with status exit(check corresponding stderr buffer for details). Do you want to restart it? (y or n)

which repeats if I answer y. I don't see anything interesting in the lsp-related buffers, and in the stderr log just

Process clojure-lsp stderr finished
maddenp commented 2 years ago

Anything I could try/check that might be helpful to you? If not, I'm happy to wait for a proper release.

ericdallo commented 2 years ago

The release will be eventually just a promotion of a nightly to a release, so if your issue was not fixed on master/nightly, it won't be fixed on next release 😅

That kind of crash is bad, but we should have some log regarding that somewhere

maddenp commented 2 years ago

From the end of /tmp/clojure-lsp.*.out:

2022-08-16T21:58:14.459Z  INFO [clojure-lsp.server:38] - Liveness probe - Parent process {} is not running - exiting server
2022-08-16T21:58:14.459Z  INFO [clojure-lsp.server:409] - Exiting...

Interesting?

mainej commented 2 years ago

It looks like both builds ended up parsing processId as an empty JSON object {}. The "old" version of lsp4clj, which uses lsp4j to parse JSON failed with Expected an int but was BEGIN_OBJECT. And the "new" version of lsp4clj, which uses cheshire to parse JSON failed with Parent process {} is not running where {} is usually an integer.

That contradicts the client-side tracing, which says that emacs-lsp sent "processId": null. So I can think of two possibilities.

First, the problem could be on the clojure-lsp side, the server side. Perhaps it is receiving null but turning it into {} somewhere, instead of into nil. I don't see where this could be happening, but I won't rule it out.

Second, and what I think is more likely, is that emacs-lsp is using two different libraries (or code paths), one to pretty print the traces, and another to serialize the object and send it to clojure-lsp. The library that does the second part could be serializing the (ELisp) nil into (JSON) {} instead of null.

We could try to verify that with a build of clojure-lsp that turns on server-side tracing. That would add traces to the server logs too, although they would be traces not of the JSON the server gets on the wire, but of the object it has immediately after cheshire has parsed the JSON. Assuming those traces do suggest clojure-lsp is receiving "processId": {}, I'm not sure what we would do with that information. We could certainly make the code that handles :process-id more defensive, but if all nulls are converted to objects, we'll have problems all over the place. So I think all it would do is rule out clojure-lsp as the problem and shift the blame over to emacs-lsp, or whatever it is in your emacs environment that's doing JSON serialization.

Sorry—this isn't a very satisfying answer. Would you like to try turning on server-side tracing @maddenp? Have you ever built clojure-lsp before or would you want guidance?

maddenp commented 2 years ago

Hi @mainej, and thank you. I followed the instructions at https://clojure-lsp.io/building/ and was able to make debug-cli, but the resulting clojure-lsp doesn't seem to offer anything extra in the server log, to my untrained eye, at least. Should I build from a branch other than master, or follow a different procedure to build an executable with tracing enabled? I agree that it would be worth exonerating clojure-lsp.

ericdallo commented 2 years ago

Yes, it might indicate a bug on lsp4clj recent changes where we stopped using lsp4j @mainej, maybe we are still probing even if client didn't send a processId?

mainej commented 2 years ago

Sorry @maddenp, I should have explained... you have to edit this line, changing :trace? false to :trace? true. That will turn on server-side tracing. Otherwise, you followed the right procedure. When you have a chance to try again, let me know how it goes.

@ericdallo it looks to me like the client is sending a processId, but it's sending a map instead of null or an integer. By my reading of the traces and logs, that's happening both before and after the removal of lsp4j. The old version blows up a little earlier—the Java JSON parser explodes when it finds a map where it was expecting to find a null or an integer. It's type checking the request object. In the new version the Cheshire JSON parser doesn't do any type checking. If it sees a JSON map, it parses it into a Clojure map, and passes that to the rest of the code. Later on we try to probe for a running process whose id is {} (assuming that any non-null value is an integer) and things go bad. So, same problem, different symptoms. We'll verify whether the JSON has a map in it when @maddenp has a chance to trace what the server sees as input.

maddenp commented 2 years ago

@mainej That worked, thanks. I'm guessing this is the block from /tmp/clojure-lsp.* you're looking for, but please let me know if you need more / different:

2022-08-17T20:52:25.800Z  INFO [clojure-lsp.server:523] - [SERVER] Starting server...
2022-08-17T20:52:27.343Z  INFO [clojure-lsp.nrepl:18] - ====== LSP nrepl server started on port 46723
2022-08-17T20:52:27.370Z  INFO [clojure-lsp.server:415] - Initializing...
2022-08-17T20:52:27.371Z  DEBUG [clojure-lsp.server:38] - [Trace - 2022-08-17T20:52:27.362Z] Received request 'initialize - (1)'
Params: {
  "process-id" : { },
  "root-path" : "/home/pmadden/git/gencfn",
  "client-info" : {
    "name" : "emacs",
    "version" : "GNU Emacs 28.1 (build 1, x86_64-pc-linux-gnu, X toolkit, Xaw scroll bars)\n of 2022-08-09"
  },
...

By way of comparison, I copied this clojure-lsp executable to the system where I'm NOT having this problem, and the corresponding block looks like

2022-08-17T20:56:13.505Z  INFO [clojure-lsp.server:523] - [SERVER] Starting server...
2022-08-17T20:56:14.881Z  INFO [clojure-lsp.nrepl:18] - ====== LSP nrepl server started on port 44871
2022-08-17T20:56:14.939Z  INFO [clojure-lsp.server:415] - Initializing...
2022-08-17T20:56:14.940Z  DEBUG [clojure-lsp.server:38] - [Trace - 2022-08-17T20:56:14.905Z] Received request 'initialize - (1)'
Params: {
  "process-id" : null,
  "root-path" : "/users/PYS0343/pmadden/git/gencfn",
  "client-info" : {
    "name" : "emacs",
    "version" : "GNU Emacs 28.1 (build 1, x86_64-pc-linux-gnu, GTK+ Version 3.22.30, cairo version 1.15.12)\n of 2022-07-23"
  },
...

Am I right in thinking that this supports your hypothesis that it's the client that's sending different data to clojure-lsp on the different systems and triggering this problem? I wonder if this is down to something emacs is linking against.

maddenp commented 2 years ago

I checked the dynamic libraries required by emacs on the correctly functioning system and found

% readelf -d $(which emacs) | grep NEEDED
...
 0x0000000000000001 (NEEDED)             Shared library: [libjansson.so.4]
...

Suspecting the usual free-software pun, I checked and, indeed, this is a JSON library. On my broken system my emacs executable did not link to this library, so I installed libjansson-dev from my OS package manager and rebuilt emacs with the --with-json flag, which is described in ./configure --help as

--with-json             compile with native JSON support

With this new build, clojure-lsp starts correctly and does not prematurely exit.

mainej commented 2 years ago

@maddenp nice debugging!

@ericdallo you'd have a better sense–does this belong in the troubleshooting docs for lsp-mode?

ericdallo commented 2 years ago

Nice debugging indeed! @mainej yes, definitely something to be on lsp-mode torubleshooting, which can happen with any server, if we can't fix it on lsp-mode somehow we could try to fix it on server side but wouldn't be an ideal solution.

@maddenp Could you open a issue on lsp-mode linking this one and explaining your findings regarding the jansson lib? Or a PR if you want to improve troubleshooting.

I'm closing this as it doesn't seem to be a issue to be fixed on server side for now, thanks for the help @mainej

maddenp commented 2 years ago

Yes, I'll open an issue on lsp-mode. Thanks for your help with the current issue.

maddenp commented 2 years ago

What appears to be essentially the same issue was already reported in https://github.com/emacs-lsp/lsp-mode/issues/3629. The issue was closed but there doesn't seem to be a definitive solution apart from installing libjansson-dev. I'll link that issue to this one in case it's helpful to anyone finding it. I'm afraid what I found here doesn't add anything new that would warrant opening a new issue, but please let me know if you disagree.