Replace lsp4j with internal communication logic

Discussing with @mainej, we realized that lsp4j is not helping that much lsp4clj or clojure-lsp.

The main reason we use it IMO, it's because lsp4j implements the json parsing and stdout/stderr LSP jsonrpc communication layer, making the client just need to implement an interface handling the request methods and notifications easily. But lsp4j has some drawbacks like:

it uses a lot of reflection on the backend, making it harder to work with Graalvm, since Graal requires to tell what classes exist and what it's used.
Unnecessary dependencies like xtend, guava, gson, and others.
we already have a coercion logic on our side, so having the POJO classes is not needed actually if we implement the communication in Clojure
We can't customize how async is handled, lsp4j just gives us the CompletableFuture for each request method being handled, we can't customize to use clojure.core.async or anything custom
It's easy to cause a bug on the communication layer, as we could see on https://github.com/clojure-lsp/clojure-lsp/issues/1080 after we accidentally printed to stdout and the server stopped working.

I think we could think about how we can achieve the same lsp4j offers to us using only Clojure

Thanks for starting this thread @ericdallo. I've been thinking about some of the details. Here's an enumeration of the features that lsp4j provides, or at least the features I'm aware of. I'll discuss each one below, adding commentary about problems with lsp4j and notes about re-implementing.

JSON-RPC protocol handling
De-serialization of JSON into Java class instances
Distribution of messages to a threadpool of workers
Processing of messages via Service class instances
Serialization of Java class instances into JSON
Bi-directional client->server and server->client communication
Sequencing of server->client requests
Request cancellation

JSON-RPC protocol handling

JSON-RPC itself is quite simple, fitting into a page-long spec.

The LSP flavor of JSON-RPC is more elaborate, specifying details of the message headers, and how the messages are to be separated from each other by newline characters.

lsp4j handles this layer, reading and writing messages in this format. It can read and write from a pair of stdio streams, or through a socket. lsp4clj servers only uses the stdio streams (I think), although #1 requests that we support sockets.

I'm not sure whether lsp4j validates the messages' format. That is, if it were waiting to receive a header and received something else, does it discard the invalid line, throw an error, or something else? What if a header is partially valid (for example, if it specifies an encoding of utf-16, which isn't allowed by the LSP spec)?

commentary

The fact that LSP specifies this layer is a severe limitation. It means that we can't send messages back and forth using standard HTTP servers and clients, because they all expect to speak the HTTP protocol. That is, they expect to have the HTTP/1.1 line, among other things, which LSP doesn't support.

So, I think we'd have to do our own reading/writing of messages as lsp4j does, straight off of sockets or stdio streams. Maybe this isn't too terribly difficult, but it's not something I've ever tried.

De-serialization of JSON

lsp4j de-serializes the JSON. It captures the JSON-RPC method, params and id (if present).

It turns the params into an instance of a Params class, which in turn contains references to other instances. The structure of this instance tree mirrors the structure of the data defined in the LSP spec.

lsp4j then invokes a method on a Service object corresponding to the JSON-RPC method, passing the Params instance.

It keeps track of the JSON-RPC id. See Request cancellation and Response Serialization below for how the id is used.

commentary

This is the one of the worst parts of lsp4j, from a lsp4clj perspective. It is probably useful if you have a lot of tooling that needs static classes. But it gets in our way. The Clojure community has developed much nicer idioms for these responsibilities.

As @ericdallo described, we spend unneeded time casting to and from the Params and Response classes. Mistakes in configuring the reflection cause weird problems in Graal. Worse, because it's easy to cause errors by accidentally "debeaning" too deeply, we only ever debean to the level we need to. This leaves parts of deeply nested structures (like client capabilities) half Clojure and half Java. When one of the servers needs some Java data to be coerced to Clojure, we have to bump lsp4clj.

In my vision, de-serialization would be done by Cheshire (or charred if we want to remove the Jackson dep). camel-snake-kebab would convert all the keys to kebab-case keywords. This would get us Clojure primitives and associative data structures nested all the way down, with hashmaps whose keys are Clojure-friendly. More importantly the LSP spec would be the authority on what keys were available, and we wouldn't have to spend any time adding new keys when the spec was updated (unless we wanted to clojure.spec them to check that clients were implementing the LSP spec correctly) or when one server needed different data. We could remove a bunch of coercion, and not fear that keys or values would be slightly different from the LSP spec in Graal because of reflection.

Distribution of messages

When lsp4j invokes the Service method, it does so in a thread pool. If many messages come in simultaneously, they each get assigned to their own worker thread.

I don't know how many threads it uses.

commentary

This is another low-level responsibility that might be trickier than we think. We could certainly distribute method processing to a thread pool. But what happens if some request starts taking a long time and the thread pool fills up? What if a thread blocks too long? Do we need timeouts? Might be good to study how lsp4j does this.

Processing of messages

Processing is mostly out of lsp4j's hands.

lsp4clj steps in at this point. It provides implementations of the Service interfaces (though clojure-lsp and other servers provide much of the details). lsp4clj coerces the Params objects into Clojure hashmaps, validating that they have the format specified by the LSP spec.

The servers (e.g. clojure-lsp) then do the work of generating responses, returning their responses as hashmaps. lsp4clj then coerces those hashmaps back into Java instances, and passes them back to lsp4j.

commentary

A big reason that lsp4clj exists is simply to implement the Service interfaces. But if we didn't have to do that, there'd be much less for lsp4clj to do.

After de-serializing, we'd have an id, method name, and params map, according to the JSON-RPC spec. We'd need to dispatch based on the method name. Clojure has good tools for this.

(defmulti handle-notification (fn [method _params] (->kebab-case-keyword method)))

(defmethod handle-notification :default [method _] (error-responses/undefined-method method))

We could leave it up to the individual servers to implement each method. This would let each server grow to implement the methods it felt were important for its users. For example, at the moment lsp4clj doesn't support textDocument/willSave because clojure-lsp doesn't need it. But with defmethod extension, any server could write the following and have it work.

(defmethod lsp4clj/handle-notification :text-document/will-save ...)`

Less ideally, lsp4clj could do what it does now—provide a protocol which the servers must implement and call into this protocol based on the method string. This would force updates to lsp4clj whenever one server needed to implement a new method.

Response Serialization

lsp4j takes this response data and reattaches the request id, as required by JSON-RPC. It then serializes the full response into a JSON message. Then it generates the appropriate headers and writes this message back to the client.

If there was an error, I believe lsp4clj notices that and adjust the response as appropriate, but perhaps that's an lsp4j responsibility.

commentary

It's a good idea to have a helper grab the id and re-attach it after getting a response, instead of trying to thread the id through the method processor. But Clojure is good at that kind of thing—a ring-style handler is a good model. Grab an id out of a request, pass the request to something else which will generate a response, then reattach the request id to the response.

As with de-serialization, serialization of the messages would be easy, using one of the Clojure JSON libraries. We would probably want to do a little conformation, as we do now, to ensure the responses are in the right format.

Writing the response back out to the stream with the right headers would be a little trickier, as with reading from the stream, but nothing too hard.

Bi-directionality

JSON-RPC is typically uni-directional. A client sends requests and notifications to a server, which responds appropriately.

But LSP is bi-directional. Both the client and the server can initiate requests and notifications, and both respond.

lsp4j allows the server to participate in this bi-directionality, making it able to both receive and send messages.

commentary

lsp4j makes a big deal out of modeling bi-directionality. Because of the symmetry they model both the client and the server the same way, as Endpoints. This may be useful if you need to write both clients and servers, but we don't. We're writing servers.

Yes, both directions speak JSON-RPC. We might want some shared helper methods related to JSON-RPC. But we don't need that to be a central feature of our data model.

In particular, when acting as senders of requests, it may be useful to servers to have these requests modeled as CompletableFutures so they can easily be sequenced (see below). But as receivers of requests, servers should be free to use whatever concurrency constructs they like (keeping in mind caveats about message ordering).

They shouldn't have to use CompletableFutures just because that was good for the symmetrical use-case.

Sequencing of requests

When the server sends requests, it sometimes needs to know that the client has processed one request before it sends a second request.

lsp4j models the requests as CompletableFutures. Servers can take advantage of this to sequence their requests. A server can wait on a future (deref it) before initiating a second request. But, it's up to clojure-lsp (or another server) to arrange to funnel all of its requests through a single thread, so that the derefs stack up behind each other.

commentary

Clojure-based HTTP clients often model requests as derefable objects too, so users can decide whether to await a response. lsp4clj might not be able to use an HTTP client, but we can use the same basic idea.

Request cancellation

lsp4j also models incoming requests as CompletableFutures. It keeps a cache of these futures, keyed by id. When the server returns a response, lsp4clj signals this to lsp4j by fulfilling the future. At this point lsp4j takes the future out of its cache.

If at some point while the server is processing the request, the client asks lsp4j to cancel it (by providing a request id), lsp4j looks up the future in the cache, and if it exists, tries to cancel it.

commentary

The lsp4j futures are deceptively named CompletableFutures, but they are not java.util.concurrent.CompletableFuture instances. They are similar, but have special machinery which allows them to be canceled. The method processing code has to participate in this, by periodically checking whether it has been cancelled.

None of our message processors check whether they've been canceled, so we don't take advantage of lsp4j's cancellation.

But it's interesting to think about what we'd need to do to support this feature at the lsp4clj layer. We'd need something similar—a stateful cache of requests indexed by request id. We may have to require that servers participate in the java.util.concurrent.Future interface, so that requests can be cancelled by future-cancel. I don't understand why lsp4j didn't take this same approach, since the java.util.concurrent.Future interface (and therefore java.util.concurrent.CompletableFuture) includes a cancel method.

Maybe there's a reason, for example so that method processors have control over when they are cancelled, and don't get cancelled when they are halfway done with something. We'd have to think about that if we added cancellation support.

Final thoughts

I realize I'm describing a lot of breaking changes. Servers would have to re-write sections of their initialization and method processing logic. They'd have to rip out anything that used the lsp4j classes, or worked around incompletely de-serialized data. But, I think after the re-implementation, the servers would be simpler. And we'd have a more attractive base for writing LSP servers in Clojure.

This may be more of a second project than a "v2" of lsp4clj. In fact, it may be easier to start this work in clojure-lsp, implementing it only there at first. If we did it that way, we'd want to keep in mind the possibility of extracting it, as happened with lsp4clj.

Thanks for the detailed information, I mostly agree with all points.

Distribution of messages

This is another low-level responsibility that might be trickier than we think. We could certainly distribute method processing to a thread pool. But what happens if some request starts taking a long time and the thread pool fills up? What if a thread blocks too long? Do we need timeouts? Might be good to study how lsp4j does this.

IIRC, lsp4j uses a cachedExecutorService, I was thinking if we could use clojure.core.async for that, something to research.

Request cancellation

agreed, but I suggest we don't spend too much time thinking about that as we never needed on clojure-lsp, so probably other clients are not interested yet, I'd classify it as a new future feature

I realize I'm describing a lot of breaking changes. Servers would have to re-write sections of their initialization and method processing logic. They'd have to rip out anything that used the lsp4j classes, or worked around incompletely de-serialized data. But, I think after the re-implementation, the servers would be simpler. And we'd have a more attractive base for writing LSP servers in Clojure.

Yes, I think it's acceptable to have a new major version that offers a new API using stdio yet, clients of lsp4clj would need to do some changes but probably not that hard for them.

This may be more of a second project than a "v2" of lsp4clj. In fact, it may be easier to start this work in clojure-lsp, implementing it only there at first. If we did it that way, we'd want to keep in mind the possibility of extracting it, as happened with lsp4clj.

I was thinking about starting on a v1 branch (as of now, lsp4clj is on alpha state yet) and keep improving it until we think it's good enough to merge to master and release the v1, also, keep fixing bugs/patchs on master branch until we merge v1.

Agreed, we can think about how to distribute work to threads. And also agreed, we shouldn't spend time thinking about cancellation yet.

starting on a v1 branch

Isn't it going to be tricky to exclude all the lsp4j dependencies? That is, we'd like to get guava, xtend, etc off the classpath, but they'll be there as long as the old code exists.

also, keep fixing bugs/patchs on master branch

Yep, makes sense. I expect other servers will continue to want patches even after v1 is merged.

Isn't it going to be tricky to exclude all the lsp4j dependencies? That is, we'd like to get guava, xtend, etc off the classpath, but they'll be there as long as the old code exists.

I mean, the master would still use lsp4j normally but the v1 branch would not have any lsp4j deps anymore, offering a studio approach via clojure only. The difference is that we can merge little PRs and break things on v1 until we think it's stable enough to merge to master after lots of tests. WDYT?

That could work. There's still a chance we'll have to provide patches for the lsp4j version, but unlikely. OK, I like it. 👍

I was being too fancy with kebab-casing the JSON-RPC method. The methods can have many "segments" (slashes) so it's best to leave them as strings:

;; lsp4clj
(defmulti receive-notification (fn [method _params] method))
(defmethod receive-notification :default [method _] (error-responses/undefined-method method))

;; server
(defmethod lsp4clj/receive-notification "textDocument/didChange" ...)
(defmethod lsp4clj/receive-notification "clojure/serverInfo/log" ...)

https://github.com/clojure-lsp/clojure-lsp/pull/1110 provides the basic tools to implement the first point in my list, JSON-RPC protocol handling. At the moment it only does stdio, but it establishes a pattern we can use for sockets someday. The pieces that I think are still missing are charset handling and conversion to kebab-case keywords.

That's one of the major hurdles (in my mind) out of the way.

I have a fuzzy idea for one of the other major hurdles—parallel processing of messages. Now that the messages streams are modeled as core.async channels, I think parallel processing can be accomplished with core.async/pipeline-blocking. The from and to channels will be the channels from https://github.com/clojure-lsp/clojure-lsp/pull/1110 and the xf will be the server's transformation of a request to a response.

I think this can all be tied together following the pattern we've established in clojure-lsp's integration.client, but implementing the server side instead of the client side.

Anyway, I'd like to get started on this. If you want to help, that'd be great. We can talk about how to divide up work. For now, I'll make a branch and start working there.

This is started in:

https://github.com/mainej/lsp4clj/tree/lsp2clj https://github.com/clojure-lsp/clojure-lsp/tree/lsp2clj

It compiles at the moment (as of https://github.com/mainej/lsp4clj/commit/5eb39ec4a3d5eb02dcc390747a5357f359e542fd and https://github.com/clojure-lsp/clojure-lsp/commit/be36dfbeca4dfba54ee7124fb880d4f5db4f527e). When you start the server, it receives the "initialize" request, and does ... nothing. This is actually good progress—we're receiving messages from the client.

On the lsp4clj side, we still need to do error handling—the spec is clear about certain error scenarios (invalid JSON, etc), and there are various points in the new lsp4clj code where things could go bad. We need to understand what happens when errors occur and figure out how to proceed. Right now, things mostly crash.

We also need to work out how to do conformation.

JSON input is already parsed into hashmaps with kebab-case keywords. My guess is that will be enough for most use cases. See the logs below for the fully parsed "initialize" request, including all the client capabilities. This is a big step forward from lsp4j alreaqdy.

Output also automatically serializes to JSON with camelCase strings for keys. There may be some output conformation to do. We can't really use any of the existing output coercers, because they're all bound up with coercion to Java objects. But I hope that most of the time we'll be able to generate a hashmap that has the right shape, and not need any coercion.

On the clojure-lsp side there's still lots to do too, starting obviously with making it respond to the initialize request. That'll be another big milestone.

There's also lots to do in terms of removing lsp4j specific code.

And finally, we'll have to do another exercise where we decide which code belongs on the lsp4cl side and which on the cloure-lsp side, with an eye to what other servers will want.

2022-07-08T04:53:51.809Z  INFO [clojure-lsp.server:100] - [SERVER] Starting server...
2022-07-08T04:53:53.455Z  INFO [clojure-lsp.nrepl:18] - ====== LSP nrepl server started on port 53951
2022-07-08T04:53:53.477Z  DEBUG [lsp4clj.server:26] - trace - received request {:jsonrpc "2.0", :method "initialize", :params {:process-id nil, :root-path "/Users/me/code/opensource/clojure-lsp", :client-info {:name "emacs", :version "GNU Emacs 28.1 (build 1, x86_64-apple-darwin18.7.0, NS appkit-1671.60 Version 10.14.6 (Build 18G95))\n of 2022-05-11"}, :root-uri "file:///Users/me/code/opensource/clojure-lsp", :capabilities {:workspace {:execute-command {:dynamic-registration false}, :symbol {:symbol-kind {:value-set [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26]}}, :did-change-watched-files {:dynamic-registration true}, :apply-edit true, :file-operations {:did-create false, :will-create false, :did-rename true, :will-rename true, :did-delete false, :will-delete false}, :semantic-tokens {:refresh-support false}, :workspace-edit {:document-changes true, :resource-operations ["create" "rename" "delete"]}, :configuration true, :workspace-folders true}, :text-document {:definition {:link-support true}, :formatting {:dynamic-registration true}, :code-action {:dynamic-registration true, :is-preferred-support true, :code-action-literal-support {:code-action-kind {:value-set ["" "quickfix" "refactor" "refactor.extract" "refactor.inline" "refactor.rewrite" "source" "source.organizeImports"]}}, :resolve-support {:properties ["edit" "command"]}, :data-support true}, :document-symbol {:symbol-kind {:value-set [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26]}, :hierarchical-document-symbol-support true}, :publish-diagnostics {:related-information true, :tag-support {:value-set [1 2]}, :version-support true}, :type-definition {:link-support true}, :rename {:dynamic-registration true, :prepare-support true}, :folding-range {:dynamic-registration true}, :hover {:content-format ["markdown" "plaintext"]}, :document-link {:dynamic-registration true, :tooltip-support true}, :synchronization {:will-save true, :did-save true, :will-save-wait-until true}, :semantic-tokens {:dynamic-registration true, :requests {:range true, :full true}, :token-modifiers ["declaration" "definition" "implementation" "readonly" "static" "deprecated" "abstract" "async" "modification" "documentation" "defaultLibrary"], :token-types ["comment" "keyword" "string" "number" "regexp" "operator" "namespace" "type" "struct" "class" "interface" "enum" "typeParameter" "function" "method" "member" "property" "event" "macro" "variable" "parameter" "label" "enumConstant" "enumMember" "dependent" "concept"], :formats ["relative"]}, :linked-editing-range {:dynamic-registration true}, :range-formatting {:dynamic-registration true}, :declaration {:link-support true}, :signature-help {:signature-information {:parameter-information {:label-offset-support true}}}, :implementation {:link-support true}, :completion {:completion-item {:snippet-support true, :documentation-format ["markdown" "plaintext"], :resolve-additional-text-edits-support true, :insert-replace-support true, :deprecated-support true, :resolve-support {:properties ["documentation" "detail" "additionalTextEdits" "command"]}, :insert-text-mode-support {:value-set [1 2]}}, :context-support true}, :call-hierarchy {:dynamic-registration false}}, :window {:work-done-progress true, :show-message nil, :show-document {:support true}}, :experimental {:test-tree true}}, :initialization-options {:dependency-scheme "jar", :show-docs-arity-on-same-line? true}, :work-done-token "1"}, :id 27717}
2022-07-08T04:53:53.491Z  DEBUG [lsp4clj.server:33] - received unexpected request initialize {:process-id nil, :root-path "/Users/me/code/opensource/clojure-lsp", :client-info {:name "emacs", :version "GNU Emacs 28.1 (build 1, x86_64-apple-darwin18.7.0, NS appkit-1671.60 Version 10.14.6 (Build 18G95))\n of 2022-05-11"}, :root-uri "file:///Users/me/code/opensource/clojure-lsp", :capabilities {:workspace {:execute-command {:dynamic-registration false}, :symbol {:symbol-kind {:value-set [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26]}}, :did-change-watched-files {:dynamic-registration true}, :apply-edit true, :file-operations {:did-create false, :will-create false, :did-rename true, :will-rename true, :did-delete false, :will-delete false}, :semantic-tokens {:refresh-support false}, :workspace-edit {:document-changes true, :resource-operations ["create" "rename" "delete"]}, :configuration true, :workspace-folders true}, :text-document {:definition {:link-support true}, :formatting {:dynamic-registration true}, :code-action {:dynamic-registration true, :is-preferred-support true, :code-action-literal-support {:code-action-kind {:value-set ["" "quickfix" "refactor" "refactor.extract" "refactor.inline" "refactor.rewrite" "source" "source.organizeImports"]}}, :resolve-support {:properties ["edit" "command"]}, :data-support true}, :document-symbol {:symbol-kind {:value-set [1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26]}, :hierarchical-document-symbol-support true}, :publish-diagnostics {:related-information true, :tag-support {:value-set [1 2]}, :version-support true}, :type-definition {:link-support true}, :rename {:dynamic-registration true, :prepare-support true}, :folding-range {:dynamic-registration true}, :hover {:content-format ["markdown" "plaintext"]}, :document-link {:dynamic-registration true, :tooltip-support true}, :synchronization {:will-save true, :did-save true, :will-save-wait-until true}, :semantic-tokens {:dynamic-registration true, :requests {:range true, :full true}, :token-modifiers ["declaration" "definition" "implementation" "readonly" "static" "deprecated" "abstract" "async" "modification" "documentation" "defaultLibrary"], :token-types ["comment" "keyword" "string" "number" "regexp" "operator" "namespace" "type" "struct" "class" "interface" "enum" "typeParameter" "function" "method" "member" "property" "event" "macro" "variable" "parameter" "label" "enumConstant" "enumMember" "dependent" "concept"], :formats ["relative"]}, :linked-editing-range {:dynamic-registration true}, :range-formatting {:dynamic-registration true}, :declaration {:link-support true}, :signature-help {:signature-information {:parameter-information {:label-offset-support true}}}, :implementation {:link-support true}, :completion {:completion-item {:snippet-support true, :documentation-format ["markdown" "plaintext"], :resolve-additional-text-edits-support true, :insert-replace-support true, :deprecated-support true, :resolve-support {:properties ["documentation" "detail" "additionalTextEdits" "command"]}, :insert-text-mode-support {:value-set [1 2]}}, :context-support true}, :call-hierarchy {:dynamic-registration false}}, :window {:work-done-progress true, :show-message nil, :show-document {:support true}}, :experimental {:test-tree true}}, :initialization-options {:dependency-scheme "jar", :show-docs-arity-on-same-line? true}, :work-done-token "1"}
2022-07-08T04:53:53.493Z  DEBUG [lsp4clj.server:24] - trace - sending response {:jsonrpc "2.0", :id 27717, :error {:code -32601, :message "Method not found", :data {:method "initialize"}}}

Awesome! Looking forward to see that working, will take a look soon, LMK how I can help with that if you need.

The parallel processing of messages with pipeline-blocking sounds a good idea indeed

@ericdallo this is going really well. make creates a working server. There's still lots to do.

LSP research on tracing. The clients and servers all support some kind of tracing, where you can watch what's put on the wire. They all seem to follow a similar format. What I want to know is whether this format is a standard, a convention, or is just implemented anew each time. The lsp4clj code logs traces, but I'd like to conform to a standard if there is one.
Design of tracing and logging. The lsp4clj server does a tiny amount of logging, mostly in error cases. And it does tracing if requested. In both cases, it uses the logger protocol. That means logs and traces both go to the same place—for clojure-lsp it's whatever file timbre is logging to. But I think you're supposed to send them to separate files. So, how should we separate them? I've thought of three possibilities. First, when clojure-lsp (or another server) initializes a lsp4clj server it could pass a trace file to write to. Second, clojure-lsp could pass a tracing function (and logging function), which would let clojure-lsp decide how to handle traces. Third, clojure-lsp could pass a trace channel, which would have traces (strings like "Received request 'workspace/executeCommand' id: 1 - ") put on it. Then clojure-lsp could decide what to do with the traces by reading them off the channel. This sounds a little more complicated, and gives more responsibility to clojure-lsp, but feels more similar to the rest of the design—where communication happens through channels. In theory you could do the same thing with regular logging, sending logs through a channel. If we did the second or third option, we could probably delete the logger protocol entirely, and just let servers implement logging however they want. That'd remove some complexity from the clojure-lsp side where we have a bunch of gnarly code that re-implements timbre as a logger protocol.
Improve error handling. lsp4clj should send clients one of the generic JSON-RPC errors in several cases, such as when JSON parsing fails or when receive-request throws. There are tools in lsp4clj.json-rpc.messages to generate these error messages, but the lsp4clj.server needs to use them.
Move clojure-lsp.coercer-v1 back to lsp4clj, replacing lsp4clj.coercer. I can't remember exactly why, but as I was removing the Java coercion, I moved the coercer to clojure-lsp. It should be moved back to lsp4clj. As an aside, someday I'd like to consider removing most of the output conformation. Obviously it was necessary when we needed to convert Clojure data to Java objects. But now it converts Clojure data to slightly different Clojure data. I find specs a clumsy way to do thatx. I'd rather use regular Clojure code. For example, rather than use s/conform to convert a keyword to an enum integer, I'd rather look up the enum integer in a hashmap in the regular non-spec code. Generally I think specs are an OK way to parse input, especially if it can come in many different forms, but they're a hard way to generate output.
Fix tests. The integration tests should be close to passing, but in fact they don't even run. I suspect there's a simple mistake somewhere, but I haven't researched. The unit tests will fail because I haven't adapted them to the new code yet. For example, the handler methods now all receive components as a second argument, but some of the tests don't pass that. And a few tests will fail because they incorrectly use camelCase keywords still.
Decide what to do with db/db. I'm no longer storing the `dbin the globaldb/dbatom. Instead, I'm threading a localdbatom through the rest of the code. I'd like to delete the(def db)eventually. Having global state like that makes it hard to do certain tings. That said, having a global var is useful in the nrepl and in tests. So I might want to bring that global state back in certain cases, perhaps by storingcomponentsincomponents` somewhere.
Remove lsp4j and other Java deps and class definitions. We're actually pretty close on this.
General code review. It'd be nice to start incorporating feedback now, if possible.
Feature testing. A lot of the development of this code was done by starting the server and trying different features. I need to do more of that. The thing I'm most worried about is that the clients, or at least lsp-mode, are pretty lenient about bad data. If you forget to convert a keyword to an enum int, it'll happily carry on, but show a weird icon, or sort things incorrectly, or something like that. I tried to catch those cases as I was re-implementing the coercion, but I could have missed some.
Spec review. Another worry is that usually (SomeJavaClass. (:foo %)) turned into a hashmap, in JSON pseudocde {"foo": (:foo %)}. As I was pulling out all the Java code, I mostly assumed that was true, without checking every case. But sometimes it actually turned into {"parent": {"foo": (:foo %)}}. I may have missed some of those. I need to cross-check our output with the spec. This is going to be painful.

I can make progress on all of this stuff, but if you see something you'd like to work on, let me know.

I've done 4, 5, 6 and best of all ... 7! lsp4j and all of its deps are removed. 🤘

Next I think I'll introduce tests for lsp4clj. I've been testing in the repl, but it's time to formalize things. As I do that I can improve the error handling (3).

@ericdallo I don't think we need lsp4clj-server and lsp4clj-protocols to be separate anymore. IIRC you made them separate to avoid having all the lsp4j dependencies pulled into the API and CLI. But that won't be a problem anymore. Would it be OK if I moved the protocols into lsp4clj-server?

I've been having a problem with the new code that wasn't happening in lsp4j.

Here's what I think is going on in lsp4j. (Java code is so hard for me to read.)

First we createLauncher. This instantiates several objects:

A RemoteEndpoint. This is the thing that will processes messages. In ring terms it's the handler.
A MessageConsumer. This is a list of things that will all 'consume' messages. Each one does something with a message, then passes the message to the next consumer. Technically, the RemoteEndpoint is also a MessageConsumer. It's the first in the list. The last one writes JSON-RPC to the wire. Optionally, in between, you can have tracers or other consumers. In ring terms, this is a stack of middleware that handles a request and transforms the response in several ways.
A StreamMessageProducer. This is the thing that will read JSON-RPC off the wire, convert it to Java obects, and call the MessageConsumer list. In ring terms, it's middleware that transforms the request before calling the handler.
A ConcurrentMessageProcessor. This is the thing that will connect the producer to the consumer list.

Then we ask the launcher to startListening. This starts a thread in which the ConcurrentMessageProcessor will:

Ask the producer to start a loop that reads messages.
Pipe those messages to the consumer stack.

Despite using newCachedThreadPool (and despite the name "ConcurrentMessageProcessor") only one thread is created to read messages.

So, one background thread reads messages from the client, and forwards them to the main thread. In the main thread, lsp4j receives requests and notifications from the client, processes them (by forwarding them to clojure-lsp) and replies. This is one source of outbound messages.

On the clojure-lsp side, we start many threads and so it's possible for many of them to ask lsp4j to send messages to the client—for example, by calling publishDiagnostics—all at the same time. publishDiagnostics doesn't block waiting for the client to respond. But calling that method will result in an outbound message.

All outbound messages, whether responses to client requests or server-generated requests and notifications, all go to a single object. (There's some messiness with Service objects and reflection in here that I don't really understand, and so I don't know exactly which object it is.) In any case, as it's writing JSON output, that object synchronizes its writes across threads.

I'm not sure how big the buffer is between the server's output and the client's input, but if the client is slow to read messages, the buffer will fill up and the server won't be able to write.

So, effectively you can only write messages as fast as lsp4j can put them on the output stream. If the server-side writer can't keep up or the client-side reader is slow, publishDiagnostics will block waiting for other threads to finish writing.

In the new architecture, output is modeled a bit differently. There's a channel which gets messages put on it. A go loop is reading messages off the channel, one-by-one, serializing them to JSON and writing them to the output stream.

Now we're in a situation where that go block, like the synchronized lsp4j writer before it, can only write messages as fast as the client is willing to read them.

If clojure-lsp creates thousands of messages to put! on the chan, and the go block doesn't drain the put queue fast enough, you end up with the exception about having too many pending puts.

Unfortunately, I've already seen this exception in the wild. When metabase starts up, we publish thousands of diagnostics messages, one for each of the files in that very large project. The server-side writer is slow (possibly because the client-side reader is slow) and so we get the pending puts exception. I think we are just over the limit. The put queue can only contain 1024 messages, but metabase has 1084 Clojure files.

After some experimentation I have one way to fix tthis. In lsp4clj, in send-notification, I can use >!! instead of put! so that if the go block gets backed up, we block further notifications. (I tried (go (>! output msg)) but that causes the same pending puts exception, which the core.async wiki suggests will be the case.)

After changing send-notification to use >!!, spawn-async-tasks can no longer use go-loop (because you aren't supposed to use blocking operations inside go blocks for fear of depleting all the threads that process the go blocks). So, on the clojure-lsp side, spawn-async-tasks has to be switched to (thread (loop [])). Maybe that's fine. It dedicates 4 threads to the 4 async tasks, but that's not terrible.

This works, and I'd be happy with it as a long-term solution. It may even be correct... we are, after all, doing blocking io. Perhaps in a sense, we've essentially discovered a subtle bug. All along we've been doing blocking io inside a go block. We probably haven't ever depleted all the go threads, but it's been possible... and now, it's fixed!

But maybe there's a way where it can all be solved on the lsp4clj side, without changing spawn-async-tasks. I'll think about that more.

@mainej really sorry for the delay. I reviewed the code on lsp4clj, and intend to review the one on clojure-lsp side soon.

1.

AFAIK there is no such standard, but for some reason vscode and lsp follow a pretty close format, nvim/coc doesn't though, so I'd go with what is clear from our point of view.

2.

I saw you done 3 and looks the best option, really more complex indeed.

3.

Yeah, we never used that, I started using on rename and one other request I think, but we should try to use it more. No problem for now though

8.

I've done the code review on lsp4clj and intend to do on clojure-lsp soon. The code is really good overall IMO.

and 10.

I can help with that, hard to cover 100% though, that's why I'd like to use v1 a little bit before we release to users

This works, and I'd be happy with it as a long-term solution. It may even be correct... we are, after all, doing blocking io. Perhaps in a sense, we've essentially discovered a subtle bug. All along we've been doing blocking io inside a go block. We probably haven't ever depleted all the go threads, but it's been possible... and now, it's fixed!

That's so good to hear! Thanks for the explanation on lsp4j side, that really makes sense why it happens on huge projects, and I agree, the fix sounds like a good long-term solution, I'm a little bit curious about how much memory we are using with those 4 threads comparing with go-blocks but I suspect it will be irrelevant

clojure-lsp / lsp4clj