Track-state functionality grinds CIDER to a halt on a project with many namespaces and heavy :refer :all usage

clojure-emacs / cider-nrepl

A collection of nREPL middleware to enhance Clojure editors with common functionality like definition lookup, code completion, etc.

https://docs.cider.mx/cider-nrepl

670 stars 175 forks source link

Track-state functionality grinds CIDER to a halt on a project with many namespaces and heavy :refer :all usage #806

Open alexander-yakushev opened 4 years ago

alexander-yakushev commented 4 years ago

We are using CIDER with a really big project (1000+ namespaces). Most of those namespaces have a common "prelude" form that requires and refers many other namespaces. With a project like this, CIDER consistently slows down the application and the Emacs. We were able to pinpoint the problem to track-state middleware, particularly the part that sends the namespaces that were changed since the last time. When the cache is empty, it sends all namespaces from cider-nrepl to cider. This actually creates three problems:

Clojure side (on cider-nrepl) spins like crazy and spends a lot of time in cider.nrepl.middleware.track-state/filter-core-and-get-meta function.
The huge result (~50Mb) gets transferred to the client. This takes ages when doing any remote development.
Emacs caches the received result and keeps it in memory all the time which makes Emacs GC last a few seconds. Most Emacs operations become ridiculously slow.

We were able to reduce this problem a bit by blacklisting :doc and :arglists from the data that gets transferred from cider-nrepl to CIDER. Apparently, it doesn't break anything, so it is not obvious why they are there in the first place.

In any case, it looks like this way of tracking changes (transferring huge maps on every change) doesn't scale too well. I could try to suggest a less strenuous way to do this, but I have to know where else the information from track-state is used. So far, I found only that font-locking relies on it.

bbatsov commented 4 years ago

Thanks for looking into this!

Emacs caches the received result and keeps it in memory all the time which makes Emacs GC last a few seconds. Most Emacs operations become ridiculously slow.

Btw, you can address this partially by setting a big GC threshold in Emacs. It's going to reduce the number of GC pauses, but they are going to be slower. Unfortunately Emacs's GC is quite primitive.

We were able to reduce this problem a bit by blacklisting :doc and :arglists from the data that gets transferred from cider-nrepl to CIDER. Apparently, it doesn't break anything, so it is not obvious why they are there in the first place.

I don't think they are needed. Probably just an oversight.

In any case, it looks like this way of tracking changes (transferring huge maps on every change) doesn't scale too well. I could try to suggest a less strenuous way to do this, but I have to know where else the information from track-state is used. So far, I found only that font-locking relies on it.

The font-locking is this data's top user for sure. If I recall correctly we use this for the dynamic indentation as well. One other usage is tracking the REPL type after evaluating code, although this can easily be extracted into a different middleware (or we can just try to do it client-side). One relatively simple workaround would be to put some limit on the data size or to make this middleware something optional that the clients have to initialize manually. We just have to handle the REPL type independently then.

bbatsov commented 4 years ago

@Malabarba Any thoughts on this one?

alexander-yakushev commented 4 years ago

I've actually made some progress locally (through rebinding vars), I will run it like that for a while longer and then will try to produce a patch for the upstream.

One thing that I immediately see is that the metadata of a Var is repeated in every possible namespace that interns that Var. We use interning heavily which is likely the main reason why things blow so out of hand.

Droping :arglists and :doc from the relevant meta helps on the Emacs side a little (less objects for the GC to track), but the slow processing time on the cider-nrepl side remains. Some of it I have fixed, so expect a PR soon.

bbatsov commented 4 years ago

Droping :arglists and :doc from the relevant meta helps on the Emacs side a little (less objects for the GC to track), but the slow processing time on the cider-nrepl side remains. Some of it I have fixed, so expect a PR soon.

Btw, here we should probably just whitelist the relevant meta so we can reduce the payload to the minimal possible size. Generally we can change the format however we want, as CIDER is the only editor using this currently.

bbatsov commented 4 years ago

Probably you've seen this already, but all the relevant usage of the metadata is here https://github.com/clojure-emacs/cider/blob/0c99b0718e1825d020115e0da736ddbcecabb910/cider-mode.el#L757 If we just return the type of the vars instead in the response things like arglists won't be needed.

Malabarba commented 4 years ago

@Malabarba Any thoughts on this one?

There are probably no magic potions here. :( It's probably a good thing to reduce the payload size as best we can, but in the end it will always halt on a project that's large enough. I'd recommend we add a simple validation to the middleware that disables it when the number of namespaces is very large. (Unless someone wants to work on a more robust solution of tracking only the namespaces that have recently been visited by the user, but that'll require a bit more effort and some additional communication between the two sides).

vemv commented 10 months ago

Issue moved to cider-nrepl

vemv commented 8 months ago

@alexander-yakushev curious, do you / other people still have this problem?

Would it be possible to have an example of the problematic ns pattern?

(Just the basics would suffice - I could programatically create 1000 like those, for instance)

alexander-yakushev commented 8 months ago

I currently don't have access to the project where this has triggered. But I can give you the overall shape. So, there are around ~20 "common" namespaces, each having say ~20-30 functions. Then, there is ~1000 "consumer" namespaces, each doing (require :refer :all) of every common namespace. Important detail that all those 1000 namespaces are eagerly loaded into the REPL.

I don't quite remember if the slowdown was all the time or just on fresh REPL initialization. I think it was the latter. Once the results for track-state were cached, the Clojure side wasn't lagging anymore. However, Emacs side did because of the huge cache and GC issues.

alexander-yakushev commented 8 months ago

What I have is the hack that I used to workaround this issue:

;; Patch track-state middleware to reduce the amount of data transferred.

(let [meta-ns (try-req 'cider.nrepl.middleware.util.meta)
      misc-ns (some-> meta-ns (.lookupAlias 'misc))
      track-ns (try-req 'cider.nrepl.middleware.track-state)
      rmk-var (some-> meta-ns (ns-resolve 'relevant-meta-keys))]
  (when (and meta-ns misc-ns track-ns rmk-var)
    (let [relevant-meta-keys (HashSet. (remove #{:doc :arglists :fn} @rmk-var))
          relevant-meta-keys-with-fn (HashSet. (remove #{:doc :arglists} @rmk-var))
          vmwf (ns-resolve track-ns 'var-meta-with-fn)
          relevant-meta (fn [m, ^HashSet rmk]
                          (let [m (reduce-kv (fn [acc k v]
                                               (if (and v (.contains rmk k))
                                                 (assoc acc k
                                                        (cond (true? v) "true"
                                                              (number? v) (str v)
                                                              :else (pr-str v)))
                                                 acc))
                                             {} m)]
                            (when (> (count m) 0)
                              m)))
          clojure-core (try-req 'clojure.core)
          v (ns-resolve track-ns 'filter-core-and-get-meta)]

      ;; Remove :doc and :arglists for everyone, including clojure.core.
      (.bindRoot (ns-resolve track-ns 'clojure-core-map)
                 {:aliases {}
                  :interns (reduce-kv (fn [acc k v]
                                        (if (var? v)
                                          (assoc acc k
                                                 (relevant-meta (vmwf v) relevant-meta-keys-with-fn))
                                          acc))
                                      {} (ns-map clojure-core))})

      ;; For others, remove :fn too.
      (.bindRoot v (fn [refers]
                     (persistent!
                      (reduce-kv (fn [acc sym the-var]
                                   (if-let [meta (when (var? the-var)
                                                   (let [the-meta (meta the-var)
                                                         the-ns (.ns ^Var the-var)]
                                                     (when-not (identical? the-ns clojure-core)
                                                       (relevant-meta the-meta relevant-meta-keys))))]
                                     (assoc! acc sym meta)
                                     acc))
                                 (transient {}) refers)))))))

I'm a bit hazy to say outright what this does, but if you ask me about the particular part I can probably try to remember.

vemv commented 8 months ago

Thanks!

Given that you don't face this problem first-hand anymore, would you mind if we simply close it?

tbh, it sounds a lot to me like a "don't do that" kind of problem.

:refer-all in particular is frowned upon, and linted by default by clj-kondo.

Having 1000 namespaces also is very much non-modular. It's increasingly accepted as better to have some sort of Polylith-like pattern.

That's not to say that by closing it we won't ever work on this, but it seems a sensible prioritization measure. For all we know, the project in question may have been abandoned or have no CIDER users at the moment.

Surely if it wasn't the case this thread would have get bumped from time to time.

alexander-yakushev commented 8 months ago

I understand the part about prioritization, that's fine. I don't mind this being closed. Having said that:

tbh, it sounds a lot to me like a "don't do that" kind of problem. Having 1000 namespaces also is very much non-modular.

It is quite important to do it in that project. Those 1000 namespaces are indeed split into modules, almost Polylith style, but during testing, it is necessary to have all modules loaded at the same time.

It is one of those projects that absolutely depend on being written in Clojure/other Lisp. Like, absolute most of the software can be written in anything else and Clojure is just a nice bonus there, but here it is vital. But that is also the reason why the common guidelines does not apply to it.

For all we know, the project in question may have been abandoned or have no CIDER users at the moment

Oh no, it is still going strong and growing, and uses CIDER exclusively. Probably has 1.5k namespaces now. It's just me who don't have the access right now because of the armed forces and stuff. Also, the reason why nobody comes complaining is probably because the workaround still works as intended.

So, sometime in the future I'd love to fix this. I don't like when tooling puts limits on things where the foundation does not. Like, Clojure itself is fine having those thousands of namespaces loaded. But for now this can be closed.

vemv commented 8 months ago

Thanks for the insights!

But that is also the reason why the common guidelines does not apply to it.

I'm not sure I understand this part - why is :refer so important? Most likely there are tools that will help you refactor them them to :as (clj-refactor can do it, but one ns at a time; clojure-lsp may offer this as part of its CLI)

Very likely, the patch could be simplified to removing :doc/etc from orchard.meta/var-meta-whitelist. Note that that namespace is not mranderson-ized nowadays, so it's easy to patch.

I cannot guarantee that cider will properly work when :doc/etc are removed though. Maybe stuff will work slower, with one nrepl request per usage.

Like, Clojure itself is fine having those thousands of namespaces loaded.

My understanding is that part of the problem has to do with :refer :all - the more refers, the more stuff we transmit over the wire. That pattern tends to show problems sooner or later, so drawing a clear line about what's "sane" so to speak is more pragmatic than pointing out what the Clojure runtime technically supports.

(e.g. Clojure also probably supports 10KLOC namespaces, or 2KLOC defns, but little support can be expected from most tooling authors there).

alexander-yakushev commented 8 months ago

I'm not sure I understand this part - why is :refer so important?

In the same way why clojure.core is automatically referred into every namespace. When you use many common functions and use them a lot, the alias prefix becomes noise. The code in that project has a high density of invocations.

I cannot guarantee that cider will properly work when :doc/etc are removed though. Maybe stuff will work slower, with one nrepl request per usage.

Yeah, that's why I didn't want to submit it back then. Don't want to break anything when I don't know exhaustively who needs this middleware and what for.

That pattern tends to show problems sooner or later, so drawing a clear line about what's "sane" so to speak is more pragmatic than pointing out what the Clojure runtime technically supports.

I agree with you in the situation when pushing the line further towards the "insanity" involves some sort of a tradeoff. Then, sure. But if the current limitation is only caused by bloat/redundancy, then I don't see why not remove it even though the run-of-the-mill usecases will never see the impact of it.