clj-commons / rewrite-clj

Rewrite Clojure code and edn
https://cljdoc.org/d/rewrite-clj
MIT License
592 stars 55 forks source link

Allow parsing of technically invalid code #87

Open ericdallo opened 3 years ago

ericdallo commented 3 years ago

Yeah, it sounds weird, let me give some context:

On clojure-lsp, we have a completion code feature, where the user types something and we try to complete the code, so for example if user writes: foo/b we suggest to complete with foo/bar if that function exists. To that work, we parse the whole file (following the LSP spec) and search for a node at the given row/col, the issue here is that we can't parse the code if it's wrong(not a valid clojure code), so if user tries to complete the following code:

(def abc 123)
foo/

that it'll not work since (z/of-string "(def abc 123)\nfoo/") will trown a reader exception because foo/ is not a valid clojure code.

If somehow would be possible to parse that, it'd would be possible to parse that node and get a token node of foo/ then extract the alias and know what functions are available :)

LMK if you need more info on anything

sogaiu commented 3 years ago

FWIW, there is some discussion of trying to work with "incomplete" forms starting here: https://clojurians-log.clojureverse.org/rewrite-clj/2019-11-12/1573599410.186700

The discussion ends on this day: https://clojurians-log.clojureverse.org/rewrite-clj/2019-11-14

The approach might be described roughly as "automatic closing of delimiters".

Where that ended up was this code: https://github.com/sogaiu/rewrite-cljs-playground/commit/7074cfc222aa65c93020bddeed83c0236e4f86b8

ericdallo commented 3 years ago

Thank you for the response @sogaiu, yeah, that'd be really useful, does that branch already works? How could I test it?

sogaiu commented 3 years ago

@ericdallo It was a while back, but my current recollection is that it did work in limited testing.

There are only 5 lines to change in a single file, so perhaps it's possible to just make the changes manually on your end to try it?

ericdallo commented 3 years ago

Great! I'll try to change it and release it local, thank you!

ericdallo commented 3 years ago

I managed to release a local rewrite-clj with that change, but I probably doing something wrong, I could not use binding to change the parse-anyway value and even if declaring it with a true value and releasing again, the parser throws an exception anyway

sogaiu commented 3 years ago

I don't know how readable the following will be, but here is a section of code I was using at one point that I think used the feature in question: https://gist.github.com/sogaiu/b6286feeaefc326a449af109f4a4b77d#file-script-clj-L217-L245

What you described:

use binding to change the parse-anyway value

seems to be basically what I was doing, so I'm not sure my code will be of much use.

I suppose it's also possible that the branch I was using: https://github.com/sogaiu/rewrite-clj/commits/parse-anyway has a few extra bits in it that might be affecting things, but nothing obvious stands out at the moment.

ericdallo commented 3 years ago

Not sure I am doing anything wrong here, both give me the same exception:

(binding [rpc/*parse-anyway* false]
  (z/of-string "(def a 123) asd/ (def b 345)"))
clojure.lang.ExceptionInfo: Invalid symbol: asd/.
{:type :reader-exception, :ex-kind :reader-error}
 at clojure.tools.reader.impl.errors$throw_ex.invokeStatic (errors.clj:34)
    clojure.tools.reader.impl.errors$throw_ex.doInvoke (errors.clj:24)
    clojure.lang.RestFn.invoke (RestFn.java:442)
    ...

and

(binding [rpc/*parse-anyway* true]
  (z/of-string "(def a 123) asd/ (def b 345)"))
clojure.lang.ExceptionInfo: Invalid symbol: asd/.
{:type :reader-exception, :ex-kind :reader-error}
 at clojure.tools.reader.impl.errors$throw_ex.invokeStatic (errors.clj:34)
    clojure.tools.reader.impl.errors$throw_ex.doInvoke (errors.clj:24)
    clojure.lang.RestFn.invoke (RestFn.java:442)
    ...
snoe commented 3 years ago

So, for completion, we don't actually need to parse, if we simply tokenized with something like indexingpushbackreader, I believe we could avoid a lot of this.

borkdude commented 3 years ago

@snoe Do you mean avoid using rewrite-clj at all and tokenize using a normal Clojure reader?

ericdallo commented 3 years ago

How could we do that @snoe ?

borkdude commented 3 years ago

I have an example using edamame for self-repairing code if the code has insufficient closing delimiters:

https://github.com/borkdude/edamame/blob/ffef4b2085c1fab4b55b36061d4844a1da762b81/test/edamame/core_test.cljc#L121-L130

sogaiu commented 3 years ago

@ericdallo Sorry, I think I didn't quite understand the original post.

The discussions from before are specifically about handling the case where delimitiers are missing. The related code doesn't handle any other cases.

Very sorry to have taken other folks' time on this :(

ericdallo commented 3 years ago

@borkdude edamame seems to not work since it works for missing closing delimiters, I tested with something like (def a) foo/ (def b) and I could not get any info that could help fix that :/

@snoe, the issue with using indexing-push-back-reader manually is that we don't have a simple token, we have the whole document with the line and pos so we would need to "walk" through the safe-parsed text and check line and column, not sure how to do that

@sogaiu, no worries, this issue is to make rewrite-clj parse a wrong code indeed, that'd be the best approach if available IMO, the discussions from above are just other ideas to fix clojure-lsp completion issue, but if we have that on rewrite-clj, it'd be super useful

borkdude commented 3 years ago

@ericdallo You can continue parsing the stream after an exception. So you would just skip over foo/.

ericdallo commented 3 years ago

I managed to make a non proud workaround:

(defn ^:private safe-zloc-of-string [text]
  (try
    (z/of-string text)
    (catch clojure.lang.ExceptionInfo e
      (if-let [[_ token] (->> e
                              Throwable->map
                              :cause
                              (re-matches #"Invalid symbol: (.*\/)."))]
        (-> (string/replace-first text token (str token "_"))
            z/of-string
            (z/edit->
              (z/find-next-value z/next (symbol (str token "_")))
              (z/replace (n/token-node (symbol token)))))
        (throw e)))))

But I just realized that it's not enough to handle completion, since we depend on clj-kondo analysis and when the code is not parseable, clj-kondo return no analysis for that file 😔 @borkdude Is there any way to lint a code and ignore non-parseable tokens?

Anyway, I think it worths leaving this issue open to a better fix on rewrite-clj side @sogaiu, thanks all for the help

borkdude commented 3 years ago

@ericdallo Feel free to post a minimal repro on the clj-kondo issue tracker and I will see what can be done.

ericdallo commented 3 years ago

Done @borkdude https://github.com/clj-kondo/clj-kondo/issues/1146 Also, I created an issue on clojure-lsp side to track the whole fix: https://github.com/clojure-lsp/clojure-lsp/issues/270

ericdallo commented 3 years ago

@sogaiu is there any way to fix that on rewrite-clj? This workaround is not totally reliable :/

sogaiu commented 3 years ago

@ericdallo Sorry, I haven't yet thought of anything.

May be it's worth consulting @lread about this?

borkdude commented 3 years ago

What I did in clj-kondo: I made a tweak to the parser where it parses a token. In case of an exception, it logs this exception to an atom in a dynamic var (this was done so I didn't have to change a lot of code) and just continues parsing as if the token wasn't there at all.

ericdallo commented 3 years ago

Yeah, that's similar to the workaround I did, but in this case we need the token to completion and etc. Since is possible to have a node of unparseable code (like I did in my workaround), IMO I think that rewrite-clj should handle that somehow and allow parsing that.

lread commented 3 years ago

Heya @ericdallo! I am trying to stayed focused on rewrite-clj v1 priorities at this time.

I am getting an idea for what you'd like, but details always help. Add anything you can think of.

I'll lurk/follow this discussion for now.

ericdallo commented 3 years ago

Thanks @lread, I'm quite excited about that v1 :)

I think the issue description and my workaround are detailed, LMK if you need any extra info.

vemv commented 2 years ago

Reading the original post, it seems plausible to me that an alternative solution could be conceived, that would be just as useful to clojure-lsp, and more convenient to rewrite-clj (because nothing would have to be changed!)

For completing a/ in file.clj:

This has the aditional advantage of being resilient to multiple, unsaved errors.

For instance for the saved file file.clj:

(ns file)

and the non-saved current buffer for it:

(ns foo)

]] ;; an error

a/ ;; the completion

]];; another error!

...a/ completion will still work, even in presence of the other faulty bits.

lread commented 5 months ago

So, if I understand, we have different technical challenges that might be treated separately and perhaps incrementally:

  1. invalid stand-alone tokens, examples:
    1. foo/
    2. :keyword:
    3. "foo\xbar" - invalid escaped char in string, (already parses today and throws on sexpr)
  2. invalid syntax, examples:
    1. #::[a] - namespaced map without namespace
    2. ^private - meta without target
    3. {:a 1 :b} - unbalanced maps (already parses today, and throws on sexpr)
    4. {:a 1 :a 2} - map with duplicate keys (already parses today, does NOT throw on sexpr)
    5. #{1 1} - set with duplicate value (already parses today, does NOT throw on sexpr)
  3. unclosed sequences, examples:
    1. (hey jude dont
    2. (make [it bad)
    3. {:take [a #{sad (song]

I've been looking at parsing code for other work. I am tempted to move 3 (handling unclosed sequences) to a separate issue.

I think we might be able to handle 1 and 2 by loosening any parsing exceptions rewrite-clj imposes and making sexpr lazy where it is not.

The throw that happens on parse would instead occur on sexpr. I'll experiment sometime soon to see if this strategy is actually practical and then come back to work out any details like:

  1. Granularity of enabling/disabling relaxed parsing.
  2. Consider providing users a mechanism to query a node for rewrite-clj's impression of validity
  3. Consider optionally enabling throwing on sexpr of #{1 1} and {:a 1 :a 2} (we don't today)
borkdude commented 5 months ago

Somewhat related issue in edamame: https://github.com/borkdude/edamame/issues/106

lread commented 5 months ago

Thanks @borkdude!

And it is so nice to see @sogaiu, even if it is just :eyes: on a comment!