GrayJack / tree-sitter-janet

A tree-sitter grammar parser for Janet
BSD 3-Clause "New" or "Revised" License
16 stars 3 forks source link

Error, possibly related to short #3

Closed sogaiu closed 4 years ago

sogaiu commented 4 years ago

I found that:

https://github.com/PaulBatchelor/monolith/blob/f77662279bd8eb5c764d9b07ed9c3530593340f9/janet/jpm.janet#L190

gave an error.

I tried to shrink it down a bit:

~(any (* '(any (if-not (set `\/`) 1)) (+ (set `\/`) -1)))

That seems to still yield an error.

This is with 1cdd3770251ae4f5457d64d06f3a7b0a73db72a9

sogaiu commented 4 years ago

It may be related, but this also gave an error:

{:a '(some (if-not (set "=;") 1))
 :b '(set "; ")}
GrayJack commented 4 years ago

I think it's related to set special form, it does requires 2 expression, not one

sogaiu commented 4 years ago

Not sure about the first case, but in the second case, the form is quoted, so depending on how that value is used (e.g. as part of something that builds other code), it's not necessarily a problem.

In other words, strictly speaking, that's not a set special form but rather the symbol set being as part of a data structure.

At least...that's how it seems to me.

What do you think?

sogaiu commented 4 years ago

As a bit of supporting evidence, consider: https://github.com/bakpakin/x43bot/blob/master/x43bot.janet#L21

That code is written by bakpakin.

GrayJack commented 4 years ago

Yes, I agree that this should be the case, but how can we solve that on the tree-sitter parser?

sogaiu commented 4 years ago

BTW, I think these may be the last of the problems I've been able to find across the code I collected. You've done a great job of eliminating problems :+1:

sogaiu commented 4 years ago

I confess to not having a good solution.

Will think about it more.

GrayJack commented 4 years ago

One easy, but hacky way is to allow set to have one expression or 2 expr

sogaiu commented 4 years ago

Yes, that seems like it would cover the cases I've found.

Perhaps in practice that would suffice.

GrayJack commented 4 years ago

Fixed by 51971f64f00fa8772adb994543a57e95c80d6cbc

sogaiu commented 4 years ago

All .janet files I've collected here now parse with no errors :)

GrayJack commented 4 years ago

If you get more error, feel free to file a issue, and if you have the time as well, you can send a PR as well :3

sogaiu commented 4 years ago

The following is not related to any error, but related to the discussion of the use of the symbol set. On the janet-lang site, the page on syntax has the following:

(def grammar
  ~{:ws (set " \t\r\f\n\0\v")
    :readermac (set "';~,|")
    :symchars (+ (range "09" "AZ" "az" "\x80\xFF") (set "!$%&*+-./:<?=>@^_|"))
    :token (some :symchars)
    :hex (range "09" "af" "AF")
    :escape (* "\\" (+ (set "ntrzfev0\"\\")
                       (* "x" :hex :hex)
                       (error (constant "bad hex escape"))))
    :comment (* "#" (any (if-not (+ "\n" -1) 1)))
    :symbol :token
    :keyword (* ":" (any :symchars))
    :constant (+ "true" "false" "nil")
    :bytes (* "\"" (any (+ :escape (if-not "\"" 1))) "\"")
    :string :bytes
    :buffer (* "@" :bytes)
    :long-bytes {:delim (some "`")
                 :open (capture :delim :n)
                 :close (cmt (* (not (> -1 "`")) (-> :n) ':delim) ,=)
                 :main (drop (* :open (any (if-not :close 1)) :close))}
    :long-string :long-bytes
    :long-buffer (* "@" :long-bytes)
    :number (cmt (<- :token) ,scan-number)
    :raw-value (+ :comment :constant :number :keyword
                  :string :buffer :long-string :long-buffer
                  :parray :barray :ptuple :btuple :struct :dict :symbol)
    :value (* (any (+ :ws :readermac)) :raw-value (any :ws))
    :root (any :value)
    :root2 (any (* :value :value))
    :ptuple (* "(" :root (+ ")" (error "")))
    :btuple (* "[" :root (+ "]" (error "")))
    :struct (* "{" :root2 (+ "}" (error "")))
    :parray (* "@" :ptuple)
    :barray (* "@" :btuple)
    :dict (* "@" :struct)
    :main :root})

Note the use of the symbol set meaning something completely different from the special form...

How meta to encounter this in a grammar description of the language, presumably expressed using Janet itself!

As an additional occurence, I also found the following in boot.janet:

(def default-peg-grammar
  "The default grammar used for pegs. This grammar defines several common patterns
  that should make it easier to write more complex patterns."
  ~@{:d (range "09")
     :a (range "az" "AZ")
     :s (set " \t\r\n\0\f")
     :w (range "az" "AZ" "09")
     :S (if-not :s 1)
     :W (if-not :w 1)
     :A (if-not :a 1)
     :D (if-not :d 1)
     :d+ (some :d)
     :a+ (some :a)
     :s+ (some :s)
     :w+ (some :w)
     :d* (any :d)
     :a* (any :a)
     :w* (any :w)
     :s* (any :s)})
GrayJack commented 4 years ago

We can do this to solve the conflict

peg_set: $ => seq(
      '(',
      'set',
      choice($.str_literal, $.long_str_literal, $.quote, $.short_quote),
      ')',
),

It will not conflict with set cause set always have 2 parameters and PEG set will always have 1 and it will be string literal (or quotation inside macros)

sogaiu commented 4 years ago

Yes, that looks like it would solve the conflict in this case, so that seems good!

Here are a few related observations:

To expand on the second point, if some of the symbols in special forms do get "reused" for other purposes, some of those cases might be addressed in an analogous manner to your idea, but in other cases, it might not be so easy.

For example, imagine if instead of (set "';~,|"), this had been something like (set a ";~,|") in the same context, so it's being used not as a special form, but still as data. It doesn't seem possible to me to accurately identify which use it is using tree-sitter's current capabilities.

Perhaps in practice this won't be so likely and I am needlessly concerned.

At any rate, it's nice that the grammar can distinguish the existing uses :)

GrayJack commented 4 years ago

At least in the REPL, I was able to define a function named set, but when trying to call it, the REPL always interpreted as the special form, always requiring the 2 args

GrayJack commented 4 years ago

Also I notice all usages are either quoted or quasiquoted or in a bigger expression that ar quoted or quasiquoted, so I think who handles the expression is the function/macro that receives it, in this case, peg/compile and peg/match

sogaiu commented 4 years ago

In my experience, special form handling in various lisps has a higher priority than ordinary functions, so what you found makes sense to me (about set being seen by the repl as a special form).

sogaiu commented 4 years ago

It also makes sense that the usages of the set symbol as data would be quoted (quasi or otherwise) -- this seems consistent with the earlier point you made about Janet interpreting set as a special form at the repl.

sogaiu commented 4 years ago

I think it might be possible for set to be interpreted as something other than a special form in a non-quote context though -- inside a macro.

As a concrete example in another lisp, there was an instance I found when looking over collected Clojure code where defn and def appeared unquoted within a use of case as test-constants. Looking at the examples on the page linked for case there is:

(case 'y
      (x y z) "x, y, or z"
      "default")

What I came across looked something like:

(case expr
      ...
      (def defn def-) "defining form"
      ...
      :unrecognized)

In Clojure, def is a special form and case is a macro. In the example here, def appears unquoted but is not treated as a special form.

I mention this example because IIUC in lisps with macros (including Janet), the same kind of thing is possible.

GrayJack commented 4 years ago

Fair enough, maybe we should explore it more to see if we can do it in Janet, but for now I think the not so hacky solution is ok :smile:

sogaiu commented 4 years ago

Yes, exploring whether it's possible in Janet sounds good.

I agree that for the moment (and possibly for a long time to come), what you've come up with seems practical.

Regarding the name peg_set -- the user of the parse tree (probably a developer I imagine) will have to use that name and if at some point, set is used in a different non-special-form context that is not related to peg, I think that might be found to be confusing.

It seems like the syntax node names that will be visible are like a public API and I wonder how easy it will be to change them later. I suppose the first user of tree-sitter-janet is / will be language-janet and you maintain that one, so may be this is not currently an issue :) Neovim support seems to be on the way though and from what I can tell there is active work being done for Emacs.

On a related note, I struggled a lot with the naming of the syntax nodes in tree-sitter-clojure as well as which ones to expose. By comparsion, I liked the names you came up with in tree-sitter-janet (e.g. sqr_array and sqr_tuple) more than my attempt :+1: