Closed sogaiu closed 4 years ago
It may be related, but this also gave an error:
{:a '(some (if-not (set "=;") 1))
:b '(set "; ")}
I think it's related to set
special form, it does requires 2 expression, not one
Not sure about the first case, but in the second case, the form is quoted, so depending on how that value is used (e.g. as part of something that builds other code), it's not necessarily a problem.
In other words, strictly speaking, that's not a set
special form but rather the symbol set
being as part of a data structure.
At least...that's how it seems to me.
What do you think?
As a bit of supporting evidence, consider: https://github.com/bakpakin/x43bot/blob/master/x43bot.janet#L21
That code is written by bakpakin.
Yes, I agree that this should be the case, but how can we solve that on the tree-sitter parser?
BTW, I think these may be the last of the problems I've been able to find across the code I collected. You've done a great job of eliminating problems :+1:
I confess to not having a good solution.
Will think about it more.
One easy, but hacky way is to allow set to have one expression or 2 expr
Yes, that seems like it would cover the cases I've found.
Perhaps in practice that would suffice.
Fixed by 51971f64f00fa8772adb994543a57e95c80d6cbc
All .janet
files I've collected here now parse with no errors :)
If you get more error, feel free to file a issue, and if you have the time as well, you can send a PR as well :3
The following is not related to any error, but related to the discussion of the use of the symbol set
. On the janet-lang site, the page on syntax has the following:
(def grammar
~{:ws (set " \t\r\f\n\0\v")
:readermac (set "';~,|")
:symchars (+ (range "09" "AZ" "az" "\x80\xFF") (set "!$%&*+-./:<?=>@^_|"))
:token (some :symchars)
:hex (range "09" "af" "AF")
:escape (* "\\" (+ (set "ntrzfev0\"\\")
(* "x" :hex :hex)
(error (constant "bad hex escape"))))
:comment (* "#" (any (if-not (+ "\n" -1) 1)))
:symbol :token
:keyword (* ":" (any :symchars))
:constant (+ "true" "false" "nil")
:bytes (* "\"" (any (+ :escape (if-not "\"" 1))) "\"")
:string :bytes
:buffer (* "@" :bytes)
:long-bytes {:delim (some "`")
:open (capture :delim :n)
:close (cmt (* (not (> -1 "`")) (-> :n) ':delim) ,=)
:main (drop (* :open (any (if-not :close 1)) :close))}
:long-string :long-bytes
:long-buffer (* "@" :long-bytes)
:number (cmt (<- :token) ,scan-number)
:raw-value (+ :comment :constant :number :keyword
:string :buffer :long-string :long-buffer
:parray :barray :ptuple :btuple :struct :dict :symbol)
:value (* (any (+ :ws :readermac)) :raw-value (any :ws))
:root (any :value)
:root2 (any (* :value :value))
:ptuple (* "(" :root (+ ")" (error "")))
:btuple (* "[" :root (+ "]" (error "")))
:struct (* "{" :root2 (+ "}" (error "")))
:parray (* "@" :ptuple)
:barray (* "@" :btuple)
:dict (* "@" :struct)
:main :root})
Note the use of the symbol set
meaning something completely different from the special form...
How meta to encounter this in a grammar description of the language, presumably expressed using Janet itself!
As an additional occurence, I also found the following in boot.janet
:
(def default-peg-grammar
"The default grammar used for pegs. This grammar defines several common patterns
that should make it easier to write more complex patterns."
~@{:d (range "09")
:a (range "az" "AZ")
:s (set " \t\r\n\0\f")
:w (range "az" "AZ" "09")
:S (if-not :s 1)
:W (if-not :w 1)
:A (if-not :a 1)
:D (if-not :d 1)
:d+ (some :d)
:a+ (some :a)
:s+ (some :s)
:w+ (some :w)
:d* (any :d)
:a* (any :a)
:w* (any :w)
:s* (any :s)})
We can do this to solve the conflict
peg_set: $ => seq(
'(',
'set',
choice($.str_literal, $.long_str_literal, $.quote, $.short_quote),
')',
),
It will not conflict with set cause set always have 2 parameters and PEG set will always have 1 and it will be string literal (or quotation inside macros)
Yes, that looks like it would solve the conflict in this case, so that seems good!
Here are a few related observations:
It's not clear to me that the 1-argument version of set
will always be used for peg, so I wonder a bit about the naming.
Possibly more relevant though is that as more Janet code is written, people will end up reusing the names of special forms in a similar manner to the cases in this issue.
To expand on the second point, if some of the symbols in special forms do get "reused" for other purposes, some of those cases might be addressed in an analogous manner to your idea, but in other cases, it might not be so easy.
For example, imagine if instead of (set "';~,|")
, this had been something like (set a ";~,|")
in the same context, so it's being used not as a special form, but still as data. It doesn't seem possible to me to accurately identify which use it is using tree-sitter's current capabilities.
Perhaps in practice this won't be so likely and I am needlessly concerned.
At any rate, it's nice that the grammar can distinguish the existing uses :)
At least in the REPL, I was able to define a function named set
, but when trying to call it, the REPL always interpreted as the special form, always requiring the 2 args
Also I notice all usages are either quoted or quasiquoted or in a bigger expression that ar quoted or quasiquoted, so I think who handles the expression is the function/macro that receives it, in this case, peg/compile
and peg/match
In my experience, special form handling in various lisps has a higher priority than ordinary functions, so what you found makes sense to me (about set
being seen by the repl as a special form).
It also makes sense that the usages of the set
symbol as data would be quoted (quasi or otherwise) -- this seems consistent with the earlier point you made about Janet interpreting set
as a special form at the repl.
I think it might be possible for set
to be interpreted as something other than a special form in a non-quote context though -- inside a macro.
As a concrete example in another lisp, there was an instance I found when looking over collected Clojure code where defn
and def
appeared unquoted within a use of case as test-constants. Looking at the examples on the page linked for case
there is:
(case 'y
(x y z) "x, y, or z"
"default")
What I came across looked something like:
(case expr
...
(def defn def-) "defining form"
...
:unrecognized)
In Clojure, def
is a special form and case
is a macro. In the example here, def
appears unquoted but is not treated as a special form.
I mention this example because IIUC in lisps with macros (including Janet), the same kind of thing is possible.
Fair enough, maybe we should explore it more to see if we can do it in Janet, but for now I think the not so hacky solution is ok :smile:
Yes, exploring whether it's possible in Janet sounds good.
I agree that for the moment (and possibly for a long time to come), what you've come up with seems practical.
Regarding the name peg_set
-- the user of the parse tree (probably a developer I imagine) will have to use that name and if at some point, set
is used in a different non-special-form context that is not related to peg, I think that might be found to be confusing.
It seems like the syntax node names that will be visible are like a public API and I wonder how easy it will be to change them later. I suppose the first user of tree-sitter-janet is / will be language-janet and you maintain that one, so may be this is not currently an issue :) Neovim support seems to be on the way though and from what I can tell there is active work being done for Emacs.
On a related note, I struggled a lot with the naming of the syntax nodes in tree-sitter-clojure as well as which ones to expose. By comparsion, I liked the names you came up with in tree-sitter-janet (e.g. sqr_array
and sqr_tuple
) more than my attempt :+1:
I found that:
https://github.com/PaulBatchelor/monolith/blob/f77662279bd8eb5c764d9b07ed9c3530593340f9/janet/jpm.janet#L190
gave an error.
I tried to shrink it down a bit:
That seems to still yield an error.
This is with 1cdd3770251ae4f5457d64d06f3a7b0a73db72a9