Closed michaelpj closed 5 years ago
Well the spec says that there's no concrete syntax, which would tend to imply that there aren't any keywords either!
The grammar's so rigid that there's surely no danger of conflicts if we do allow re-use of keywords, but looking at the parser/lexer source it looks as it might require quite a lot of rewriting to do that, since it does have to treat 'keywords' specially when they appear in certain positions, but when you tell the lexer that they are keywords, it'll complain if it finds them elsewhere.
Another issue here is that now that we have extensible builtins we've kind of lost control of the grammar: some external person might unwittingly implement a new builtin called wrap
for example, and run into problems. However, do we really expect anyone else to be using the concrete syntax?
Well, in this instance it bit me because I had a generated AST with error
as a variable name, which I then pretty printed and attempted to feed into plc typecheck
. That didn't go so well. However, I have been trying to preserve the property that my generated ASTs have valid names so I can do this sort of thing, so if we want to keep these as keywords I'll just have to update my name-mangling function to handle them.
Modifying the lexer/parser would get us away from standard methods - this is pretty much exactly what you do in other languages. But it would be possible if we really want compliance with the spec, I guess - it seems to me a little weird.
Easy solution: keywords like (con
, (lam
, (wrap
,... !
Is that the sound of distant screaming I hear?
I think the easiest solution is to just make con
, lam
, etc. into keywords. This also has the additional benefit of eliminating potential confusion. If you can have a variable named lam or something to that effect, then it might be a security vulnerability, b/c at least in passing, these two things look awfully similar:
(lam x x)
and [lam x x]
I'm sure that some clever person could exploit that to their advantage in writing malicious obfuscated code.
I propose we make all the relevant things into keywords and any case where the generated ASTs would try to use those names, we just append some unique number or an underscore or some junk like this.
Can I close this? Or do we want to note this somewhere in the spec?
If we have keywords then IMO that should be in the spec. I don't buy that there is "no concrete syntax" - the figure says "lexical grammar of Plutus Core" which sure sounds like it's telling us how to tokenize it!
IMO builtin names should also not be keywords, but that would be a parser change. We know that we're getting a builtin name from context, so we can just parse an arbitrary identifier, try to map it to a builtin, and fail if it's missing.
I'm just in the process of trying to get the spec sorted out, so I'll deal with this shortly.
I've now modified the spec to say that con
, lam
, etc. are all keywords (it's not merged yet though). I agree that names of builtins shouldn't be keywords, but that shouldn't be a problem now that we always have to prefix them with the builtin
keyword everywhere. Also, we'll have to modify the parser anyway when we get the new infrastructure for extensible builtins. I'm going to close this issue but I'll add a note to the one for extensible builtins.
Are syntax identifiers like
lam
,error
etc. keywords?echo "(program 1.0.0 (lam error (forall a (type) a)) error" | plc typecheck --stdin
Unexpected 'error' at 1:21