IntersectMBO / plutus

The Plutus language implementation and tools
Apache License 2.0
1.57k stars 479 forks source link

Do we have keywords? #463

Closed michaelpj closed 5 years ago

michaelpj commented 5 years ago

Are syntax identifiers like lam, error etc. keywords?

kwxm commented 5 years ago

Well the spec says that there's no concrete syntax, which would tend to imply that there aren't any keywords either!

The grammar's so rigid that there's surely no danger of conflicts if we do allow re-use of keywords, but looking at the parser/lexer source it looks as it might require quite a lot of rewriting to do that, since it does have to treat 'keywords' specially when they appear in certain positions, but when you tell the lexer that they are keywords, it'll complain if it finds them elsewhere.

Another issue here is that now that we have extensible builtins we've kind of lost control of the grammar: some external person might unwittingly implement a new builtin called wrap for example, and run into problems. However, do we really expect anyone else to be using the concrete syntax?

michaelpj commented 5 years ago

Well, in this instance it bit me because I had a generated AST with error as a variable name, which I then pretty printed and attempted to feed into plc typecheck. That didn't go so well. However, I have been trying to preserve the property that my generated ASTs have valid names so I can do this sort of thing, so if we want to keep these as keywords I'll just have to update my name-mangling function to handle them.

vmchale commented 5 years ago

Modifying the lexer/parser would get us away from standard methods - this is pretty much exactly what you do in other languages. But it would be possible if we really want compliance with the spec, I guess - it seems to me a little weird.

kwxm commented 5 years ago

Easy solution: keywords like (con, (lam, (wrap,... !

Is that the sound of distant screaming I hear?

BekaValentine commented 5 years ago

I think the easiest solution is to just make con, lam, etc. into keywords. This also has the additional benefit of eliminating potential confusion. If you can have a variable named lam or something to that effect, then it might be a security vulnerability, b/c at least in passing, these two things look awfully similar:

(lam x x) and [lam x x]

I'm sure that some clever person could exploit that to their advantage in writing malicious obfuscated code.

I propose we make all the relevant things into keywords and any case where the generated ASTs would try to use those names, we just append some unique number or an underscore or some junk like this.

vmchale commented 5 years ago

Can I close this? Or do we want to note this somewhere in the spec?

michaelpj commented 5 years ago

If we have keywords then IMO that should be in the spec. I don't buy that there is "no concrete syntax" - the figure says "lexical grammar of Plutus Core" which sure sounds like it's telling us how to tokenize it!

IMO builtin names should also not be keywords, but that would be a parser change. We know that we're getting a builtin name from context, so we can just parse an arbitrary identifier, try to map it to a builtin, and fail if it's missing.

kwxm commented 5 years ago

I'm just in the process of trying to get the spec sorted out, so I'll deal with this shortly.

kwxm commented 5 years ago

I've now modified the spec to say that con, lam, etc. are all keywords (it's not merged yet though). I agree that names of builtins shouldn't be keywords, but that shouldn't be a problem now that we always have to prefix them with the builtin keyword everywhere. Also, we'll have to modify the parser anyway when we get the new infrastructure for extensible builtins. I'm going to close this issue but I'll add a note to the one for extensible builtins.