beanshell / beanshell

Beanshell scripting language
Apache License 2.0
862 stars 182 forks source link

Beanshell generics or type parameter support #66

Open nickl- opened 6 years ago

nickl- commented 6 years ago

Since last year bsh was happy to consume <identifier> as tokens scattered all over the grammar. There are 7 separate comparisons, thus far we are missing a few, for the parser to negotiate this form. This is adding a lot more complexity to the grammar and already shows an impact on performance.

So I sat out to implement the diamond operator <> that was easy enough, then I saw wildcards <?> are rejected so we add another definition for this one. Better cater for parameter lists too <? | identifier (, identifier)*>, that does the trick but what about type bounds? <? [(extends | super) identifier] | identifier (, identifier)*> something like that but I'm starting to have my doubts. This is quickly becoming more trouble than it's worth.

Lets see type parameter types <List<String>> so we start with a < that's 1 then find another < so we have 2 open, then we get a > but we still need 1 more so keep going till we get another > and we're done. Then I started wondering, why exactly are we doing this anyway? My immediate goal is just to avoid parser exceptions and make some failing tests work. Will we actually ever implement type safety for beanshell? Will we? There most definitely are far more pressing matters to be concerned with right now that is for sure.

Lets take a step back... if we're not actually enforcing the rules then we don't really need to tokenise the parameters. That makes things much easier, to implement and the parser to process, we only need to look for < and > anywhere and disregard its contents. Less things to match of course means quicker through matching the things we need.

This is what I've got for now:

"<" ("?")? (["a"-"z","A"-"Z","0"-"9",","," ","_","&"])* ("<" (~[">"])* ">")? ">"

The match set [] seems less greedy than the not match one ~[], or maybe I just haven't hit the nail on the head yet. This manages to parse all the examples on the oracle help pages but I'm sure there are other cases I miss. Something to work on...

For the raw types...

            public class Box<T> {
                private T t;
                public void set(T t) {
                    this.t = t;
                }
                public T get();
                    return t;
                }
            }

I just type cast to java.lang.Object if class not found and class name length == 1. Isn't this what java does before inference anyway? Everything I've tried so far seems to work without a hitch, everything except varargs.

Not quite ready to push just yet, still have some uncommitted snippets which I'm still pondering over. just wanted to put this out there and hear your thoughts on the topic.

stefanofornari commented 6 years ago

Hi Nick it sounds a pragmatic approach, and I agree we do not really need to enforce the type. I ran into the grammar used by PMD which at the end seems to do the same. You probably already found it, but I understand they use the follwoing:

TOKEN : { < RUNSIGNEDSHIFT: ">>>" > { matchedToken.kind = GT; ((Token.GTToken)matchedToken).realKind = RUNSIGNEDSHIFT; input_stream.backup(2); matchedToken.image = ">"; } | < RSIGNEDSHIFT: ">>" > { matchedToken.kind = GT; ((Token.GTToken)matchedToken).realKind = RSIGNEDSHIFT; input_stream.backup(1); matchedToken.image = ">"; } | < GT: ">" > }

(see https://github.com/pmd/pmd/blob/master/pmd-java/etc/grammar/Java.jjt)

HTH Ste

Ste

On Sat, May 19, 2018 at 12:30 PM, Nick Lombard notifications@github.com wrote:

Since last year bsh was happy to consume as tokens scattered all over the grammar. There are 7 separate comparisons, thus far we are missing a few, for the parser to negotiate this form. This is adding a lot more complexity to the grammar and already shows an impact on performance.

So I sat out to implement the diamond operator <> that was easy enough, then I saw wildcards <?> are rejected so we add another definition for this one. Better cater for parameter lists too <? | identifier (, identifier)>, that does the trick but what about type bounds? <? [(extends | super) identifier] | identifier (, identifier)> something like that but I'm starting to have my doubts. This is quickly becoming more trouble than it's worth.

Lets see type parameter types <List> so we start with a < that's 1 then find another < so we have 2 open, then we get a > but we still need 1 more so keep going till we get another > and we're done. Then I started wondering, why exactly are we doing this anyway? My immediate goal is just to avoid parser exceptions and make some failing tests work. Will we actually ever implement type safety for beanshell? Will we? There most definitely are far more pressing matters to be concerned with right now that is for sure.

Lets take a step back... if we're not actually enforcing the rules then we don't really need to tokenise the parameters. That makes things much easier, to implement and the parser to process, we only need to look for < and > anywhere and disregard its contents. Less things to match of course means quicker through matching the things we need.

This is what I've got for now:

"<" ("?")? (["a"-"z","A"-"Z","0"-"9",","," ","_","&"]) ("<" (~[">"]) ">")? ">"

The match set [] seems less greedy than the not match one ~[], or maybe I just haven't hit the nail on the head yet. This manages to parse all the examples on the oracle help pages but I'm sure there are other cases I miss. Something to work on...

For the raw types...

        public class Box<T> {
            private T t;
            public void set(T t) {
                this.t = t;
            }
            public T get();
                return t;
            }
        }

I just type cast to java.lang.Object if class not found and class name length == 1. Isn't this what java does before inference anyway? Everything I've tried so far seems to work without a hitch, everything except varargs.

Not quite ready to push just yet, still have some uncommitted snippets which I'm still pondering over. just wanted to put this out there and hear your thoughts on the topic.

— You are receiving this because you are subscribed to this thread. Reply to this email directly, view it on GitHub https://github.com/beanshell/beanshell/issues/66, or mute the thread https://github.com/notifications/unsubscribe-auth/AACoghaafnohZ60d-TuoCSd8hVFwrnSOks5tz_QrgaJpZM4UFqVy .

nickl- commented 6 years ago

Hey @stefanofornari! Tx for the feedback! 👍

Yes I also reference pmd often and the EL lexer which covers less but has compatible syntax. I originally went their route... wait I don't think that is it... which is indeed similar to the new approach you are right, well spotted. Their generics tokeniser is called TypeParameters But this seems to be missing quite a bit, do you agree?

This won't even chew diamond tags <> if I'm not mistaken:

void TypeParameters():
{}
{
   "<" {checkForBadGenericsUsage();} TypeParameter() ( "," TypeParameter() )* ">"
}

void TypeParameter():
{Token t;}
{
   (TypeAnnotation())*
   t=<IDENTIFIER> {jjtThis.setImage(t.image);} [ TypeBound() ]
}

void TypeBound():
{}
{
   "extends" (TypeAnnotation())* ClassOrInterfaceType() ( "&" (TypeAnnotation())* ClassOrInterfaceType() )*
}

If you go and search through their grammar for TypeParameters (I think I called ours TypeArguments for some reason), you will see all the points of reference where the parser needs to work at this. Now compare that to the expression which does everything and anything oracle can imagine within diamond tags in the future as well.

Yes TypeArguments This is what we have currently (from last year):

https://github.com/beanshell/beanshell/blob/c4b098e7bbc0eb953eac02ba18dbeab7c5d0336c/src/main/jjtree/bsh.jjt#L1181-L1185

Just the basics which gets referenced 7 times already in our grammar and takes a significant punch. As you can see from their implementation it is going to get busy quick. That is very expensive stuff.

As far as priorities go implementing type safety is very low on my list. =) On the contrary I am in two minds whether to lax beanshell even more or not. It baffles me why a = 2; is perfectly acceptable variable declaration in bsh but if you want to decorate it you need to add a type. final a = 2; will give you the slightly frustrating annoying Found "final" final...is that really your final word(s) on my syntax. Sure it may take some getting used, not easy on the eyes much, but we already have everything implemented to get it done so why not?

Anyway back to topic... the point I was getting to, why waste resources on something we don't want. Moving on... Theodore S. Norvell suggests using MORE for this instead as it will produce more verbose feedback while he warns of dragons when finding problems with bad expressions, because they are silent. Which is one of the things that attracts me... not the dragons... it is quick! The first few times I ran with it I was confused why unit tests were passing, surely it chewed my whole script that fast. I had to go look, what are these tests actually validating.

// When a /* is seen in the DEFAULT state, skip it and switch to the IN_COMMENT state
MORE : {
"/*": IN_COMMENT }

// When any other character is seen in the IN_COMMENT state, skip it.
< IN_COMMENT > MORE : {
<  ~[] > }

// When a */ is seen in the IN_COMMENT state, skip it and switch back to the DEFAULT state
< IN_COMMENT > SKIP : {
"*/": DEFAULT }

From What is MORE?- The JavaCC FAQ - Theodore S. Norvell

For block comments, something spanning many lines this probably makes sense but it seems a lot of effort (for the parser) to swap states several times per line for the small generics blocks, doesn't that seem overkill? Another caveat with this approach is that "<" needs to be an IDENTIFIER but we already use it as such for something else.

nickl- commented 6 years ago

So after some more tweaking and deliberating... this is what we ended up with:

   "<" ( (~[">",";","|","&","\n","\r"])* ("& "|">,")? )* ([">"])+

Which properly addresses those nested parameters like

   List<List<Map<String,Map<List<String>,Integer>>>> ofNameMaps = new ArrayList<>();` 

but also avoids swallowing conditional expressions, without the need to keep a count of opening minus closing braces.

Spot anything I missed?

nickl- commented 6 years ago

See beanshell/beanshell@031a6350a40bd288c596e5019cc6679f88920c87 and beanshell/beanshell@2056edef12d5ac6c3c8a2838fc1655fde8e04087