daniellansun / groovy-parser

Yet another new parser for the Groovy programming language (project code: Parrot)
Apache License 2.0

Discussion of Groovy 3 Changes by @gavingroovygrover #50

Open daniellansun opened 5 years ago

daniellansun commented 5 years ago

Discussion of Groovy 3 Changes

1. methodMissing/getProperty/et al

Wujek Srujek said there was something about methodMissing (or maybe it was getProperty et al.?) not being able to be defined with a closure assignment to a metaClass. Jochen replied: This is going to be a standard case with the new MOP, so this will be ensured to work.

2. removing automatic list expansion

Jochen wrote regarding this:

A method call done with a list that finds no matching method for that list (a method with one parameter of type List, Collection, Object, etc.) will cause a second method selection iteration. This time the list is "unpacked", and all elements of the list are taken as if the method call had been done with the elements rather than the list. Groovy also supports spreading of lists with a syntax element (the spread operator, *list), which makes this automatic feature unnecessary. In fact, it can be quite surprising for users and is a problem for performance. A spread version might still not be good in performance, but at least the user has to use an extra symbol and thus has a visual indicator. Why this feature was originally added is unclear. Looking at user code, you will find barely any intended usages of it. Thus it should be removed.
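To make the two-pass selection concrete, here is a simplified Java sketch that models it with reflection. The Target class and the invoke helper are hypothetical. Real Groovy dispatch also ranks candidates by parameter type, which this sketch skips, because its second pass matches on arity only.

```java
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.List;

class ListExpansionDemo {
    // Hypothetical target: no method takes a List, but sum(a, b, c) exists.
    static class Target {
        public String sum(Integer a, Integer b, Integer c) {
            return "sum=" + (a + b + c);
        }
    }

    // Pass 1: look for a one-parameter method that accepts the list itself.
    // Pass 2: otherwise "unpack" the list and match on arity only.
    static Object invoke(Object target, String name, List<?> list) {
        try {
            for (Method m : target.getClass().getMethods()) {
                if (m.getName().equals(name) && m.getParameterCount() == 1
                        && m.getParameterTypes()[0].isInstance(list)) {
                    return m.invoke(target, list);
                }
            }
            for (Method m : target.getClass().getMethods()) {
                if (m.getName().equals(name) && m.getParameterCount() == list.size()) {
                    return m.invoke(target, list.toArray());
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        throw new IllegalArgumentException("no method " + name + " for " + list);
    }

    public static void main(String[] args) {
        // A single list argument silently becomes three arguments.
        System.out.println(invoke(new Target(), "sum", Arrays.asList(1, 2, 3))); // sum=6
    }
}
```

With Groovy's explicit spread operator, the same call would be written as target.sum(*list), which makes the unpacking visible at the call site.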

3. Removing default null argument

The default null argument is used for calls to methods that have one parameter, when the call is done without an argument for that parameter. Groovy then uses null, as long as the type of the parameter is not a primitive. In case of a primitive, the call fails. This feature was mainly added in the early, pre-1.0 years of Groovy, before Groovy supported default values for parameters. Another limitation of this logic is that it works only with single-parameter methods. There have been multiple cases of users confused by this logic. Performance-wise, this logic doesn't cost much, and certainly not more than a call with a real argument. But since this is a confusing feature of limited use, it should be removed.

Wujek gave the example:

class Foo {
  Foo( whatever ){
    println "whatever is: $whatever" 
  }
}
new Foo() // prints 'whatever is: null'
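The rule can be modelled the same way. In this Java sketch, the Foo class and callWithoutArgs are hypothetical, and methods stand in for the constructor in Wujek's example. A call without arguments picks a one-parameter method and passes null. It refuses when the parameter is primitive, because null cannot be converted to int.

```java
import java.lang.reflect.Method;

class DefaultNullDemo {
    // Hypothetical class with one reference-typed and one primitive parameter.
    static class Foo {
        public String describe(Object whatever) { return "whatever is: " + whatever; }
        public String square(int n) { return "n^2 = " + (n * n); }
    }

    // Legacy rule: a zero-argument call may select a one-parameter method
    // and pass null, provided the parameter type is not primitive.
    static Object callWithoutArgs(Object target, String name) {
        try {
            for (Method m : target.getClass().getMethods()) {
                if (m.getName().equals(name) && m.getParameterCount() == 1
                        && !m.getParameterTypes()[0].isPrimitive()) {
                    return m.invoke(target, new Object[] { null });
                }
            }
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
        throw new IllegalArgumentException("no applicable method " + name + "()");
    }

    public static void main(String[] args) {
        System.out.println(callWithoutArgs(new Foo(), "describe")); // whatever is: null
        try {
            callWithoutArgs(new Foo(), "square");
        } catch (IllegalArgumentException e) {
            System.out.println("square() rejected: " + e.getMessage());
        }
    }
}
```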

4. real named parameters

Wujek requested Scala/Ceylon-style named parameters, rather than rolling them into a Map as the first parameter. This could be achieved by changing:

def method(a, b, c) { ... }
//into
def method(@Param('a') a, @Param('b') b, @Param('c') c) { ... }

with @Param being a runtime-retained annotation.

We could also add default values:

def method(a=1+2+3) { ... }
//which ends up being:
def method(@Param('a', defVal=Param_a_value.class) a) { ... }

where the default-value expression gets compiled into a class, which is saved and then executed at runtime when the default value is needed.
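Here is a rough Java sketch of how a runtime-retained annotation could drive named calls. The Param annotation and the callNamed helper are hypothetical, not an existing Groovy or JDK API, and default values are left out. The helper reads each parameter's @Param name via reflection and reorders the named arguments into positional order.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Parameter;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical runtime-retained annotation holding the parameter name.
@Retention(RetentionPolicy.RUNTIME)
@Target(ElementType.PARAMETER)
@interface Param {
    String value();
}

class NamedCallDemo {
    public String method(@Param("a") int a, @Param("b") int b, @Param("c") int c) {
        return "a=" + a + ", b=" + b + ", c=" + c;
    }

    // Finds a method whose @Param names exactly match the given keys and
    // invokes it with the values rearranged into declaration order.
    static Object callNamed(Object target, String name, Map<String, ?> named) {
        for (Method m : target.getClass().getMethods()) {
            Parameter[] params = m.getParameters();
            if (!m.getName().equals(name) || params.length != named.size()) continue;
            Object[] positional = new Object[params.length];
            boolean matches = true;
            for (int i = 0; i < params.length && matches; i++) {
                Param p = params[i].getAnnotation(Param.class);
                matches = p != null && named.containsKey(p.value());
                if (matches) positional[i] = named.get(p.value());
            }
            if (matches) {
                try {
                    return m.invoke(target, positional);
                } catch (ReflectiveOperationException e) {
                    throw new IllegalStateException(e);
                }
            }
        }
        throw new IllegalArgumentException("no " + name + " accepting " + named.keySet());
    }

    public static void main(String[] args) {
        Map<String, Object> named = new LinkedHashMap<>();
        named.put("b", 2);
        named.put("c", 3);
        named.put("a", 1);
        System.out.println(callNamed(new NamedCallDemo(), "method", named)); // a=1, b=2, c=3
    }
}
```

Parameter annotations are not inherited. If a Java subclass overrides method without repeating the @Param annotations, this lookup no longer matches for instances of that subclass. That is essentially the interop problem Jochen describes with his X/Y example.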

Jochen replied: Surely overriding methods would get more difficult, and the user experience from Java would get worse. Currently if you do

def foo(int a=1, long b=2) 
//we create the methods 
foo(int,long) 
foo(int) 
foo() 

there is a strict right-to-left rule for this, and you cannot have foo(long) this way. We do it like this to avoid type problems, since a method signature can appear only once. With optional arguments like I think you described, there would be only one method. So the optional nature is not available from Java directly, which means less interop. Also, if you then write foo(long) in a subclass, for example, does it override foo(int a=1, long b=2) in some cases or not? That's why it is more difficult, both for the compiler and for the human.
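Written out by hand in Java, the overloads Jochen lists correspond roughly to the sketch below. It illustrates the right-to-left rule and is not the exact bytecode Groovy emits. Each shorter overload fills in the rightmost missing default and delegates to the next longer one.

```java
class DefaultArgsDemo {
    // Rough Java equivalent of  def foo(int a=1, long b=2)
    static String foo(int a, long b) { return "a=" + a + ", b=" + b; }
    static String foo(int a) { return foo(a, 2L); } // b takes its default
    static String foo() { return foo(1); }          // then a takes its default

    public static void main(String[] args) {
        System.out.println(foo());      // a=1, b=2
        System.out.println(foo(5));     // a=5, b=2
        System.out.println(foo(5, 9L)); // a=5, b=9
    }
}
```

From Java, foo(7L) does not compile against these overloads. There is no foo(long), and a long is not implicitly narrowed to int, so the "no foo(long)" restriction shows up directly at the call site.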

Wujek: I think that calling Java methods from Groovy just has to use positional parameters. (I don't think I can successfully use the Map workaround that exists now? The Java method would need to take a Map, and I don't know if it would work anyway, just like POJO setProperty doesn't work.) What about overriding? When a method is overridden, you generate its own @Params and value classes, and methods that are not overridden use the old ones. The problem might arise when you override and change the parameter names, and don't know the runtime type (you have a reference to the superclass, but a subclass is actually used). That is one of the edge cases I mentioned, but Scala/Ceylon do manage to get it working. There could be a restriction that you can't change the names or something, but that seems stupid.

Jochen: Ok, I didn't know that you compile so many methods for all the possible combinations. All the better, imho. I was just talking about the case where in groovy you have:

def method(Map map) { ... }
//and you precompile it and try to call it from Java:
method(1, 2)

or whatever. An explicit map would work, though.

But interoperability means that in Groovy we have to include the Java world. That means we additionally have to look at the usage from Java, as well as at what happens if we subclass in Java and use the class from Groovy. And there is such a subclassing problem with purely named arguments as well... Assume:

class X { 
   def foo(int a, int b){} 
} 
.... 
def x = getInstanceFromSomewhere() 
x.foo(b=1,a=2) 

and assume getInstanceFromSomewhere() does not return an X but a Y extends X, written in Java and thus without named parameters. That means the call x.foo above would work only if the returned object is an X. As soon as it is a Y, it will not work anymore.

Maybe it is worth looking at what others are doing in this regard? They did make it work, but then again, maybe their Java interop is not as good as Groovy's. I also think that I mixed and matched Map arguments with default arguments, and you treat those cases separately (which is correct ;d). My main pain is the catch-it-all Map.

How about something like this:

@MapParameters("a","b","c") 
def foo(params) { 
   println "$a+$b = $c" 
} 

the transform could add a block of code at the beginning of the method which will extract values for a,b,c from the map and declare local variables with these names, for you to use later on. Then you have an automated check. Such a transform should not be difficult for methods.
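Here is a hand-written Java sketch of the prologue such a transform might insert for @MapParameters("a","b","c"). The annotation is only a proposal in this thread, so the generated shape shown here is an assumption. Unknown keys are rejected (the "automated check"), and each declared name becomes a local variable. Missing keys simply come out as null in this sketch.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class MapParametersDemo {
    private static final Set<String> DECLARED = new HashSet<>(Arrays.asList("a", "b", "c"));

    static String foo(Map<String, ?> params) {
        // --- prologue a hypothetical @MapParameters("a","b","c") could generate ---
        for (String key : params.keySet()) {
            if (!DECLARED.contains(key)) {
                throw new IllegalArgumentException("unknown parameter: " + key);
            }
        }
        Object a = params.get("a");
        Object b = params.get("b");
        Object c = params.get("c");
        // --- original method body ---
        return a + "+" + b + " = " + c;
    }

    public static void main(String[] args) {
        Map<String, Object> params = new HashMap<>();
        params.put("a", 1);
        params.put("b", 2);
        params.put("c", 3);
        System.out.println(foo(params)); // 1+2 = 3
    }
}
```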

5. Instance based Categories

Instead of having to provide a class with static methods, it would be good to be able to feed an instance to the use construct and then use this instance along with its instance methods. This allows instance state to be used.

Wujek had asked for a category that he instantiates himself:

use(new MyCategory('with arguments')) {
    //
}

The added methods are instance methods (not static) that take self as the first argument. My use case: we had a DSL that could evaluate 'paths' against objects via Unified EL (pimped a lot and extended). The type of the object doesn't matter as long as there are correct resolvers (that's the standard way in EL, which is a standard part of Java EE). We had a Java engine that takes a path and the root, and returns value(s). I wanted to be able to take the instance that I get in the script (I don't know its type, nor do I need to; as far as I know it's just 'something') and call a method on it that uses the engine inside. Normally, in Java, you use the engine like this: engine.evaluate(root, 'some.path'). In Groovy I wanted:

root.'some.path'

where I use methodMissing for the 'some.path' part. I did it by messing around with the metaClass, but that adds the methods forever (in the current Groovy), whereas after the evaluation (further on in the process) I would like the method to disappear and leave no trace. As far as I know, there is no clear way to remove a method from a metaClass. You can only set it to something else (like something that throws an exception), but that's not clean, imho. BTW, maybe this could be in the new MOP as well? The engine gets passed to the Groovy process as a script binding, because it is a pretty expensive thing to create. I want / need to use our DI framework (CDI) for that. It all works beautifully except for the metaClass part. To sum up: I would like to be able to apply a category (maybe something new?) to a block of code. I would like to be able to initialize it myself with dependencies, rather than rely on static methods / state. And I would like to be able to use it to add methodMissing to classes.

My use case was that the category would get some arguments that I had dependency-injected into the class that had the above code. It would have made my design simpler in a few cases. The change should minimally reach the block. The code that is inside (which might be a call to GroovyScriptEngine recursively, as the scripts might call evaluate() and others) or deeply nested methods should see the changes. Mind that such use blocks might be called by multiple threads, so static state is not really an option (unless you mess around with ThreadLocals...).

Jochen: Categories currently are ThreadLocal changes, visible along the stack. This has been a problem in the past, because of the way we checked our callsites for validity. But now I know how to prevent this. So the only thing that is going to be slower is the method selection part and entering the block. Imho it should not really be a problem to use an instance here.
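Here is a minimal Java sketch of an instance-based, thread-local category in the spirit of Jochen's description. The names PathCategory, use and lookup are hypothetical, and the string-building evaluate method stands in for the real EL engine. The active categories live in a ThreadLocal stack, so use blocks on different threads don't see each other, and the category disappears when its block exits.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Supplier;

class InstanceCategoryDemo {
    // Hypothetical category with instance state (e.g. an injected engine).
    static class PathCategory {
        private final String engineName;
        PathCategory(String engineName) { this.engineName = engineName; }
        String evaluate(Object root, String path) {
            return engineName + "(" + root + ", " + path + ")";
        }
    }

    // Categories active on the current thread, innermost first.
    private static final ThreadLocal<Deque<PathCategory>> ACTIVE =
            ThreadLocal.withInitial(ArrayDeque::new);

    // Applies the category only while the block (and anything it calls) runs.
    static <T> T use(PathCategory category, Supplier<T> block) {
        ACTIVE.get().push(category);
        try {
            return block.get();
        } finally {
            ACTIVE.get().pop();
        }
    }

    // Stand-in for the dynamic dispatch behind root.'some.path'.
    static String lookup(Object root, String path) {
        PathCategory current = ACTIVE.get().peek();
        if (current == null) {
            throw new IllegalStateException("no category in scope for " + path);
        }
        return current.evaluate(root, path);
    }

    public static void main(String[] args) {
        String result = use(new PathCategory("engine"), () -> lookup("root", "some.path"));
        System.out.println(result); // engine(root, some.path)
    }
}
```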

6. simpler case statement

Bob Brown wanted a version of Ruby's case statement:

match (s) { 
  when CUSTOMER, "#00FF00" 
  when SYSTEM,   "#FF0000" 
  when ADMIN,    "#0000FF" 
  otherwise      "#000000"
} 
//instead of
switch(s){
  case CUSTOMER: return "#00FF00"; break; 
  case SYSTEM: return "#FF0000"; break; 
  case ADMIN: return "#0000FF"; break; 
  default: return "#000000"; break;
} 

Peter Niederwieser said: Adding a totally different switch syntax because it may read a bit nicer (but departs from Java) is not a good way to evolve a language. Jochen mentioned: I normally use maps for this kind of logic. ... With command expressions the above seems to be possible. Tim Yates put together a quick meta-programming version of the Ruby match thing. Paul Wilson gave the map-based version:

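[ (CUSTOMER): '#00FF00',   // parentheses: use the constant as the key, not the string 'CUSTOMER'
  (SYSTEM):   '#FF0000', 
  (ADMIN):    '#0000FF' ].get(s) ?: '#000000'   // Elvis operator supplies the "otherwise" color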
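For comparison, here is the same map-instead-of-switch idea in Java, assuming the constants are enum values. The Role enum and its extra GUEST value are illustrative. getOrDefault covers the otherwise branch.

```java
import java.util.EnumMap;
import java.util.Map;

class ColorLookupDemo {
    // Illustrative enum; GUEST has no entry and falls back to the default.
    enum Role { CUSTOMER, SYSTEM, ADMIN, GUEST }

    private static final Map<Role, String> COLORS = new EnumMap<>(Role.class);
    static {
        COLORS.put(Role.CUSTOMER, "#00FF00");
        COLORS.put(Role.SYSTEM, "#FF0000");
        COLORS.put(Role.ADMIN, "#0000FF");
    }

    // The lookup replaces the switch; getOrDefault plays the role of "otherwise".
    static String colorFor(Role role) {
        return COLORS.getOrDefault(role, "#000000");
    }

    public static void main(String[] args) {
        for (Role r : Role.values()) {
            System.out.println(r + " -> " + colorFor(r));
        }
    }
}
```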

7. source line and file available

Bob Brown also wanted:

println "An error occurred on ${__LINE__} of ${__FILE__}" 

Jochen said it should not be too difficult to implement for normal cases. The problem is what we give here for stuff that has no file, like scripts from strings and runtime created scripts. They can still have a line, but a file would be then... what?
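In Java, the closest existing facility is the stack trace, which already exposes both pieces of information. The here helper below is a hypothetical name. StackTraceElement.getFileName() is documented to return null when the file is unknown, and getLineNumber() returns a negative number when the line is unknown. That mirrors the "what file?" question Jochen raises. The JVM is also allowed to omit stack frames, so treat this as a debugging aid rather than a guarantee.

```java
class SourceLocationDemo {
    // Returns "file:line" for the code that called here().
    static String here() {
        // Element 0 is here() itself, element 1 is its caller.
        StackTraceElement caller = new Throwable().getStackTrace()[1];
        String file = caller.getFileName() != null ? caller.getFileName() : "<no file>";
        return file + ":" + caller.getLineNumber();
    }

    public static void main(String[] args) {
        System.out.println("An error occurred at " + here());
    }
}
```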

8. source tracing during execution

Bob Brown also wanted a facility analogous to bash's xtrace option.

bash -x command.sh 

Jochen: That is really difficult (for compiled code). With a pure interpreter it is easy: it can just print out the text it is going to interpret. But in Groovy we work on the AST and compile bytecode from that, and what a transform produces may not have anything to do with something you can understand in Groovy. In the end, we would have to store the "source" in the bytecode somehow. That would not only multiply the size but also store information that is useless in most cases. If we say it is enough to have this feature for code that is not precompiled, then a transform could actually do it.

9. Changing Safe Navigation to stop evaluation

Currently an expression like a?.b.c will fail if a is null: it will not evaluate b, but it will try to evaluate c on null. This defeats the intent of safe navigation, which is to avoid a NullPointerException. Thus this should be changed to stop the evaluation of the whole path expression.

Bob Brown suggested the syntax:

println a??.b.c.d.e.f 
//instead of
println a?.b?.c?.d?.e?.f 
//with the compiler generating
def tmp = a 
tmp = tmp != null ? tmp.b : null 
tmp = tmp != null ? tmp.c : null 
tmp = tmp != null ? tmp.d : null 
tmp = tmp != null ? tmp.e : null 
tmp = tmp != null ? tmp.f : null 
println tmp

Jochen said it was a syntax change and not covered by GEP-11.
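Java's Optional gives the short-circuiting behaviour proposed here. Once any step yields null, the remaining map calls are skipped instead of being applied to null. The A/B/C classes are a hypothetical object graph standing in for a?.b.c.

```java
import java.util.Optional;

class SafeNavigationDemo {
    // Hypothetical object graph: a.b.c.value
    static class C { final String value; C(String value) { this.value = value; } }
    static class B { final C c; B(C c) { this.c = c; } }
    static class A { final B b; A(B b) { this.b = b; } }

    // Each map step runs only if the previous step produced a non-null value,
    // so a null anywhere ends the whole chain with null.
    static String valueOf(A a) {
        return Optional.ofNullable(a)
                .map(x -> x.b)
                .map(x -> x.c)
                .map(x -> x.value)
                .orElse(null);
    }

    public static void main(String[] args) {
        System.out.println(valueOf(null));                          // null
        System.out.println(valueOf(new A(null)));                   // null
        System.out.println(valueOf(new A(new B(new C("found"))))); // found
    }
}
```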

10. replacing groovy closures

Gavin Grover (me) asked if Groovy 3 will deprecate most of the core GDK methods in favor of the new Java 8 lambda methods? e.g. forEach, filter, into, map, flatMap, pipeline, sorted, groupByMulti, reduce, anyMatch.

Jochen said Codehaus may support both and then migrate in later versions, if that is possible. Cedric added that lambdas in Java 8 and closures in Groovy are two very different beasts. Lambdas in Java 8 must be assigned to interfaces and cannot be "manipulated" like Groovy closures. We must support both, or we'll lose very useful features. We could align the method names and use a similar API, but in the end we will still support both.

11. generators

Wujek wanted Python-style generators (the 'yield' keyword, which C# also has):

def gen():
  i = 1
  while True:
    yield i
    i += 1

for i in gen(): print i # that loops forever

Wujek: I can't stress enough how useful that is for creating streaming, read-only data sources in just a few lines. Of course, you can do it in Java with Iterators. Groovy alleviates the need to create an Iterable for it (as DGM adds iterator() to Iterator), and I can do it with closure coercion, but it is still verbose. I also have to implement the 'remove()' method, which I don't care about, since we are read-only. Would love it to be automatic.

Russel Winder mentioned: Information for people not au fait with Python: the above is a generator, not a function, so the result of a call to it (generators are callable just as a function is) is an iterable. A for loop in Python calls next on it until a StopIteration exception is raised. The point here is that generators create iterables which provide streams of values, i.e. lazily evaluated, potentially infinite sequences. This is very cool.
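The Java iterator version Wujek alludes to might look like the following sketch. gen is a hypothetical name mirroring the Python example. Since Java 8, Iterator.remove() is a default method that throws UnsupportedOperationException, so a read-only iterator no longer has to implement it.

```java
import java.util.Iterator;

class CounterGeneratorDemo {
    // Endless, read-only sequence 1, 2, 3, ...; each value is computed
    // only when next() is called, like the Python gen() above.
    static Iterable<Integer> gen() {
        return () -> new Iterator<Integer>() {
            private int i = 1;

            @Override
            public boolean hasNext() {
                return true; // never exhausted
            }

            @Override
            public Integer next() {
                return i++;
            }
            // remove() is inherited from Iterator's default method, which
            // throws UnsupportedOperationException.
        };
    }

    public static void main(String[] args) {
        for (int i : gen()) {
            if (i > 5) break; // unlike the Python loop, stop after five values
            System.out.println(i);
        }
    }
}
```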

12. list comprehensions

Wujek also wanted list comprehensions ('squares = [x**2 for x in range(1, 11)]'), which create lists in memory, and generator expressions ('(x*y for x in xrange(1, 11) for y in xrange(1, 11))'), which create generators (see the previous point). The xrange function returns a kind of generator itself, so this should be a really memory-efficient implementation of the multiplication table ;d (Python 3 doesn't have both functions; there, range behaves like xrange did in Python 2.x.) One can attach conditions to filter out certain elements, like:

[x**2 for x in range(10) if x % 2]
//similarly for generators:
for i in (x*y for x in xrange(1, 4) for y in xrange(1, 4) if x % 2 if y % 2): print i

(But this slowly gets not very readable when not formatted correctly.)

Russel: Moreover, you can have dictionary (aka map) comprehensions and set comprehensions. Furthermore, comprehensions are generally faster than construction via for/while statements, since all the activity happens at the PVM level rather than via Python objects. Python 3 has gone to great lengths to ensure that at all stages of functional programming with map, reduce, filter, etc., it is iterables that are being passed, not actual data structures. Cf. the Java project LazyList.
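Java 8 streams can express both examples from this point, with flatMap playing the role of the nested for clause. The method names are illustrative. Collecting into a list corresponds to a list comprehension. Leaving off the collect step leaves a lazy Stream, which is closer to a generator expression.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;

class ComprehensionDemo {
    // [x**2 for x in range(10) if x % 2]
    static List<Integer> oddSquares() {
        return IntStream.range(0, 10)
                .filter(x -> x % 2 != 0)
                .map(x -> x * x)
                .boxed()
                .collect(Collectors.toList());
    }

    // (x*y for x in xrange(1, 4) for y in xrange(1, 4) if x % 2 if y % 2)
    static List<Integer> oddProducts() {
        return IntStream.range(1, 4)
                .filter(x -> x % 2 != 0)
                .boxed()
                .flatMap(x -> IntStream.range(1, 4)     // inner "for y in ..."
                        .filter(y -> y % 2 != 0)
                        .mapToObj(y -> x * y))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(oddSquares());  // [1, 9, 25, 49, 81]
        System.out.println(oddProducts()); // [1, 3, 3, 9]
    }
}
```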

13. lazy streams

Wujek: In Groovy, methods like findAll, collect, flatten and others create lists in memory that hold all the elements, and then return them. There are numerous cases where you don't want that, and you want to 'stream' the data instead. In our case, for example, we have an object that has relationships to other objects, and there can be millions of them. While we generate a report (with a Groovy DSL), we don't want all of them at once most of the time, because we output them one by one (after filtering, processing and so on). If we do want all of them at once, we just have to wrap them in a list or something. In Python, at some point, functions like map or enumerate were changed to return generators. If one wants everything in memory with random access via indexing, one writes 'list(someMethodThatReturnsStreamingCollection())'. Guava has 'live views'. I think this last one would also benefit GPath, right?

Cedric: Well, at the last Groovy DevCon, we discussed that point and we (more or less) agreed that we should wait for what Java 8 APIs will look like, because basically, they follow the "lazy" idea. Meanwhile, the extension module mechanism would allow for example Tim Yates to propose lazy evaluation and generators as an experimental feature before we integrate it :)

Tim's generator example: http://timyates.github.com/groovy-stream/ Russel: TotallyLazy http://code.google.com/p/totallylazy/
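Java 8 streams, which Cedric suggested waiting for, are lazy in exactly this sense. In the sketch below, the method name is illustrative. The counter shows that taking the first three even numbers from an infinite sequence produces only six source elements. Eager findAll/collect, by contrast, build each intermediate list in full before the next step runs.

```java
import java.util.List;
import java.util.concurrent.atomic.AtomicInteger;
import java.util.stream.Collectors;
import java.util.stream.Stream;

class LazyStreamDemo {
    // Takes the first three even numbers from an endless sequence and
    // records how many source elements were actually produced.
    static List<Integer> firstThreeEvens(AtomicInteger produced) {
        return Stream.iterate(1, i -> i + 1)          // 1, 2, 3, ... generated on demand
                .peek(i -> produced.incrementAndGet())
                .filter(i -> i % 2 == 0)
                .limit(3)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        AtomicInteger produced = new AtomicInteger();
        System.out.println(firstThreeEvens(produced)); // [2, 4, 6]
        System.out.println(produced.get());            // 6
    }
}
```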

14. review OO syntax

Christopher Taranto said: Coming from the Perl realm where Moose and Class::MOP have provided some excellent OO syntax - I would say please look at what Perl has been doing. Lots of lessons have been learned and are freely available. From the documentation:

my $point = Point->new(x => 1, y => 5);
$point->clear;

package Point;
use Moose;
has 'x' => (is => 'rw', isa => 'Int');
has 'y' => (is => 'rw', isa => 'Int');
sub clear {
   my $self = shift;
   $self->x(0);
   $self->y(0);
}

package Point3D;
use Moose;
extends 'Point';
has 'z' => (is => 'rw', isa => 'Int');
after 'clear' => sub {
   my $self = shift;
   $self->z(0);
};
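For readers coming from Java, the Moose example corresponds roughly to the classes below. This is a hand translation, not generated code. What Moose declares with has and after 'clear' has to be spelled out here as fields, constructors and an explicit override that calls super.clear() first.

```java
class Point {
    int x, y;
    Point(int x, int y) { this.x = x; this.y = y; }
    void clear() { x = 0; y = 0; }
}

class Point3D extends Point {
    int z;
    Point3D(int x, int y, int z) { super(x, y); this.z = z; }

    // Equivalent of Moose's  after 'clear' => sub { $self->z(0) }:
    // run the inherited clear() first, then reset the extra field.
    @Override
    void clear() {
        super.clear();
        z = 0;
    }
}

class MooseComparisonDemo {
    public static void main(String[] args) {
        Point3D p = new Point3D(1, 5, 7);
        p.clear();
        System.out.println(p.x + "," + p.y + "," + p.z); // 0,0,0
    }
}
```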

15. each function

Daniel Sun suggested improving the each method: when the closure passed to each returns true, the iteration continues, and when it returns false, the iteration breaks.

[1, 2, 3].each{ 
  if( it == 2 ){ 
    return true; // means continue
  } 
  println it 
} 
//yields 
//1 
//3 

[1, 2, 3].each{ 
  if( it == 2 ){ 
    return false; // means break
  } 
  println it 
} 
//yields 
//1 

Guillaume said it would break existing code: [1, 2, 3].each { println it } would stop after the first iteration, because println it is a statement which (in a way) returns null, and null evaluates to false.

Paul Holt said: The only way to break out of an 'each' right now is via throwables, because the inner loop is a Closure. Return just returns from the current run of the Closure. Can we look forward to seeing a different way to escape from a loop defined by a closure? Maybe a new keyword... Have any keywords been defined but not used?

Guillaume: Well, reusing break, continue, etc, are some obvious choices, of course, but if you look back in the archives of the mailing-lists, you'll find plenty of discussions on this topic, about local and non-local returns, breaking out of each loops, etc. The problem is not trivial, and we've never really found a good approach there.

Cedric mentioned: Also, there's the Closure.directive field, which could be used by some algorithms (for example, set to Closure.DONE). However, even the Groovy core doesn't really make use of it.
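One way around Guillaume's objection is to make the signal explicit instead of overloading the boolean meaning of the return value. The Java sketch below uses a hypothetical Directive enum and each helper, not Groovy's Closure.directive API. A body that returns null or CONTINUE keeps iterating, and only an explicit BREAK stops the loop.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

class EachDirectiveDemo {
    // Hypothetical loop-control signal returned by the loop body.
    enum Directive { CONTINUE, BREAK }

    static <T> void each(Iterable<T> items, Function<T, Directive> body) {
        for (T item : items) {
            // Only an explicit BREAK ends the loop; null is treated as "keep going".
            if (body.apply(item) == Directive.BREAK) {
                break;
            }
        }
    }

    public static void main(String[] args) {
        List<Integer> seen = new ArrayList<>();
        each(Arrays.asList(1, 2, 3), it -> {
            if (it == 2) return Directive.BREAK;
            seen.add(it);
            return Directive.CONTINUE;
        });
        System.out.println(seen); // [1]
    }
}
```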

16. tuples

Wujek: Could we have:

def (a, b, *rest) = [1, 2, 3, 4, 5]
// or maybe:
// def (a, b, rest)
println a // 1
println b // 2
println rest // [3, 4, 5]

Jochen: that was back then part of the possible extensions to the proposal, but nobody really seemed to be interested, so I didn't implement it.

Wujek: Also, it seems that groovy silently ignores assignments with too many or too few arguments:

def (a, b) = [1]
println b // null

//Jochen: what the code does is more or less this: 
def temp = [1] 
def a = temp[0] 
def b = temp[1] 

Jochen: here we do nothing special in Groovy. You will see the null if you get an element beyond the last one. Actually... good point for Groovy 3... should we keep that?
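As Jochen says, the null comes from list indexing rather than from the assignment itself. Below is a Java sketch of the expansion he shows. The getAt helper is hypothetical and mimics Groovy's out-of-range behaviour for non-negative indices only. Groovy's real getAt also accepts negative indices, which this sketch does not handle.

```java
import java.util.Arrays;
import java.util.List;

class MultipleAssignmentDemo {
    // Mimics Groovy's list indexing for non-negative indices: past the end
    // yields null instead of throwing IndexOutOfBoundsException.
    static <T> T getAt(List<T> list, int idx) {
        return idx < list.size() ? list.get(idx) : null;
    }

    public static void main(String[] args) {
        // def (a, b) = [1]   expands roughly to:
        List<Integer> temp = Arrays.asList(1);
        Integer a = getAt(temp, 0);
        Integer b = getAt(temp, 1);
        System.out.println(a + " " + b); // 1 null
    }
}
```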

Guillaume: We introduced that feature in Groovy 1.6 I think (or was it 1.7???), and we had some long discussions on how it should behave. Again, the mailing-list archives give some good arguments on the pros and cons of the various options we had. For example, the problem with the "rest" approach was that the rest could contain [4, 5]. But what does that mean? Was the last element a list containing 4 and 5, or were there two extra elements which have been put inside a list of "rest" args? There's no way a developer could know one or the other, so we prefer to avoid that situation. Furthermore, Groovy's ranges and negative indices are handy for getting the "rest" of a list.

Also, yes, the b variable containing null if there are more variables than elements in the list is also intentional. We tended to prefer that rather than throwing a runtime exception. Again, that's a choice.

Last but not least, let me mention that this multiple assignment feature works nicely with anything that has a getAt() method. So you could create your own class with a getAt() method and get automatic destructuring through multiple assignment. A little example:

import groovy.transform.Canonical

@Canonical
class Coordinates {
  double latitude
  double longitude

  double getAt(int idx) {
    if (idx == 0) return latitude
    if (idx == 1) return longitude
    throw new RuntimeException("Only two coordinates")
  }
}

def home = new Coordinates(45.123, 2.123)
println home

def (latitude, longitude) = home
assert latitude == 45.123
assert longitude == 2.123

Jochen: you are partially wrong here. It works without exception because it is a list, not because of something special the assignment does. If we would take your example and do

def (latitude, longitude, localGravity) = home 

then we would get the exception, just as expected, since there are only 2 coordinates.

Wujek:

def (a, *b) = [1, 2, 3]
println a // 1
println b // [2, 3]
assert b.packed

def (a, *b)= [1, [2, 3]]
println a // 1
println b // [2, 3]
assert ! b.packed

The idea is: the packing can set an arbitrary, groovy attribute of the metaClass for the instance that users can query. It is only available for a variable declaration with the *.

Actually, Python 3 does it like this:

>>> a, *b = [1, 2, 3]
>>> a
1
>>> b
[2, 3]
>>> a, *b = [1, [2, 3]]
>>> a
1
>>> b
[[2, 3]]

Here *b is always put in a wrapping list, so that the code is consistent. I am not sure which of the options is better. I do write some Python code (including 3k) and have not yet had problems with the semantics; it kind of works just as expected.

Jochen: I think this makes the most sense... the problem I have is more like this...

assert x.getClass() == X 
def (a, *b) = x 
println b.getClass() 

what is this program supposed to print? According to the above it would make sense to have b then to be a list type always.
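Jochen's conclusion that b should always be a list matches the Python 3 behaviour shown above. Below is a small Java sketch of those semantics, using a hypothetical unpack helper. The rest is always a new list, so [1, [2, 3]] gives a rest of [[2, 3]] rather than [2, 3], and the runtime type of the rest does not depend on the input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class StarUnpackDemo {
    // Result of  a, *rest = items  with Python 3 semantics.
    static final class HeadAndRest {
        final Object head;
        final List<Object> rest;
        HeadAndRest(Object head, List<Object> rest) { this.head = head; this.rest = rest; }
    }

    static HeadAndRest unpack(List<?> items) {
        if (items.isEmpty()) {
            throw new IllegalArgumentException("not enough values to unpack");
        }
        // Copy the tail into a fresh list, even if it holds a single list element.
        List<Object> rest = new ArrayList<>(items.subList(1, items.size()));
        return new HeadAndRest(items.get(0), rest);
    }

    public static void main(String[] args) {
        System.out.println(unpack(Arrays.asList(1, 2, 3)).rest);                // [2, 3]
        System.out.println(unpack(Arrays.asList(1, Arrays.asList(2, 3))).rest); // [[2, 3]]
    }
}
```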