JulienPalard / Pipe

A Python library to use infix notation in Python
MIT License

`groupby` Update and Use Case #65

Closed by Servinjesus1 2 years ago

Servinjesus1 commented 2 years ago

groupby seems to still produce an itertools._grouper object, which appears to be a type in process to deprecate by 2.0.

I also wonder how the keyfunc parameter works and if it's scoped for pipe end users. I tried to pass something to it, and I got a multiple values error. Does it, by chance, allow for type recasting of the object returned? If not, a feature like that would be wonderful so as to quickly generate outputs of pipe operations (maybe pipes have something like that already?)

Finally, in the documentation x%2 and "Even" will produce unexpected results :)

JulienPalard commented 2 years ago

groupby seems to still produce an itertools._grouper object, which appears to be a type in process to deprecate by 2.0.

Ohh. The wording was unfortunate: nothing in pipe returns a Pipe object; the pipe functions all return iterables. A grouper object is iterable, so it's acceptable. I clarified the wording in the README, thanks for noticing.

I also wonder how the keyfunc parameter works and if it's scoped for pipe end users.

keyfunc takes an argument and should return the group for the given argument:

>>> from pipe import *
>>> from collections import namedtuple
>>> Student = namedtuple("Student", "name, mark")
>>> students = [Student("Ada", 10), Student("Alan", 12), Student("Mark", 17), Student("Bool", 7)]
>>> refused, accepted = students | groupby(lambda student: student.mark > 10) | map(lambda group: list(group[1]))
>>> refused
[Student(name='Ada', mark=10), Student(name='Bool', mark=7)]
>>> accepted
[Student(name='Alan', mark=12), Student(name='Mark', mark=17)]

I tried to pass something to it, and I got a multiple values error

Can you provide a reproducer?

Does it, by chance, allow for type recasting of the object returned?

If I understand the question correctly, you can do this with | map, like in my previous example. Otherwise, please provide an example.

Finally, in the documentation x%2 and "Even" will produce unexpected results :)

Haha! Yes, looks like 12 years ago my English was not that good :D Thanks for noticing! I fixed this, and used a more readable ternary syntax while I was at it.

Also keep in mind that the "pipe operators" shipped in pipe.py can be seen as "common examples". Feel free to implement your own as needed, they are fairly easy to write:

@Pipe
def where(iterable, predicate):
    return (x for x in iterable if (predicate(x)))

The goal of the lib is to make it easy to write your own "pipe functions", not to ship an exhaustive set of "pipe functions".

Servinjesus1 commented 2 years ago

keyfunc is the second argument in the definition of groupby:

@Pipe
def groupby(iterable, keyfunc):
    return itertools.groupby(sorted(iterable, key=keyfunc), keyfunc)

In your example, you pass one argument: a lambda function, which is the value of the first argument, iterable, right?

Here's a reproducer for the multiple values error:

from pipe import *
', '.join((1, 2, 3, 4, 5)
    | groupby(lambda x: "Odd" if x % 2 else "Even", keyfunc=None)
    | map(lambda x: f"{x[0]} : {', '.join(x[1] | map(str))}")
)

Take the keyfunc= away, and you'll get an error saying groupby() was passed 3 parameters instead of 2. So it seems the @Pipe decorator is turning the single lambda function argument into two.

JulienPalard commented 2 years ago

I can't reproduce the "multiple values" error with your given reproducer.

I stored it in a file as:

from pipe import *

print(
    ", ".join(
        (1, 2, 3, 4, 5)
        | groupby(lambda x: "Odd" if x % 2 else "Even")
        | map(lambda x: f"{x[0]} : {', '.join(x[1] | map(str))}")
    )
)

and ran:

$ python3.6 repro.py 
Even : 2, 4, Odd : 1, 3, 5
$ python3.7 repro.py 
Even : 2, 4, Odd : 1, 3, 5
$ python3.8 repro.py 
Even : 2, 4, Odd : 1, 3, 5
$ python3.9 repro.py 
Even : 2, 4, Odd : 1, 3, 5
$ python3.10 repro.py 
Even : 2, 4, Odd : 1, 3, 5
$ python3.11 repro.py 
Even : 2, 4, Odd : 1, 3, 5

Can you tell me more on how to reproduce it? Also a full traceback may help.

In your example, you pass one argument: a lambda function, which is the value of the first argument, iterable, right?

No, @Pipe works a bit like functools.partial (check Pipe.__call__): when you call a Pipe function it returns a "partial-like" object, which in turn will be "called" by __ror__, which prepends the other argument of __ror__. The resulting call looks like:

fct(other_from__ror__, *args, **kwargs)

with fct being the @Pipe decorated function (like groupby), and args and kwargs being the arguments given when calling it.

So in my example I pass a lambda function, received in args[0], and the decorated function will be called later from __ror__ as groupby(other, the_lambda), other being the left-hand side of the |. (This looks complex, but this is how every @Pipe function works. It's not specific to groupby, it's the only role of @Pipe.)
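
The mechanism described above can be sketched with a small stand-in class. This is a simplified illustration, not the actual source of pipe.py: the name MiniPipe and the helper parity are made up here, and the real Pipe class may differ in its details. The sketch also shows where the "multiple values" error from the reproducer comes from.

```python
import itertools


class MiniPipe:
    """Hypothetical, simplified stand-in for pipe.Pipe (not the real source)."""

    def __init__(self, function, *args, **kwargs):
        self.function = function
        self.args = args
        self.kwargs = kwargs

    def __call__(self, *args, **kwargs):
        # Calling the pipe does not run the function yet: it only stores
        # the arguments, much like functools.partial would.
        return MiniPipe(
            self.function, *self.args, *args, **{**self.kwargs, **kwargs}
        )

    def __ror__(self, other):
        # `other | pipe` lands here: the left-hand side is prepended to
        # the stored arguments, giving fct(other, *args, **kwargs).
        return self.function(other, *self.args, **self.kwargs)


@MiniPipe
def groupby(iterable, keyfunc):
    return itertools.groupby(sorted(iterable, key=keyfunc), keyfunc)


def parity(x):
    return "Odd" if x % 2 else "Even"


# The tuple becomes `iterable`, and parity becomes `keyfunc`.
groups = [(key, list(items)) for key, items in (1, 2, 3, 4, 5) | groupby(parity)]

# Passing keyfunc=None as well means keyfunc is supplied twice:
# groupby((1, 2, 3), parity, keyfunc=None)
try:
    (1, 2, 3) | groupby(parity, keyfunc=None)
    message = ""
except TypeError as error:
    message = str(error)
```

With this sketch, groups holds the "Even" group followed by the "Odd" group, and message contains the "multiple values" complaint about keyfunc, because the lambda already fills that parameter positionally.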

Servinjesus1 commented 2 years ago

Wait, so the lambda is the key function? It's used in itertools.groupby and in sorted, whose keyfunc I thought had to be something that told sorted how some elements should be sorted relative to one another (like a before b, b before c, whatever).

The keyfunc=None in my example is important: I'm trying to explicitly give an input to the keyfunc argument, which from inspection I assumed would be the second argument of groupby. But it seems the pipe makes the first argument (the lambda) the second and passes the iterable prior to the pipe to the first? If that's true, I think I get it now. Just surprised that the keyfunc works for sorted as well as for the itertools.groupby.

JulienPalard commented 2 years ago

Wait, so the lambda is the key function?

Yes!

It's used in itertools.groupby and in sorted, whose keyfunc I thought had to be something that told sorted how some elements should be sorted relative to one another (like a before b, b before c, whatever).

Yes, the same key function is used to both sort and then group. Read it like "sort by group before grouping".
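
As a small illustration of what a key function does (the helper name parity is made up for this sketch): a key function does not compare two elements with each other. It maps each element to a value, and sorted, min and max then compare those values.

```python
def parity(x):
    # Maps each number to the name of its group.
    return "Odd" if x % 2 else "Even"


numbers = (1, 2, 3, 4, 5)

# The keys that sorted() will compare, one per element.
keys = [parity(x) for x in numbers]

# "Even" sorts before "Odd", and sorted() is stable, so elements with
# equal keys keep their original relative order.
ordered = sorted(numbers, key=parity)

# min() and max() accept the same kind of key function and return the
# first element whose key is smallest or largest.
first_even = min(numbers, key=parity)
first_odd = max(numbers, key=parity)
```

Here ordered is [2, 4, 1, 3, 5]: all the "Even" elements first, then all the "Odd" ones, which is exactly the layout itertools.groupby needs to produce one group per key.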

But it seems the pipe makes the first argument (the lambda) the second and passes the iterable prior to the pipe to the first?

Exactly \o/

If that's true, I think I get it now. Just surprised that the keyfunc works for sorted as well as for the itertools.groupby.

Python uses keyfuncs in other places too, like max and min. As I said a few lines earlier, sorting by the same keyfunc is a "sort by groups", so for example feeding:

A A B B A A

won't give three groups, one of two A's, one of two B's and one of two A's, but the sort will first reorganise as:

A A A A B B

then the itertools.groupby will give only two groups, one of 4 A's and one of two B's.
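
The difference can be checked with the standard library alone. A minimal sketch (the variable names are made up for illustration):

```python
import itertools

letters = ["A", "A", "B", "B", "A", "A"]

# Plain itertools.groupby only merges consecutive equal items,
# so the two runs of A's stay separate.
unsorted_groups = [
    (key, len(list(items))) for key, items in itertools.groupby(letters)
]

# Sorting first brings all equal items together, which is what the
# groupby shipped in pipe.py does before grouping.
sorted_groups = [
    (key, len(list(items))) for key, items in itertools.groupby(sorted(letters))
]
```

unsorted_groups is [("A", 2), ("B", 2), ("A", 2)], while sorted_groups is [("A", 4), ("B", 2)].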

It's probably a bold move to stuff a sorted inside a groupby. It distances the Pipe semantics from the expected itertools semantics, which is probably not that good, but I wrote this 10+ years ago, and I don't think changing it now is any better.

Anyway, if you really need a groupby which does not sort, feel free to implement it yourself. It should be as simple as:

@Pipe
def groupby(iterable, keyfunc):
    return itertools.groupby(iterable, keyfunc)

(Nothing forces you to use only the pipes declared in pipe.py.)

Servinjesus1 commented 2 years ago

Wonderful, thank you for walking me through this. I know this was only a very random example of what Pipe can do, but it's the first time I've seen something like this, so I appreciate your patience.