EntilZha / PyFunctional

Python library for creating data pipelines with chain functional programming
http://pyfunctional.pedro.ai
MIT License

Empty sequence #159

Closed (Kache closed this issue 3 years ago)

Kache commented 3 years ago

I know I can do seq([]), but is there a particular reason seq() isn't supported?

EntilZha commented 3 years ago

Is there a use case for this?

Kache commented 3 years ago

Primarily for consistency, and it lets users avoid special-casing. It mirrors how tuple() and set() are used to create empty containers. (list() and dict() too, although [] and {} are generally preferred.) I happened to want an empty sequence and was surprised.

It seems strange to me to intentionally define a special case that will raise TypeError. Conceptually, I would expect the following to work:

pattern_of_lists = [
    [0, 1, 2],
    [0, 1],
    [0],
    [],
]
for arr in pattern_of_lists:
    for_conversion = seq(arr)
    for_expressing_literal = seq(*arr)  # why specifically TypeError for []?
    for_conversion == for_expressing_literal

Also (I'm sure you know):

seq([], [], []) == [[], [], []]
seq([], []) == [[], []]
seq([])  # surprise! But sure, this is special-cased for seq(*literal_elements) convenience
seq()  # also surprise! (raises)

Was there a particular edge case or other ambiguity that warrants the TypeError?

Could (should?) make empty seq a "singleton" if you wanted, since it's immutable.

Not common, but I've also used "empty container" as the identity object in a reduce/fold before.
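As a small illustration of the fold use case, using only the standard library (the helper name `concat_all` is made up for this sketch): an empty tuple acts as the identity for concatenation, so folding over no inputs returns the empty container instead of raising.

```python
from functools import reduce


def concat_all(chunks):
    """Concatenate an iterable of tuples; `()` is the identity for `+`."""
    # The third argument to reduce is the initial value, so an empty
    # `chunks` yields `()` rather than raising TypeError.
    return reduce(lambda acc, chunk: acc + chunk, chunks, ())


print(concat_all([(0, 1), (2,), ()]))  # (0, 1, 2)
print(concat_all([]))                  # ()
```

An empty seq() could presumably play the same role as the initial value when folding sequences, though that is an assumption about usage rather than something shown in this thread.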

stephan-rayner commented 3 years ago

Hello friends, I thought about this for a bit today and I had an idea.

I think the easiest way might be to change the constructor in Sequence by giving a default value to the sequence parameter like so.

class Sequence(object):

    def __init__(self, sequence=[], transform=None, engine=None, max_repr_items=None):
        ...

That said, I imagine pylint would be bothered by this.

I do have an alternative that I have used in production myself. For the example you provided, it would look something like this:

seq(pattern_of_lists) \
    .map(lambda arr: (seq(arr), seq(*arr) if arr else seq([])))

This results in a sequence of tuples, (for_conversion, for_expressing_literal) to use the original example's variable names. From there you could compare them in a map or run .for_all(lambda pair: pair[0] == pair[1]), depending on your use case.

The output from the above example ends up like this:

[([0, 1, 2], [0, 1, 2]), ([0, 1], [0, 1]), ([0], [0]), ([], [])]

I liked this when I used it because it made the substitution explicit. From the library maintainer's perspective, it doesn't require a change that may or may not induce a bug down the road (I'm not the maintainer of this lib, so I will defer there).

I know that isn't what was asked for, but I hope that helps someone 🙂.

EntilZha commented 3 years ago

Wanting a default isn't bad style in itself; there is an equivalent way to write it that avoids the problem. The reason for not using sequence=[] is that the default list is created once and then shared across all calls, which leads to very weird behavior. The right way to implement this pattern is to set sequence=None and then check if sequence is None, setting sequence = [] if so. In this particular case, I don't think you'd need to change the Sequence code, since the TypeError is raised in _parse_args and the Sequence constructor usually isn't called by users but by the library itself.
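A minimal sketch of the pitfall and the None-sentinel fix described here, using hypothetical helper names (`append_shared`, `append_fresh`) rather than the library's Sequence class:

```python
def append_shared(item, bucket=[]):
    # The default list is created once, when the function is defined,
    # and reused by every call that omits `bucket`.
    bucket.append(item)
    return bucket


def append_fresh(item, bucket=None):
    # Use None as the sentinel and build a new list on each call.
    if bucket is None:
        bucket = []
    bucket.append(item)
    return bucket


first = append_shared(1)
second = append_shared(2)
print(first is second)  # True: both names refer to the same default list
print(second)           # [1, 2]
print(append_fresh(1), append_fresh(2))  # [1] [2]
```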

Back to the question of whether you should be able to call seq() without arguments, I can get behind the argument of mirroring set() and list() in returning an empty sequence. The example stephan gave also seems like a reasonable use case for this.

Given that, I'd welcome a PR that modifies the len(args) == 0 code section to instead initialize to an empty sequence. I think that is probably the only change needed, but it's possible I'm missing something.
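A rough sketch of the kind of change proposed, written against a simplified stand-in rather than PyFunctional's actual code. `parse_args_sketch` and `seq_sketch` are hypothetical names, plain lists stand in for sequences, and only the three-way branching on len(args) is taken from this thread:

```python
def parse_args_sketch(args):
    """Hypothetical stand-in for the argument handling discussed above."""
    if len(args) == 0:
        # Proposed behavior: no arguments means an empty sequence,
        # mirroring list(), tuple() and set(), instead of raising TypeError.
        return []
    if len(args) == 1:
        # A single argument is treated as the iterable to convert.
        return list(args[0])
    # Several arguments are treated as the literal elements.
    return list(args)


def seq_sketch(*args):
    # Hypothetical entry point; collects positional arguments into a tuple.
    return parse_args_sketch(args)


print(seq_sketch())           # []
print(seq_sketch([]))         # []
print(seq_sketch([], []))     # [[], []]
print(seq_sketch([0, 1, 2]))  # [0, 1, 2]
```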

Kache commented 3 years ago

Cool! I've created https://github.com/EntilZha/PyFunctional/pull/161

I've followed CONTRIBUTING.md except for the TravisCI part, as I'm a little unclear about what to do there.

Kache commented 3 years ago

merged!