EntilZha / PyFunctional

Python library for creating data pipelines with chain functional programming
http://pyfunctional.pedro.ai
MIT License
2.41k stars 132 forks source link

Add raw option to head() and the likes #173

Closed swiergot closed 2 years ago

swiergot commented 2 years ago

This solves #168 while maintaining backward compatibility.

As for __getitem__(), as you said there is no way to fix it without breaking backward compatibility. So we either break it or perhaps I can add something like get(idx) or get(idx, raw=False).

swiergot commented 2 years ago

I've made an attempt at adding get() but it gets complicated when it comes to an equivalent of array[a:].

I'm considering a different approach to this entire problem - a nowrap() function after which _wrap() has no effect, i.e. one last pipeline function can be used as it will not return a Sequence.

For example:

>>> a = seq([deque(), deque()]).first()
>>> type(a)
<class 'functional.pipeline.Sequence'>
>>> a = seq([deque(), deque()]).nowrap().first()
>>> type(a)
<class 'collections.deque'>
>>> a
deque([])

That would allow to bypass the current behaviour in all cases without breaking backward compatibility.

Let me know if you are interested in this solution.

EntilZha commented 2 years ago

Thanks for the PR! Overall, I like the solution, but a have a few improvements:

Nit: I prefer no_wrap since that's the terminology used elsewhere. I'd say wrap, but that's already taken by _wrap realistically.

A few related changes to making configuration more flexible:

  1. Adding a configuration to seq/pseq (https://github.com/EntilZha/PyFunctional/blob/8eef5d2cacac64e0504985f09a49cb0fe4489f77/functional/streams.py#L1), which means adding it/propagating the option in the Sequence constructor https://github.com/EntilZha/PyFunctional/blob/master/functional/pipeline.py#L35. Ideally, it should be something like __init__(self, ..., no_wrap=None). The constructor should set it like self.no_wrap = no_wrap so that the value is part of the public interface.
  2. Change the keyword args to no_wrap and their default values to None, representing that the option is not set, rather than explicitly set to True or False
  3. In places that check for raw (or after changes no_wrap), read the values in precedence order (first in list override the rest): (1) the keyword argument, (2) the class value from the constructor, and (3) a global default set to False. I'd probably do this by defining a function default like below and re-use it.
def default(*vals):
    for v in vals:
      if v is not None:
          return v
    raise ValueError(f"All values are unset in: {vals}")

One benefit of having it more configurable this way too, is that it resolves the pandas issue as well I think. If set at the class level, then the behavior in to_pandas/__getitem__ could be changed as well. Probably not necessary to do in this PR, can do a followup.

Thanks, super excited to merge this/resolve longstanding issue!

EntilZha commented 2 years ago

PR LGTM, just rerun black . one more time to solve the formatter error, and should be good to merge. Thanks again for the great PR!

codecov[bot] commented 2 years ago

Codecov Report

Base: 98.00% // Head: 98.07% // Increases project coverage by +0.07% :tada:

Coverage data is based on head (4f4e905) compared to base (8eef5d2). Patch coverage: 99.16% of modified lines in pull request are covered.

Additional details and impacted files ```diff @@ Coverage Diff @@ ## master #173 +/- ## ========================================== + Coverage 98.00% 98.07% +0.07% ========================================== Files 12 11 -1 Lines 2253 2337 +84 ========================================== + Hits 2208 2292 +84 Misses 45 45 ``` | [Impacted Files](https://codecov.io/gh/EntilZha/PyFunctional/pull/173?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Pedro+Rodriguez) | Coverage Δ | | |---|---|---| | [functional/pipeline.py](https://codecov.io/gh/EntilZha/PyFunctional/pull/173/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Pedro+Rodriguez#diff-ZnVuY3Rpb25hbC9waXBlbGluZS5weQ==) | `98.11% <95.00%> (+0.02%)` | :arrow_up: | | [functional/streams.py](https://codecov.io/gh/EntilZha/PyFunctional/pull/173/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Pedro+Rodriguez#diff-ZnVuY3Rpb25hbC9zdHJlYW1zLnB5) | `96.93% <100.00%> (+0.06%)` | :arrow_up: | | [functional/test/test\_functional.py](https://codecov.io/gh/EntilZha/PyFunctional/pull/173/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Pedro+Rodriguez#diff-ZnVuY3Rpb25hbC90ZXN0L3Rlc3RfZnVuY3Rpb25hbC5weQ==) | `99.02% <100.00%> (+0.07%)` | :arrow_up: | | [functional/test/test\_util.py](https://codecov.io/gh/EntilZha/PyFunctional/pull/173/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Pedro+Rodriguez#diff-ZnVuY3Rpb25hbC90ZXN0L3Rlc3RfdXRpbC5weQ==) | `100.00% <100.00%> (ø)` | | | [functional/util.py](https://codecov.io/gh/EntilZha/PyFunctional/pull/173/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Pedro+Rodriguez#diff-ZnVuY3Rpb25hbC91dGlsLnB5) | `98.46% <100.00%> (+0.12%)` | :arrow_up: | | [functional/\_\_init\_\_.py](https://codecov.io/gh/EntilZha/PyFunctional/pull/173/diff?src=pr&el=tree&utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Pedro+Rodriguez#diff-ZnVuY3Rpb25hbC9fX2luaXRfXy5weQ==) | | | Help us with your feedback. Take ten seconds to tell us [how you rate us](https://about.codecov.io/nps?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Pedro+Rodriguez). Have a feature suggestion? [Share it here.](https://app.codecov.io/gh/feedback/?utm_medium=referral&utm_source=github&utm_content=comment&utm_campaign=pr+comments&utm_term=Pedro+Rodriguez)

:umbrella: View full report at Codecov.
:loudspeaker: Do you have feedback about the report comment? Let us know in this issue.