EntilZha / PyFunctional

Python library for creating data pipelines with chain functional programming
http://pyfunctional.pedro.ai
MIT License
2.41k stars, 132 forks

More dictionary/tuple list helpers #174

Open foresthu2006 opened 2 years ago

foresthu2006 commented 2 years ago

Out of curiosity, I was wondering why there aren't key/value equivalents of the functions that operate on a list of tuples, e.g. filter_by_key, group_by_value, or map_values. As one use case, it'd be great to have more support for operations after calling group_by.

I really enjoy using PyFunctional, and I'm happy to contribute if there are no blockers here!

stale[bot] commented 1 year ago

This issue has been automatically marked as stale because it has not had recent activity. It will be closed if no further activity occurs. Thank you for your contributions.

EntilZha commented 1 year ago

Sorry for the auto-close, I've been busy.

I'm a little wary of starting to add tons of functions, although I'm not necessarily opposed. The main thing I'd look for in a proposal is how it avoids simply duplicating the whole API with _by_key and _by_value variants, ideally by offering a more general solution.

There seem to be (at least) two categories here. (1) Convenience functions that make things shorter. For at least some of these, I think the ability to register user-defined functions can cover the need, since not everyone will need every method to have a _by_key or _by_value variant. (2) New functionality that isn't easily replicated by calling filter(lambda x: fn(x[0])) instead of filter_by_key(fn). The things that came to mind are operations that might follow group_by.
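To make the "more general solution" idea concrete, here is a minimal sketch of two adapters that wrap a function so it sees only the key or only the value of a (key, value) tuple. The names `by_key` and `by_value` are hypothetical and are not part of PyFunctional; the example uses the built-in `filter` so that it runs without the library.

```python
from typing import Any, Callable, Tuple

Pair = Tuple[Any, Any]


def by_key(fn: Callable[[Any], Any]) -> Callable[[Pair], Any]:
    """Adapt fn so it receives only the key (index 0) of a pair."""
    return lambda pair: fn(pair[0])


def by_value(fn: Callable[[Any], Any]) -> Callable[[Pair], Any]:
    """Adapt fn so it receives only the value (index 1) of a pair."""
    return lambda pair: fn(pair[1])


pairs = [("a", 1), ("b", 20), ("c", 3)]

# Keep pairs whose key is a vowel.
vowels = list(filter(by_key(lambda k: k in "aeiou"), pairs))
# Keep pairs whose value is greater than 2.
large = list(filter(by_value(lambda v: v > 2), pairs))

print(vowels)  # [('a', 1)]
print(large)   # [('b', 20), ('c', 3)]
```

Because each adapter returns an ordinary one-argument callable, the same wrappers should also be usable with any method that accepts such a callable, which would avoid adding a `_by_key` and `_by_value` variant per method.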

Sorry for the long response period!

vadyalex commented 1 year ago

I've found myself wanting something like map_by_key once or twice recently. Thinking it over, I suspect the root cause of wanting such a shortcut is PEP 3113, which removed tuple parameter unpacking in Python 3.

Currently (I recently built an index of only the "unique" elements this way):

        (
            seq(
                objects
            )
            .map(
                lambda obj: (
                    (obj['type'], obj['id']),
                    obj
                ),
            )
            .group_by_key()
            .filter(
                lambda _: len(_[1]) == 1  # <--- access tuple's right side via index
            )
            .to_dict()
        )

Could have been:

        (
            seq(
                objects
            )
            .map(
                lambda obj: (
                    (obj['type'], obj['id']),
                    obj
                ),
            )
            .group_by_key()
            .filter_by_value(                              # <--- "convenience" function extending API
                lambda _: len(_) == 1   # <--- now one can access tuple's right side directly in lambda
            )
            .to_dict()
        )

If tuple parameter unpacking were still available (PEP 3113 removed it in Python 3):

        (
            seq(
                objects
            )
            .map(
                lambda obj: (
                    (obj['type'], obj['id']),
                    obj
                ),
            )
            .group_by_key()
            .filter(                              # <--- same API
                lambda (_, group): len(group) == 1   # <--- unpacking tuple to access tuple's right side (Python 2 only syntax)
            )
            .to_dict()
        )

One possible improvement I can think of would be to introduce a helper function that works around the missing tuple unpacking (discussed here), such as:


        import functools

        def star(f):
            @functools.wraps(f)
            def f_inner(args):
                return f(*args)
            return f_inner

        (
            seq(
                objects
            )
            .map(
                lambda obj: (
                    (obj['type'], obj['id']),
                    obj
                ),
            )
            .group_by_key()
            .filter(                              # <--- same API
                star(
                    lambda _, group: len(group) == 1 # <--- unpacked tuple via `star` function to access tuple's right side
                )
            )
            .to_dict()
        )

What do you think? Is it worth adding a star helper function to pyfunctional.utils instead of, or alongside, extending the API?
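For anyone who wants to try the idea without PyFunctional installed, here is a self-contained sketch that applies the same `star` helper with the built-in `filter`. The sample `objects` data and the manual grouping with `defaultdict` are assumptions made for illustration; they stand in for `seq(...).map(...).group_by_key()`.

```python
import functools
from collections import defaultdict


def star(f):
    """Wrap f so that a single tuple argument is unpacked into positional arguments."""
    @functools.wraps(f)
    def f_inner(args):
        return f(*args)
    return f_inner


# Hypothetical sample data: two objects share the same (type, id) key.
objects = [
    {"type": "user", "id": 1, "name": "ann"},
    {"type": "user", "id": 1, "name": "ann-duplicate"},
    {"type": "group", "id": 7, "name": "admins"},
]

# Group objects by (type, id), standing in for map(...).group_by_key().
groups = defaultdict(list)
for obj in objects:
    groups[(obj["type"], obj["id"])].append(obj)

# Keep only the keys that map to exactly one object.
unique = dict(filter(star(lambda _key, group: len(group) == 1), groups.items()))

print(unique)
# {('group', 7): [{'type': 'group', 'id': 7, 'name': 'admins'}]}
```

For the mapping case the standard library already offers `itertools.starmap`, which unpacks each tuple into arguments; there is no built-in counterpart for filtering, which is the gap a helper like `star` would fill.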

EntilZha commented 1 year ago

Ya, I see what you're saying, and looking at the API again, I think it would be reasonable to have _by_value and _by_key methods for a limited set of functions. Would having them on map, filter, and group_by cover most use cases, do you think? I think that would be reasonable to add, and I would accept a PR implementing that. Thanks!
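As a rough illustration of what a limited set of such methods could look like, here is a toy sketch. The `Pairs` class and its method names are hypothetical stand-ins, not PyFunctional's `Sequence` or its actual implementation. A group_by variant is left out because its semantics would need to be agreed on first.

```python
class Pairs:
    """Toy chainable wrapper around a list of (key, value) tuples (illustration only)."""

    def __init__(self, items):
        self._items = list(items)

    def map_keys(self, fn):
        # Apply fn to each key, leaving values untouched.
        return Pairs((fn(k), v) for k, v in self._items)

    def map_values(self, fn):
        # Apply fn to each value, leaving keys untouched.
        return Pairs((k, fn(v)) for k, v in self._items)

    def filter_by_key(self, fn):
        # Keep pairs whose key satisfies fn.
        return Pairs((k, v) for k, v in self._items if fn(k))

    def filter_by_value(self, fn):
        # Keep pairs whose value satisfies fn.
        return Pairs((k, v) for k, v in self._items if fn(v))

    def to_list(self):
        return list(self._items)


grouped = Pairs([("a", [1, 2]), ("b", [3])])

result = (
    grouped
    .filter_by_value(lambda group: len(group) == 1)  # keep single-element groups
    .map_values(sum)                                 # collapse each group to a number
    .to_list()
)

print(result)  # [('b', 3)]
```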
