elm-community / list-extra

Convenience functions for working with List.
http://package.elm-lang.org/packages/elm-community/list-extra/latest
MIT License
135 stars 59 forks source link

rewrite groupWhile to be tail-recursive #119

Closed BrianHicks closed 5 years ago

BrianHicks commented 5 years ago

I have a very long list for which groupWhile blows the stack! Oh no! It looks like the accumulator function itself was tail-recursive, but not one of the functions it called. This unifies them and adds a bunch of comments to explain what's going on for the next person who needs to look at this code.

BrianHicks commented 5 years ago

bahh, sorry for the failures here. I've reimplemented this in a private repo to avoid the problem and this is more an extraction than a really planned contribution. I'll fix the tests and update here when it's ready to look at.

BrianHicks commented 5 years ago

alright! Looks like this passed, ready for review!

pzp1997 commented 5 years ago

So it looks like the day when someone has a large enough list in practice to break the group functions has finally come.

On the other hand, most non-TCO functions will only break for lists of more than 5k-10k elements, which is quite a lot. I can't remember ever needing a list that large in Elm (although I'm sure other folks have their use-cases).

- me, https://github.com/elm-community/list-extra/issues/75#issuecomment-316088047.

Perhaps we should reconsider the discussion about whether to prioritize stack safety or performance in list-extra. In light of the new evidence presented by Brian, I'm leaning towards the former.

BrianHicks commented 5 years ago

I think that may be a false dichotomy in this case. This code makes exactly two passes over the list as a whole, and two passes over every group… so I guess it's O(4n). I don't see anything that's an obvious performance improvement here, regardless of stack safety. If you see something, I'm happy to add it!

BrianHicks commented 5 years ago

mmm, sorry, that should have been "one pass over the list, one pass to reverse every group, and one pass to reverse the entire list" so I guess it's actually O(3n)

pzp1997 commented 5 years ago

I don't see anything that's an obvious performance improvement here, regardless of stack safety.

Yeah, my original comment was made when the group functions looked a bit different. A lot of things changed with 5d30fa7 making the current implementation neither performant as it could be nor tail recursive.


I think the following stack safe implementation using List.foldr is simpler to read and might even be faster, particularly on lists with fewer than 500 elements due to foldr's "switching behavior" between non-TCO and TCO. I don't have time to run benchmarks unfortunately.

groupWhile : (a -> a -> Bool) -> List a -> List ( a, List a )
groupWhile isSameGroup items =
    List.foldr
        (\x acc ->
            case acc of
                [] ->
                    [ ( x, [] ) ]

                ( y, restOfGroup ) :: groups ->
                    if isSameGroup x y then
                        ( x, y :: restOfGroup ) :: groups

                    else
                        ( x, [] ) :: acc
        )
        []
        items

Here's an Ellie with this implementation and the relevant tests from list-extra: https://ellie-app.com/4N7MXvP5Bpka1.

BrianHicks commented 5 years ago

yeah, I think that might also be faster because it doesn't have to reverse the individual groups. If I understand correctly then:

O(2n).

pzp1997 commented 5 years ago

LGTM!

Chadtech commented 5 years ago

Awesome! This looks good to me too.

Thanks @BrianHicks for the PR, and the code comments. I kind of wish we could get some of those comments into this PR with this new implementation. People root around in the source of List.Extra; I think it would get some visibility.

I would like to benchmark the new implementation, just to characterize it and make sure there are no surprises. Unfortunately, Ellie is not working right now for me, how about I bench mark this function sometime soon, like in the next day or so? And then, we merge this PR?

BrianHicks commented 5 years ago

Sounds great! No rush on my part; we’ve fixed things locally.

Chadtech commented 5 years ago

Okay! I bench marked it. You can see it here: https://github.com/Chadtech/elm-groupwhile-benchmark-0 (couldnt do Ellie for some reason, so I just made a repo).

Results are that this new implementation is about 15% slower. Not too bad I think.