elm-community / list-extra

Convenience functions for working with List.
http://package.elm-lang.org/packages/elm-community/list-extra/latest
MIT License
135 stars 59 forks source link

UniqueBy is implemented using recursion #61

Closed michaeljones closed 7 years ago

michaeljones commented 7 years ago

I have experienced a problem with hitting the maximum call stack size when using uniqueBy on a long list (22,000 entries.) I am not very wise in the ways of these things but it occurs to me that it might be due to the recursive implementation.

I have put together a foldl based implementation that seems to run without issue though i haven't verified that is works properly yet. If it does, would you be interested in a pull request?

uniqueBy : (a -> comparable) -> List a -> List a
uniqueBy f list =
    let
        foldf entry ( set, current ) =
            let
                key =
                    f entry
            in
            if Set.member key set then
                ( set, current )
            else
                ( Set.insert key set, entry :: current )
    in
    List.foldl foldf ( Set.empty, [] ) list
        |> Tuple.second
pzp1997 commented 7 years ago

The current implementation is as it is for performance reasons. Could you make a benchmark comparing your version to the current implementation to see how performance will be affected using elm-benchmark? If there is a significant drop-off, I could implement a trick to have the current implementation switch to a tail-call optimized implementation when the list exceeds a certain number of elements (I happen to be currently working on doing this for the At functions). That way there won't be a performance hit for small lists (< 2000 elements or so) but the stack won't overflow for large lists like the one you're dealing with.

michaeljones commented 7 years ago

Ah, that makes sense. Thanks for the explanation. I don't have any experience with elm-benchmark but I'll try to give it ago within the next week if I can. I did read of a similar switching approach in some other elm discussion thread when I was googling around. Seems like it might be necessary.

pzp1997 commented 7 years ago

I did some quick benchmarking and your implementation seems to be performing more or less on par with the current one. One thing I would be careful of is to make sure you're not reversing the list in the process.

michaeljones commented 7 years ago

Thank you for doing that, very interesting to see. And thank you for pointing out the bug. I hadn't realised at all, though fortunately it isn't a big deal for my case as the code is already in production!

Playing with the benching marks I couldn't see a huge difference between the foldl + reverse and foldr but I assume there is an understood best approach for this. Otherwise happy to hear that it is within tolerance. Still happy to see a switch in there if it is deemed preferable.

Thanks again! A very educational experience all in.

andys8 commented 7 years ago

I've the same problem and I think the problem exists in many methods of this library because of their recursive implementation. In my opinion it should be solved in a way which will not possibly crash an entire application at runtime if a list is long enough.

list : List Int
list =
    List.range 0 100000
    |> List.Extra.unique

image

https://ellie-app.com/3MxzbGDZdNSa1/0

@michaeljones Thanks for the non-recursive example :)

andys8 commented 7 years ago

Out of interest I made an implementation based of the original uniqueHelp using an accumulator. Like the first implementation the list has to be reversed at the end.

uniqueByAcc : (a -> comparable) -> List a -> List a
uniqueByAcc f list =
    uniqueHelpAcc [] f Set.empty list

uniqueHelpAcc : List a -> (a -> comparable) -> Set comparable -> List a -> List a
uniqueHelpAcc acc f existing remaining =
    case remaining of
        [] ->
            List.reverse acc

        first :: rest ->
            let
                computedFirst =
                    f first
            in
                if Set.member computedFirst existing then
                    uniqueHelpAcc acc f existing rest
                else
                    uniqueHelpAcc (first :: acc) f (Set.insert computedFirst existing) rest

I benchmarked the two tail recursive versions and it seems to be a little bit faster depending on the number of collisions. The test is also testing "100.000 Ints" which would fail with the original implementation.

Benchmark

https://ellie-app.com/3MyCRxqSyNva1/2

Group: many collisions

image

Group: some collisions

image

Group: no collisions

image

andys8 commented 7 years ago

I could create a pull request. Do you want me to do it?

Chadtech commented 7 years ago

Yeah that would be awesome!

andys8 commented 7 years ago

@Chadtech Here you go #77