Closed michaeljones closed 7 years ago
The current implementation is as it is for performance reasons. Could you make a benchmark comparing your version to the current implementation to see how performance will be affected using elm-benchmark? If there is a significant drop-off, I could implement a trick to have the current implementation switch to a tail-call optimized implementation when the list exceeds a certain number of elements (I happen to be currently working on doing this for the At functions). That way there won't be a performance hit for small lists (< 2000 elements or so) but the stack won't overflow for large lists like the one you're dealing with.
Ah, that makes sense. Thanks for the explanation. I don't have any experience with elm-benchmark but I'll try to give it ago within the next week if I can. I did read of a similar switching approach in some other elm discussion thread when I was googling around. Seems like it might be necessary.
I did some quick benchmarking and your implementation seems to be performing more or less on par with the current one. One thing I would be careful of is to make sure you're not reversing the list in the process.
Thank you for doing that, very interesting to see. And thank you for pointing out the bug. I hadn't realised at all, though fortunately it isn't a big deal for my case as the code is already in production!
Playing with the benching marks I couldn't see a huge difference between the foldl + reverse
and foldr
but I assume there is an understood best approach for this. Otherwise happy to hear that it is within tolerance. Still happy to see a switch in there if it is deemed preferable.
Thanks again! A very educational experience all in.
I've the same problem and I think the problem exists in many methods of this library because of their recursive implementation. In my opinion it should be solved in a way which will not possibly crash an entire application at runtime if a list is long enough.
list : List Int
list =
List.range 0 100000
|> List.Extra.unique
https://ellie-app.com/3MxzbGDZdNSa1/0
@michaeljones Thanks for the non-recursive example :)
Out of interest I made an implementation based of the original uniqueHelp
using an accumulator. Like the first implementation the list has to be reversed at the end.
uniqueByAcc : (a -> comparable) -> List a -> List a
uniqueByAcc f list =
uniqueHelpAcc [] f Set.empty list
uniqueHelpAcc : List a -> (a -> comparable) -> Set comparable -> List a -> List a
uniqueHelpAcc acc f existing remaining =
case remaining of
[] ->
List.reverse acc
first :: rest ->
let
computedFirst =
f first
in
if Set.member computedFirst existing then
uniqueHelpAcc acc f existing rest
else
uniqueHelpAcc (first :: acc) f (Set.insert computedFirst existing) rest
I benchmarked the two tail recursive versions and it seems to be a little bit faster depending on the number of collisions. The test is also testing "100.000 Ints" which would fail with the original implementation.
https://ellie-app.com/3MyCRxqSyNva1/2
I could create a pull request. Do you want me to do it?
Yeah that would be awesome!
@Chadtech Here you go #77
I have experienced a problem with hitting the maximum call stack size when using
uniqueBy
on a long list (22,000 entries.) I am not very wise in the ways of these things but it occurs to me that it might be due to the recursive implementation.I have put together a
foldl
based implementation that seems to run without issue though i haven't verified that is works properly yet. If it does, would you be interested in a pull request?