Make transpose call stack size safe

rogeriochaves commented 7 years ago

With 10000 elements the transpose function start giving Maximum call stack size exceeded

Chadtech commented 6 years ago

Hey @rogeriochaves

This looks pretty good, but I am bench marking it as being ~x8 slower: https://ellie-app.com/kqXGQNgMBa1/1

Im not sure if its worth the trade off, but also, to be honest I am not sure how to weigh the performance hit with the stack safety.

zwilias commented 6 years ago

Here's an alternative, based on @rogeriochaves' implementation, that seems to outperform the current List.Extra implementation and is stack-safe.

https://ellie-app.com/68qqkSvkta1/3

The benchmarks test lists of size 10, 100, 1000 and 2000 where each sublist has between 0 and 200 integers. Flipping the comments on the two mains will run a very basic fuzz test.

At @eeue56's suggestion; an attempt to explain why there are 3 functions involved, how they make this fast and how they make this stack-safe.

From a birds eye perspective; the implementation works somewhat like this:

take the first element of every sublist, consing the resulting list to an accumulator
keep doing this until we run out of elements
when every sublist is empty, reverse the accumulator

Why an accumulator?

By using an accumulator rather than building up the result directly, as was done in the original code, we can put the recursive call in the tail position; meaning that the final expression to be evaluated is a call to the same function. When the Elm compiler sees a case where recursion happens in the tail call, it can compile that into a while loop. The resulting code doesn't actually recurse, which means the call-stack depth doesn't increase and the implementation is stack-safe.

Why not appending so there's no need to `reverse`?

Appending to the end of a List has 2 drawbacks:

it needs to traverse the entire list
it can't do any structural sharing

By consing our accumulated results to the front of the accumulator, we only need 1 traversal: reversing the result at the end. Every iteration of the accumulator is a structurally shared list made up of a new head and the existing accumulator as the tail.

This saves both time (less traversals) and memory (structural sharing); which in turn saves more time (less garbage collection interference).

Why that `headsTails` function?

The original code used List.filterMap head list and List.filterMap tail list. This is perfectly fine and perhaps even preferred from a code-reuse perspective, it also means doing 2 traversals where one would suffice. Furthermore, both head and tail return Nothing under the same conditions - we can use a single pattern x::xs to know it has both a head and a tail. In other words, this saves traversals (one rather than two), pattern-matching everything twice, and filtering the result of that call.

gilbertkennen commented 6 years ago

Has there been discussion about the proper way to handle non-rectalinear matrices? I dislike the solution of collapsing missing values so that the new list position doesn't reflect the position of the list an element came from.

[ [ 1, 2, 3 ]
, [4, 5]
, [6, 7, 8]
]

-- transposes to
[ [ 1, 4, 6 ]
, [ 2, 5, 7 ]
, [ 3, 8 ]
]

-- transposes back to
[ [ 1, 2, 3 ]
, [ 4, 5, 8 ] -- the 8 has moved up a row
, [ 6, 7 ]
]

We could wrap the whole thing in a Maybe which fails on different-length lists.
We could wrap individual elements in Maybe, which would be a bit awkward to deal with.
We could truncate the matrix to the shortest list size, which seems a bit surprising, but less surprising than the current functionality and it gives us the nicer type that wouldn't need to change.

pzp1997 commented 6 years ago

I believe I have a tail recursive solution that has between 3x (for 10 elements) to 8x (for all other sizes) better performance than @zwilias. In regards to @gilbertkennen's comment, my implementation goes with the third approach, truncating to the shortest row length, but could be easily adapted to wrap the entire result in a Maybe as per approach 1.

Here's the code:

transposePalmer : List (List a) -> List (List a)
transposePalmer listOfLists =
    List.foldr (List.map2 (::)) (List.repeat (rowsLength listOfLists) []) listOfLists

rowsLength : List (List a) -> Int
rowsLength listOfLists = case listOfLists of
                            [] ->
                                0

                            x :: _ ->
                                List.length x

and the benchmark: https://ellie-app.com/pw7k2sR8ma1/1. Whipped this up rather quickly, so let me know if I'm missing something.

EDIT

I didn't really look at the benchmarking tests before. @gilbertkennen is correct that when comparing the performance only on rectilinear lists, the performance is about 1.5x better.

Also, I said above that the solution is tail-recursive. That's not the proper terminology though because List.foldr and List.map2 are not even implemented recursively in Elm (they're both Native/Kernel functions). What I really meant is that the implementation is stack safe.

I thought about the possible methods of handling jagged lists and I think that wrapping the entire result in a Maybe makes the most sense.

gilbertkennen commented 6 years ago

While this is an improvement, the test is a bit unfair comparing truncating vs. not when using very uneven row lengths. Benchmarks on equal-length rows gives us ~1.5x which is still very good.

https://ellie-app.com/6Rs9Tnr7Ka1/1

jvoigtlaender commented 6 years ago

Concerning List.foldr being a Native/Kernel function in Elm: That is the case for the currently released version, but will not be much longer (assuming Elm 0.19 is released). See https://github.com/elm-lang/core/blob/master/src/List.elm.

So that part of the benchmarking here is really against a moving target. Who knows whether with List.foldr as implemented in the next release of the core library, the 1.5x improvement will stand, or whether for the specific use of List.foldr above, the non-native implementation makes things worse.

zwilias commented 6 years ago

For what it's worth; the implementation currently in the dev tree of elm-lang/core is pretty fast, with a bit of a drop after 2k elements (due to falling back to foldl + reverse). Once there is an 0.19 alpha, though, I'm hopeful that we can start checking how the codegen and resulting size turns out for the foldr and map implementations here: https://github.com/zwilias/elm-faster-map

Your point still stands, of course, but figured I'd add to it.

gilbertkennen commented 6 years ago

Even with a naive implementation of foldr, it's still ~1,4x faster than Ilias's and ~4x faster than list-extra (both using core code for any folds they use).

https://ellie-app.com/6Rs9Tnr7Ka1/1

jvoigtlaender commented 6 years ago

Why "Even"? I assume that naive implementation is not stack safe? A stack safe version might be slower. So it's not like a non-naive version is definitely faster than the naive version, so that if "even" the naive version of foldr makes another function fast we could simply assume that any non-naive version of foldr would make the other function still faster.

(I don't remember what is the case for the foldr expected to be shipped with Elm 0.19: whether it is "just" stack safe, but possibly slower than a naive, non stack safe version, or is both stack safe and faster than the naive implementation.)

gilbertkennen commented 6 years ago

This naive foldr should be stack safe. It reverses the list and runs a TCO foldl.

jvoigtlaender commented 6 years ago

Ah, that's not what I would call "naive". :smile:

gilbertkennen commented 6 years ago

I have been informed that I clinked to the wrong code previously that didn't include the foldr implementation.

https://ellie-app.com/bQBtPcMvwa1/0

Perhaps that will make things clearer.

rogeriochaves commented 6 years ago

@pzp1997 your method fails on this test:

test "short rows are skipped" <|
        \() ->
            Expect.equal
                (transposePalmer [ [ 10, 11 ], [ 20 ], [], [ 30, 31, 32 ] ])
                [ [ 10, 20, 30 ], [ 11, 31 ], [ 32 ] ]

Instead it returns [], in other words, truncating by the smaller one (see https://ellie-app.com/ntkpKfbFCa1/0)

This would be @gilbertkennen 's third suggestion right, and I'm fine with that (personally, I never needed to transpose an uneven matrix)

Is this the behavior we want from now on?

pzp1997 commented 6 years ago

Last I recall, there was no consensus on how the function should behave on non-rectangular inputs. I opened #86 so that we can continue discussing that matter.

elm-community / list-extra