Include GHC.Nat-indexed lists by Tom Smeding

Mikolaj commented 1 year ago

@tomsmeding: while traceShow-debugging, I've stumbled on something shocking:

The shape of an array, i.e., a list of the sizes of its dimensions. In the linearization of the array the outermost (i.e. first list element) varies most slowly. O(1) time. https://hackage.haskell.org/package/orthotope-0.1.2.0/docs/Data-Array-ShapedS.html#v:shapeL

So the order of indexes I'm using, aping orthotope, seems to be inverse to what we have on the sized list. Consequently my strides-based index computation gives different results than fromLinearIdx applied to corresponding shapes. Probably other things would differ, too, but I only tweaked and tested this single bit.

The shocking part is that only one test failed, only sometimes, and never with Doubles, only with Floats. Apparently it all somehow symmetrically cancelled out. Edit: and there was probably only a small difference from numerical errors behaving differently, particularly with crude floating point numbers.

What do we do?

tomsmeding commented 1 year ago

only one test failed

That's interesting, seems to be some kind of symmetry in your tests

... only sometimes, and never with Doubles, only with Floats

... and seemingly said symmetry is only there mathematically, not numerically. Perhaps? This is kind of wild though.

In any case, that haddocks description seems to match with how we represent things in a Shape n i (with the :. order, not the S order): 1. they are sizes, not strides, and 2. the left-most varies most slowly (and hence the right-most the most quickly). It seems the author of orthotope and I were in agreement all along. If I understand correctly, you were assuming something different in the rest of the code; is that right?

Mikolaj commented 1 year ago

You write "the index 'Z :. i :. j' represents 'a[i][j]' in -- traditional C notation." And in C notation i is the outermost dimension."


int test[2][3][4] = {
    {{3, 4, 2, 3}, {0, -3, 9, 11}, {23, 12, 23, 2}},
    {{13, 4, 56, 3}, {5, 9, 3, 5}, {3, 1, 4, 9}}};```

Mikolaj commented 1 year ago

I think we have the same order as orthotope and C visually, but the reverse order vs orthope list-nesting-wise.

Mikolaj commented 1 year ago

But perhaps ours is better or more standard? Edit: meaning, ours in the sized list, not in the old code.

tomsmeding commented 1 year ago

I think we have the same order as orthotope and C visually, but the reverse order vs orthope list-nesting-wise.

Precisely that! I'm not sure which list-nesting is better. Do you usually batch your operations by vectorising them, i.e. letting them work on large tensors instead of scalars? (If so, use the orthotope slowest-at-the-outside nesting.) Or do you usually batch your operations by mapping the operation over a large tensor of jobs? (If so, use my fastest-at-the-outside nesting, i.e. current Shape n i.)

Accelerate uses the fastest-at-the-outside nesting because the whole language is in this "intermediate" (in The Doc parlance) form where you can't have vectors of vectors, so all operations are rank-polymorphic instead; you get types like these:

fold :: (Shape sh, Elt a) => (Exp a -> Exp a -> Exp a) -> Exp a -> Acc (Array (sh :. Int) a) -> Acc (Array sh a)

this is a fold over the innermost (fastest-moving) dimension. The "rank-polymorphic" bit is the polymorphism in sh. (Note that Accelerate's Shape is defined as data sh :. i = sh :. i ; data Z = Z, where usually i ~ Int.) But this is an Accelerate characteristic, no need to copy this if your operations work the other way round.

Mikolaj commented 1 year ago

That's fascinating. For better or worse, I do the same as orthotope, that is, manipulate the outermost dimension and almost never fold or map over scalars. What Andrew proposes is also not focusing on scalars, but building or mapping over arbitrary tensors, which are nested in arbitrary larger tensors. And in case of Sum, keeping the current semantics that sums over the outermost dimension only and in Sum0, getting down from top to the very scalar. No variant that would sum the innermost dimension, leaving others alone. I have no perspective to tell to what extent this started via the influence of orthotope or more generally the highly parallel hardware both Lennart and Andrew were/are working on and to what extent this has anything to do with POPL AD work. But it seems to work well with the vectorization route we're following.

In the old code there is exactly one place where I reverse the index list: that's where I vectorize not by pushing things from the outermost dimension down, but by analysing special cases, such as build (\ix -> index v (ixs :. i)) where if i == ix then the result is slice (index v ixs), etc.

tomsmeding commented 1 year ago

Then it seems that for your purposes, you should reverse Index (and Shape) :) I guess that also relieves you of the necessity of having the (:.) pattern synonym, because now the list is in natural cons-order.

EDIT: hot take, perhaps we need nat-indexed finger trees (spoiler, we don't)

Mikolaj commented 1 year ago

Actually, I've now confused myself completely. The orthotope shapes that work best for me have the outermost (slowest) dimension least nested. But the index lists I'm using in the old code and in index and in build seem to have exactly the reverse nesting. Oh dear. I hope I panicked and it's actually consistent.

tomsmeding commented 1 year ago

You see? You should have used Index n from the beginning!

Mikolaj commented 1 year ago

I knew I shouldn't risk typing my code.

Mikolaj commented 1 year ago

I've just checked and orthotope is consistent: in the few places that they use index lists, they do this in the same nesting as shapes. The uses are in build that takes a function from index lists to scalars and update, that updates a scalar at a given index list. I'm also consistent in my oldest code (not simplified) where I just follow orthotope. I may also be consistent in parts of simplified (build operation and a few others), but not in vectorization (and so in index, indexN), where I reverse the order, because it's so much more convenient in the scenario where you push indexing down the tensor expression and always have tower of indexes above you and a multi-dimensional tensor below you. With the inconsistent order, I have the closest dimesions of the tensor below in in the head of shape, and the closest index above me in the head of the index list.

So it's only me. And I have to decide on just one scheme. For shapes definitely orthotope scheme works great both for vectorized operations and for the vectorization process. I guess the order of indexes has to change in my newest code to match that

So, I agree, let's reverse our sized list and get rid of the pattern. I wonder if that's going to change any of the implementations. In any case, we will also need a snoc, last and init operations for indexs, just as the old cons, head and tail for shapes. What was the about finger trees? ;(

tomsmeding commented 1 year ago

Finger trees are what Data.Sequence is using, i.e. something that you can cons and snoc on in amortised O(1) time :)

It's also far too complicated to even think of making a Nat-indexed version of it --- not because it's actually super complicated (this functional pearl gives a very nice explanation), but because making Nat-indexed versions of things gets very hairy very quickly.

So it's only me. And I have to decide on just one scheme. For shapes definitely orthotope scheme works great both for vectorized operations and for the vectorization process. I guess the order of indexes has to change in my newest code to match that

That, or have two Index types, one for each direction. Depending on how much the reversed order helps your code there, it might be worth the additional administration of having two types.

Mikolaj commented 1 year ago

Yeah, I'm pretty sure I want to switch to orthotope nesting in all places. So just the reverse of our sized lists right now. I'm just rewriting fromLinearIdx randomly until the Float test succeeds (no luck so far).

tomsmeding commented 1 year ago

Untested, so not sure if I'm better than your "no luck so far", but this is what I would write: https://paste.tomsmeding.com/f3UVMR5f

Mikolaj commented 1 year ago

Untested, so not sure if I'm better than your "no luck so far", but this is what I would write: https://paste.tomsmeding.com/f3UVMR5f

Yay, you save the day again. The slightly fixed version of you code below makes the test pass (and I expect all other pre-sized-list tests to still pass, but now without any of the strides hacks in a dozen different places, but with fromLinearIdx2 instead):

fromLinearIdx2 :: Integral i => [i] -> i -> [i]
fromLinearIdx2 = \sh lin -> snd (go sh lin)
  where
    go [] n = (n, [])
    go (n : sh) lin =
      let (tensLin, idxInTens) = go sh lin
          (tensLin', i) = tensLin `quotRem` n
      in (tensLin', i : idxInTens)

This is strange, because just giving reversed list to your original fromLinearIdx and reversing the results (and any random mutation of these) did not succeed. This is mind-bending.

Hah, I first looked at your code and my brain goes "wrong, noooo, the element should be on the right hand side of cons".

That's interesting, seems to be some kind of symmetry in your tests

Right. I guess, reversing the shape gives a similar tensor, just with transposed dimensions. E.g., the sum of its elements will be the same. Reversed indexing into a scalar probably then gives the same scalar in the transposed tensor. I guess there is a difference when your indexing goes halfway. You may end up with a tensor of different shape. Whether that matters in the end, depends on what you do with it (in AD we end up in a scalar always, so perhaps it's all the same in many cases, I don't know if in all, but my intuition is that some naturally occurring operations do differentiate between "many small tensors" and "few big tensors").

tomsmeding commented 1 year ago

Good fix! Of course one should return n in the go [] n case. I was halfway writing a response indicating why your fix was wrong when I finally figured out that you are correct. Goes to show, this kind of code is hard.

I'm still interested by the test symmetries. Surely, if those symmetries end up working so well, then you must be very consistently (in some perverted way) doing the orderings wrong, or sometimes right! Anyway, I hope we don't even need to really understand the reason behind this. :)

Mikolaj commented 1 year ago

@tomsmeding: unless you find any hidden traps, the PR is done. The 3 tests that did "arithmetic underflow" now work fine (which may or may not be the result of fixing a one-liner one-off dimension bug revealed by sized indexes [edit: actually, I oversimplified a little]). Next steps will be generalizing build, gather and scatter to function from sized lists of variables to Index (Int to Index currently) and simplifying some Tensor class method signatures now that we can express them sanely. Then we can complete vectorization (in particular the case of vectorizing gather). But all that's outside the scope of this PR.

I'm still not quite sure I was consistently using the reversed order of indexes vs shapes previously. But that's not impossible and that would explain a part of the mystery of the same tests passing with both. In any case, now I'm pretty sure we are consistently using the other order of indexes throughout the code. And we eliminated 90% of unsafeCoerce in vectorization functions (and the 10% will go soon).

Edit: I've since added 4 cosmetic commits (Shape functions cleanup, hlint mollification, rename to HordeAd.Core.SizedIndex and IsList instances) and a few further commits that tweak the Tensor class operations in the direction Andrew outlined (which promptly broke tests). Edit2: tests fixed

Mikolaj / horde-ad

Include GHC.Nat-indexed lists by Tom Smeding #81