Mikolaj / horde-ad

Higher Order Reverse Derivatives Efficiently - Automatic Differentiation library based on the paper "Provably correct, asymptotically efficient, higher-order reverse-mode automatic differentiation"
BSD 3-Clause "New" or "Revised" License

Merge `ranked` and `shaped` (as in `ADReady`) into `tensor` indexed by `[Maybe Nat]` #105

Status: Open · Mikolaj opened 1 year ago

Mikolaj commented 1 year ago

This is @awf's idea, which would improve the end-user API. A Nothing entry would signify that the corresponding dimension may differ at runtime. Cognitive load would be reduced (no more separate ranked and shaped in signatures) and new expressiveness would be gained (tensors that are ranked in some dimensions but shaped in others).
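
To illustrate the idea (a hypothetical sketch, not horde-ad's API; Tensor and BatchOfImages are made-up names):

{-# LANGUAGE DataKinds, KindSignatures #-}
import Data.Kind (Type)
import GHC.TypeLits (Nat)

-- One tensor type indexed by [Maybe Nat]: Just n marks a statically
-- known dimension, Nothing a dimension that may vary at runtime.
data Tensor (r :: Type) (sh :: [Maybe Nat])

-- Ranked in the first (batch) dimension but shaped in the other two:
type BatchOfImages = Tensor Float '[Nothing, Just 28, Just 28]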

For this to be possible, we'd need an extension of the GHC.TypeLits.Normalise family of plugins, and somebody to maintain it. [Edit: an extension of orthotope would also be needed.] It would be great to find out whether any other Haskell libraries already require such functionality. Or we could move to Agda.

We'd also need good examples where the ability to vary some shapes at runtime is crucial, because in all current examples the shapes are known and fixed once they are determined at the start of the run (by looking at the input data). It's not even clear that such varying shapes would work, e.g., with our current optimizer implementations.

An example: a signature in our current formalism.

type LayerWeightsRNNShaped shaped in_width out_width r =
  ( shaped r '[out_width, in_width]   -- input weight
  , shaped r '[out_width, out_width]  -- state weight
  , shaped r '[out_width] )           -- bias

rnnMnistTwoS
  :: forall ranked shaped r out_width batch_size sizeMnistHeight.
     ADReady ranked shaped r
  => shaped r [2 * out_width, batch_size]
  -> PrimalOfS shaped r [sizeMnistHeight, batch_size]
  -> (LayerWeightsRNNShaped shaped sizeMnistHeight out_width r
     ,LayerWeightsRNNShaped shaped out_width out_width r)
  -> (shaped r [out_width, batch_size]
     ,shaped r [2 * out_width, batch_size])

Now a signature of a different program. Let's assume this program requires some shapes to vary at runtime.

type data Size = Dynamic | N Nat

type LayerWeightsRNNShaped tensor in_width out_width r =
  ( tensor r '[out_width, in_width]   -- input weight
  , tensor r '[Dynamic, out_width]    -- state weight
  , tensor r '[out_width] )           -- bias

rnnMnistTwoS
  :: forall tensor r (out_width :: Size) (batch_size :: Size)
            (sizeMnistHeight :: Size).
     (ADReady tensor r, N 0 + batch_size ~ batch_size)
  => tensor r [Dynamic, batch_size]
  -> PrimalOfS tensor r [sizeMnistHeight, batch_size]
  -> (LayerWeightsRNNShaped tensor sizeMnistHeight out_width r
     ,LayerWeightsRNNShaped tensor out_width out_width r)
  -> (tensor r [Dynamic, batch_size]
     ,tensor r [N 2 * out_width, batch_size])
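
For the above to typecheck, + and * would have to exist at kind Size. A minimal sketch of one possible lifting (my assumption, not part of the proposal): Dynamic absorbs everything. Note that closed type families like these would not reduce N 0 + batch_size for a variable batch_size; that is exactly where the extension of the GHC.TypeLits.Normalise plugin family mentioned above would come in.

{-# LANGUAGE DataKinds, TypeData, TypeFamilies, TypeOperators #-}
import GHC.TypeLits (Nat)
import qualified GHC.TypeLits as TL

type data Size = Dynamic | N Nat

-- Addition on Size: static sizes add up; anything Dynamic stays Dynamic.
type family (a :: Size) + (b :: Size) :: Size where
  N m + N n = N (m TL.+ n)
  _   + _   = Dynamic

-- Multiplication on Size, in the same spirit.
type family (a :: Size) * (b :: Size) :: Size where
  N m * N n = N (m TL.* n)
  _   * _   = Dynamic
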
awf commented 1 year ago

Great write up!

Mikolaj commented 9 months ago

Related: I'm now making heavy use of yet another type, a heterogeneous vector of tensors of different scalar types and of different ranked/shaped forms. The vector is completely untyped at the moment, but it could be indexed by a list of the individual tensor types.

So, using the idea from this ticket, a heterogeneous vector would be indexed by an expression of the form [([Maybe Nat], Type)], where Type is the type of the scalar. I'm afraid this is completely unwieldy type information, but maybe there is a way to present it better or simplify it somehow. It's already better (and more expressive) than the original form would be: [(Either [Nat] Nat, Type)].
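
A minimal sketch of such an indexed heterogeneous vector (hypothetical; HVector, HNil and HCons are made-up names), with one constructor per cons cell of the index list:

{-# LANGUAGE DataKinds, GADTs, KindSignatures, TypeOperators #-}
import Data.Kind (Type)
import GHC.TypeLits (Nat)

-- Indexed by [([Maybe Nat], Type)]: each list element records the shape
-- and the scalar type of the corresponding tensor.
data HVector (tensor :: Type -> [Maybe Nat] -> Type)
             (ts :: [([Maybe Nat], Type)]) where
  HNil  :: HVector tensor '[]
  HCons :: tensor r sh -> HVector tensor ts
        -> HVector tensor ('(sh, r) ': ts)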

Mikolaj commented 8 months ago

An example of how the [Maybe Nat] type indexes seem to emerge naturally once we permit the notion of tensors nested within tensors. Consider the type signature

u :: (RankedTensor ranked, ShapedTensor shaped)
  => shaped '[23, 34] (shaped '[22] Float, ranked 1 Double)

or an even simpler case, where the product is trivial (but the nesting is not):

t :: (RankedTensor ranked, ShapedTensor shaped)
  => shaped '[23, 34] (ranked 1 Double)

This looks fine, but we'd like to represent this using unboxed tensors exclusively and, in particular, to have instances of the shaped and ranked tensor type classes at unboxed concrete arrays (e.g., from package orthotope, where ranked arrays have type OR.Array and shaped arrays OS.Array). Should the type of the instantiated t be represented as

t2 :: OR.Array 3 Double

If so, the instance is not faithful, because we can't do

ssum (t2 :: OR.Array 3 Double)

while we can do

(ssum t) :: shaped '[34] (ranked 1 Double)

Obviously, the type of the instantiated t can't be

t3 :: OS.Array '[23, 34, ???] Double

because we can't fix statically what ??? is going to be. So we are probably going to need

t4 :: OX.Array '[Just 23, Just 34, Nothing] Double

Or maybe, on the contrary, the concrete array instance should project all tensors (or all ranked and product tensors) to ranked arrays, and we should ban all shaped operations on product/nested tensors?

As a side note, while I can apply ssum to an argument

t :: (RankedTensor ranked, ShapedTensor shaped)
  => shaped '[23, 34] (ranked 1 Double)

I can't apply sgather to it:

  sgather
    :: forall r sh2 p sh.
       ( GoodScalar r, Sh.Shape sh2, Sh.Shape sh, Sh.Shape (Sh.Take p sh)
       , Sh.Shape (Sh.Drop p sh), Sh.Shape (sh2 Sh.++ Sh.Drop p sh) )
    => shaped r sh
    -> (IndexSh shaped sh2 -> IndexSh shaped (Sh.Take p sh))
    -> shaped r (sh2 Sh.++ Sh.Drop p sh)

because sgather (unlike rgather) requires the full shape to be statically known. So adding the [Maybe Nat] type indexes is just the beginning: we'd then need to define a new set of tensor operations with the new typing.

In the best scenario, for each current pair of a ranked and a shaped operation we'd get just one mixed operation. In the worst case, we'd get three or more; one would need to try. Another planning-phase consideration is whether orthotope can be extended to the mixed type indexes without problems (e.g., without losing some of its ability to restructure tensors in constant time by manipulating only the metadata). Or can we somehow get away without extending orthotope at all?
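
To make the "one mixed operation" idea concrete, a wishful sketch (MixedTensor and xsum are made-up names, not existing API):

{-# LANGUAGE DataKinds, KindSignatures, TypeOperators #-}
import Data.Kind (Type)
import GHC.TypeLits (Nat)

-- A single class over [Maybe Nat]-indexed tensors, subsuming today's
-- ranked/shaped pairs of operations.
class MixedTensor (tensor :: Type -> [Maybe Nat] -> Type) where
  -- Sum over the outermost dimension, whether it is statically known
  -- (Just n) or dynamic (Nothing).
  xsum :: tensor r (mn ': sh) -> tensor r sh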

tomsmeding commented 8 months ago

Just a thought:

-- Assumes DataKinds, GADTs, TypeFamilies, TypeOperators and
-- StandaloneKindSignatures; Type from Data.Kind, Nat and (+) from
-- GHC.TypeLits; and type-level ++ and Replicate (hand-rolled or from
-- a type-level list library).

type MapJust :: [k] -> [Maybe k]
type family MapJust l where
  MapJust '[] = '[]
  MapJust (x ': xs) = Just x ': MapJust xs

type SArr :: [Nat] -> Type -> Type
data SArr l t  -- no constructors, this is just a tag type

type RArr :: Nat -> Type -> Type
data RArr n t  -- idem

-- AstRanked and AstShaped are indexed by the tag types to indicate what kind of array they are describing
data AstRanked span t where
  AstSum :: AstRanked s (RArr (1 + n) t) -> AstRanked s (RArr n t)
  -- etc.

type ShapeVec :: Type -> [Maybe Nat]
type family ShapeVec t where
  ShapeVec (SArr l t) = MapJust l ++ ShapeVec t
  ShapeVec (RArr n t) = Replicate n Nothing ++ ShapeVec t
  ShapeVec _ = '[]

type Core :: Type -> Type
type family Core t where
  Core (SArr l t) = Core t
  Core (RArr n t) = Core t
  Core t = t

type Rep t = OX.Array (ShapeVec t) (Core t)
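
If the type families reduce as intended, this computes exactly the t4 representation guessed above. For the nested t from the previous comment:

-- ShapeVec (SArr '[23, 34] (RArr 1 Double))
--   = MapJust '[23, 34] ++ Replicate 1 Nothing ++ '[]
--   = '[Just 23, Just 34, Nothing]
-- hence:
-- Rep (SArr '[23, 34] (RArr 1 Double))
--   = OX.Array '[Just 23, Just 34, Nothing] Double
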
tomsmeding commented 8 months ago

The above is a bad idea that destroys all the compositionality that we have; don't use it. Though I think there is something here; need to think more before typing.

Mikolaj commented 8 months ago

Tom's idea, which may be easy to implement: replace the ??? above with just Nat type variables (or expressions in them). Because we don't have filter, this should not require packing existentials: the Nat expressions should be statically known.
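
Schematically (t5 is a made-up name; the body is a placeholder):

{-# LANGUAGE DataKinds #-}
import qualified Data.Array.ShapedS as OS  -- orthotope's shaped arrays

-- ??? replaced by an ordinary Nat variable k: not a literal, but still
-- statically known as a type-level expression (without filter, no
-- existential packing is needed).
t5 :: OS.Array '[23, 34, k] Double
t5 = undefined  -- placeholder body, for illustration only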

Mikolaj's TODO: try to write a filter odd function in horde-ad using fromList and see where it breaks (and keep the failed attempts as tests). It should break on not being able to inspect tensor elements, since everything has to be interpretable in the symbolic layer (maybe ifF would not work, somehow).

Mikolaj commented 7 months ago

Mikolaj's TODO: try to write a filter odd function in horde-ad using fromList and see where it breaks

It breaks, as it should, already when type-checking:

https://github.com/Mikolaj/horde-ad/blob/4b565ae2fbdb54f73befcdd10b038713511a8184/test/simplified/TestAdaptorSimplified.hs#L1888-L1897

However, from this experiment I got the feeling that it would be easy to add hardwired operations like filter to our tensor class, and I think we are likely to add them in the future unless there's a very compelling reason not to (on top of them not being typable with shaped tensors but only with ranked ones).
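
A guess at what such a hardwired operation could look like (rfilter is a made-up name; only the ranked and BoolOf conventions are horde-ad's):

-- The result length depends on runtime values, so only a ranked result
-- type can work: rank 1 in, rank 1 out, with a statically unknown size.
rfilter :: (ranked r 0 -> BoolOf ranked)  -- predicate on scalar elements
        -> ranked r 1
        -> ranked r 1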

tomsmeding commented 7 months ago

If you can, that's great! That would mean that we do not actually need the restriction that all tensor shapes are statically known. It would change a lot of the typing: for example, rbuild currently takes a ShapeInt, which is just a sized list of Ints. If tensor shapes do not need to be statically known, we could loosen this to a sized list of ASTs (i.e. IndexOf)!
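
Schematically (not horde-ad's actual signatures; rbuild' is a made-up name for the loosened variant):

-- today: the shape argument is a sized list of plain Ints
rbuild  :: ShapeInt n       -> (IndexOf ranked n -> ranked r 0) -> ranked r n
-- loosened: the shape argument is itself a sized list of staged values
rbuild' :: IndexOf ranked n -> (IndexOf ranked n -> ranked r 0) -> ranked r n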

EDIT: the Could not deduce ‘BoolOf ranked ~ Bool’ error you got from filter (\x -> x >=. 0) l is precisely staging rearing its head: you're trying (through filter) to branch at "compile time" of the staged program on values known only at runtime of the staged program.

tomsmeding commented 7 months ago

How does build1 k (\i -> filter f a) vectorise, for some function f and some array-typed expression a?

Mikolaj commented 7 months ago

Ouch, you are right. We could add the operations easily, but then our machinery would break at once. The first thing to break would be

https://github.com/Mikolaj/horde-ad/blob/4b565ae2fbdb54f73befcdd10b038713511a8184/src/HordeAd/Core/AstTools.hs#L47-L48

and everything depends on it. Vectorization would get stuck as well.

tonyday567 commented 2 months ago

The dimensions library implements a similar idea to this:

https://hackage.haskell.org/package/dimensions-2.1.1.0/docs/Numeric-Dimensions-Dim.html#t:XNat

Mikolaj commented 2 months ago

Oh wow, thank you, the whole https://github.com/achirkin/easytensor repo is interesting. I wonder how much it overlaps with the not-yet-on-Hackage library by @tomsmeding.

tomsmeding commented 2 months ago

It seems to serve a different goal: the partially static shapes definitely overlap, but the major reason ox-arrays exists is array nesting. Easytensor seems to focus more on particular operations and particular dimensionalities (there are explicit operations on matrices) and much less on general multi-dimensional tensor operations (reshape, replicate, partial reductions/indexing, etc.). I'm not sure what the idea of a data frame is, beyond a collection of arrays of the same size (I may be missing something there). The underlying representation of an array in easytensor seems to be orthotope-ish, except that the innermost stride must be 1. I'm not sure to what extent it allows "weird" stride vectors.