Improve asymptotics of Representable for instances without random access

aaronvargo commented 6 years ago

Representable is currently inefficient for instances which don't have O(1) index, such as Stream and Cofree. E.g. zipping two streams with mzipRep is at best O(n^2). This PR adds new methods which are equal in power to tabulate and index for an existential Rep, but which don't rely on indexing and thus can be asymptotically faster. The new methods are more powerful versions of those in Distributive, with the simplest one being:

distribute1 :: forall w. Functor1 w => w f -> f (w Identity)

where a Functor1 is a functor of kind (* -> *) -> *:

class Functor1 w where
  map1 :: (forall a. f a -> g a) -> w f -> w g

By using Functor1 instead of Functor, distribute1 allows zipping of containers with different element types, unlike distribute which can only zip containers with elements of the same type. E.g. distribute1 is powerful enough to implement mzipRep, by instantiating w with PairOf a b:

data PairOf a b f = PairOf (f a) (f b)
instance Functor1 (PairOf a b) where
  map1 f (PairOf as bs) = PairOf (f as) (f bs)

mzipRep :: Representable f => f a -> f b -> f (a, b)
mzipRep as bs = fmap f $ distribute1 $ PairOf as bs
  where f (PairOf (Identity a) (Identity b)) = (a, b)

In fact, distribute1 (plus fmap) is powerful enough to derive Representable, by selecting Rep f = Logarithm f where:

newtype Logarithm f = Logarithm { runLogarithm :: forall x. f x -> x }

with the idea being that an index can just be defined to be a function which gets the element at that index.

So distribute1 is equal in power to tabulate and index when one doesn't care what the value of Rep is. Yet distribute1 can be asymptotically faster for instances without random access. E.g. the above implementation of mzipRep can be O(n) for streams, given an appropriate implementation of distribute1:

data Stream a = Cons { head :: a, tail :: Stream a }

distribute1 :: Functor1 w => w (Stream a) -> Stream (w Identity)
distribute1 w = map1 (Identity . head) w `Cons` distribute1 (map1 tail w)

I didn't include it in this PR since it would be a potentially breaking change, but it's probably also worth considering changing the defaults of the class so that Rep defaults to Logarithm.

ekmett commented 6 years ago

Merging. I may want to bikeshed some names later, like the module name for Data.Functor1, but the general idea is neat.

sjoerdvisscher commented 6 years ago

This would be a good opportunity to merge Distributive and Representable.

aaronvargo commented 6 years ago

Or, perhaps there could be two classes:

class Functor f => Distributive f where
  distribute1 :: Functor1 w => w f -> f (w Identity)

-- no fundep!
class Distributive f => Representable r f where 
  askRep :: f r
  index :: f a -> r -> a

instance Distributive f => Representable (Logarithm f) f

-- for cases where one wants a canonical rep
type family Rep (f :: * -> *) :: *

Allowing multiple reps per functor could be useful, e.g. for memoizing functions.

ekmett commented 6 years ago

I'm pretty negative on the idea of the multiple rep version.

Naked fundep-less MPTCs like that tend to be a huge type inference trap. The power gained comes at a real practical cost here, as neither askRep nor index is going to force you to 'choose' r when you go to compose them together to build, say, the liftA2 equivalent, meaning it becomes a mess of semantically relevant type anotations, etc.

aaronvargo commented 6 years ago

The idea is that you probably won't be composing askRep and index like that, since these are the sorts of functions which can be made asymptotically faster by using distribute1. Also, you can have a version of tabulate specialized to Logarithm, which doesn't require index:

tabulateAlg :: Distributive f => ((forall x. f x -> x) -> a) -> f a

Regardless, the fundep-less MPTC doesn't preclude use of a Rep type family as well. The real question would be whether derived functions should be specialized to Rep, allowed to work with any r, or if there should be two versions of each. Even if they are all specialized to Rep, the fundep-less class still might be useful, as memoization actually requires a fundep in the opposite direction:

type family Memoize (r :: *) :: * -> *
memoize :: Representable r (Memoize r) => (r -> a) -> r -> a

It might be nice to have a single class which can be shared between the two uses, though ultimately it doesn't really matter.

ekmett commented 6 years ago

I'm frankly not interested in packaging and maintaining the fundep-less version of this class myself. As mentioned above it triggers all of my "bad code smell" warnings about producing and consuming something with significant type checking.

Saying "you don't do that now" is pretty far out there when literally all of the code that had been written against this class to date is precisely doing that, so there is quite some growing pain to be had between the state of the world as it exists and the state of the world you'd like to see.

My main thoughts at the moment are towards ensuring that the TF-less Distributive has a way to continue existing and gains whatever power it can/should have, and that the TF-based Representable doesn't suffer a needless asymptotic hit.

aaronvargo commented 6 years ago

My main thoughts at the moment are towards ensuring that the TF-less Distributive has a way to continue existing and gains whatever power it can/should have, and that the TF-based Representable doesn't suffer a needless asymptotic hit.

I suppose most of the code I've written should be moved to Distributive then?

ekmett commented 6 years ago

That was where I was going with that. =) Up until now my plan had been to add members that emulated monad functionality, because every distributive can be a monad in a canonical way, but not every one of those can have the actual Monad instance (e.g. Compose), if some of the Functor1 stuff can kick in to help out there we'd be in a much better place.

There still remains the issue of bikeshedding the name Functor1 and finding it a better upstream home, as it really isn't about representability/distributivity at all.

RyanGlScott commented 6 years ago

Functor1 currently exists in the rank2classes package under a different name (Rank2.Functor). On the other hand, that library's version bounds are far too tight for adjunctions' purposes, so I don't think it can reasonably depend on it.

ekmett commented 6 years ago

There is also the meta problem that that package exports a "rank-2" Distributive, which would ideally have the same pattern applied to it recursively. Not sure how to proceed there without PolyKinds. Fortunately PolyKinds are old enough to be viable, but that invites the question about how if we're going there, why are we worrying about a TF?

aaronvargo commented 6 years ago

If we do want to go there (at some point), I think I know how to generalize distribute1 to at least any functor F : C -> Hask where C is either Hask or a functor category, but it requires some basic category machinery, and preferably the data families + unsafeCoerce hack as well.

For simplicity, imagine we have kind indexed categories as well as type lambdas. The polykinded Distributive ought to look something like this, for some definition of Foo:

class Functor f => Distributive (f :: k -> *) where
  distribute1 :: forall (w :: (k -> *) -> *). Functor w => w f -> f (Foo w)

Foo has kind ((k -> *) -> *) -> k, i.e. Foo :: Cont * k -> k. The following definitions of Foo for k = * and k = i -> j give the desired results:

Foo :: ((k -> *) -> *) -> k
Foo @* = ($ Identity)
Foo @(i -> j) cont x = Foo @j (contMap ($ x) cont)

contMap :: (a -> b) -> ((a -> *) -> *) -> (b -> *) -> *
contMap f g h = g (h . f)

ekmett commented 6 years ago

@aaronvargo: Yeah. I had much the same thought. I'm still resisting being the one to package up the super-general version of all of that, largely because it begins to dip into pseudo-category theory based around parametricity instead of naturality. That and it has the disadvantage of drastically increasing the footprint of distributive whose sole reason for existence is simplicity.

aaronvargo commented 6 years ago

largely because it begins to dip into pseudo-category theory based around parametricity instead of naturality

I'm sure that can be fixed. Let's just punt on generalizing it though, since (1) there are still things to work out and (2) then we'll probably just want more things, e.g. we might want to be able to express that the functor \a -> a × a is distributive for any cartesian category (I think what I wrote earlier can be generalized to functors F : C -> D where C is either D or a "functor category ending in D". There may be other generalizations as well). In the meantime, maybe I'll play with some of this and perhaps add it to naperian (though I doubt anyone will use it).

As for where to put Functor1 (after coming up with a better name), could we just leave it in distributive and tell people to use Rank2.Functor instead if they need it for anything other than calling distribute1 and friends? Chances are that most code using Distributive won't need to use distribute1 directly anyway (other than implementing it for instances).

As for possible names, here are a few:

FFunctor
Rank2Functor
R2Functor

sjoerdvisscher commented 6 years ago

What if we apply Yoneda to get rid of the Functor1 class completely?

distribute1 :: (forall g. (forall a. f a -> g a) -> w g) -> f (w Identity)

aaronvargo commented 6 years ago

Unfortunately that would ruin the asymptotics, since Yoneda doesn't allow sharing. Note the similarity to:

tabulateAlg :: ((forall x. f x -> x) -> a) -> f a

We could have:

distribute1 :: (forall g h. (forall a. g a -> h a) -> w g -> w h) -> w f -> f (w Identity)

But I think having the Functor1 class is clearer.

phadej commented 6 years ago

My ramblngs about naming. I think it would nice to package FFunctor, FContravariant

{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE NoStarIsType #-}
{-# LANGUAGE PolyKinds #-}
{-# LANGUAGE QuantifiedConstraints #-}
{-# LANGUAGE RankNTypes #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind
import qualified Control.Category as C

-- I realised that I need such class:

{-
-- recall:
class Eq1 f where
    eq1 :: Eq a => f a -> f a -> Bool
-}

class Functor1 (t :: (Type -> Type) -> Type -> Type) where
    fmap1 :: Functor f => (a -> b) -> t f a -> t f b

-- Therefore I don't want for adjunctions to steal 'Functor1' name.
-- The "rule" here, is that with QuanifieidConstraints we could write:
--
-- type Functor1 t = forall f. Functor f => Functor (t f)
-- type Functor2 t = forall f g. (Functor f, Functor g) => Functor (t f g)
--
-- The  same way as we could
--
-- type Eq1 f = forall f. Eq a => Eq (f a)
-- type Eq2 f = forall f. (Eq a, Eq b) => Eq (f a b)

-------------------------------------------------------------------------------
-- So what about adjunctions?
-------------------------------------------------------------------------------

-- I'd prefer some _prefix_, I use here F, without any particular reason.

type f ~> g = forall x. f x -> g x
infixr 0 ~>

newtype NT (f :: k -> Type) (g :: k -> Type) = NT { runNT :: f ~> g }

instance C.Category NT where
    id = NT id
    NT f . NT g = NT (f . g)

class FFunctor (t :: (k -> Type) -> Type) where
    kmap :: (a ~> b) -> t a  -> t b

class FContravariant (t :: (k -> Type) -> Type) where
    kcontramap :: (a ~> b) -> t b -> t a

newtype Logarithm f = Logarithm { runLogarithm :: forall x. f x -> x }

instance FContravariant Logarithm where
    kcontramap f (Logarithm l) = Logarithm (l . f)

class FProfunctor (p :: (k -> Type) -> (k -> Type) -> Type) where
    kdmap :: (a ~> b) -> (c ~> d) -> p b c -> p a d

instance FProfunctor NT where
    kdmap f g (NT p) = NT (g . p . f)

-- etc.

Those might even fit into to natural-transformation, @RyanGlScott ?

phadej commented 6 years ago

Functor1 is a bad name, it doesn't "generalise" to Traversable1 (which is something completely different)

ekmett / adjunctions

Improve asymptotics of Representable for instances without random access #49