Kind mismatch in derived instances

expipiplus1 commented 1 year ago

data Bar a b where
  Bar :: b -> Bar True b

data Foo b c where
  Foo :: Bar x c -> Foo b c

deriveBifunctor ''Foo

    • Couldn't match kind ‘Bool’ with ‘*’
      When matching types
        p0 :: * -> * -> *
        Bar :: Bool -> * -> *
      Expected: Bar x d
        Actual: p0 b0 d
    • In the first argument of ‘Foo’, namely
        ‘(((bimap (\ _n_a7u1C -> _n_a7u1C)) g_a7u1v) _arg1_a7u1B)’
      In the expression:
        Foo (((bimap (\ _n_a7u1C -> _n_a7u1C)) g_a7u1v) _arg1_a7u1B)
      In a case alternative:
          Foo _arg1_a7u1B
            -> Foo (((bimap (\ _n_a7u1C -> _n_a7u1C)) g_a7u1v) _arg1_a7u1B)
    |
419 | deriveBifunctor ''Foo
    | ^^^^^^^^^^^^^^^^^^^^^

The derived instance shouldn't be using bimap here but rather just fmap

RyanGlScott commented 1 year ago

Hm, this is a tricky one. Before reviewing #125 too closely (thank you for submitting a patch, by the way!), I think it would be worth thinking about what the specification for deriveBifunctor and friends should be, as #125 changes the behavior of these functions in a subtle-but-significant way.

While I haven't written up exactly how I intend deriveBifunctor et al. to behave in full detail, this part of the Haddocks does a pretty good job of explaining what I am going for:

With a derive function, the last two type variables must both be of kind *. Other type variables of kind * -> * are assumed to require a Functor, Foldable, or Traversable constraint (depending on which derive function is used), and other type variables of kind * -> * -> * are assumed to require an Bifunctor, Bifoldable, or Bitraversable constraint.

This criterion gives you a relatively simple way to tell what sort of code deriveBifunctor will derive. If you see a field that looks like T a, then it will be fmapped, and if you see a field that looks like T a b, then it will be bimapped.

One drawback of this criterion is that despite being simple, it is not clever enough to come up with Bifunctor instances for all possible data types. Here is a counterexample:

data Baz a b where
  Baz :: a -> Baz a True

data Quux b c where
  Quux :: Baz c x -> Quux b c

deriveBifunctor ''Quux

When deriveBifunctor ''Quux sees the Baz c x field, it will call bimap on it. This has no hope of working, however, as Baz :: Type -> Bool -> Type has the wrong kind for Bifunctor. On the other hand, it is still possible to define a Bifunctor instance for Quux if you use a much different approach:

mapBaz :: (a -> a') -> Baz a b -> Baz a' b
mapBaz f (Baz x) = Baz (f x)

instance Bifunctor Quux where
  bimap _ g (Quux x) = Quux (mapBaz g x)

This works, but it requires much more cleverness that the criterion described above, as it requires knowledge about the definition of Baz to implement. In the spirit of keeping deriveBifunctor dumb but predictable, I have opted not to do anything that requires cleverness of this sort.

The question now is: does the change implemented in #125 rise to this level of cleverness? The example in https://github.com/ekmett/bifunctors/issues/124#issue-1718366377 is rather similar to the example above, as Bar/Foo are slight variations of Baz/Quux. On the other hand, implementing a Bifunctor instance for Foo doesn't require knowing anything about the definition of Bar to implement—it just requires being careful about counting type variable occurrences. (For instance, if you wrote a Bar b c field instead of a Bar x c field, then it would be bimapped instead of fmapped!)

This one feels right on the borderline of cleverness for me. I need to give this one some thought.

RyanGlScott commented 1 year ago

I realized right after submitting my last comment that my argument isn't entirely consistent with how deriveBifunctor currently works. I was worried that it would be too confusing for users to figure out how their fields would be mapped depending on how many type variable occurrences there are, but in fact, that is already happening to some degree. Consider this example:

data T a b = MkT a b (M Int) (E Bool Char)

deriveBifunctor ''T

Based on the criterion I mention in https://github.com/ekmett/bifunctors/issues/124#issuecomment-1556157583, you might expect the M Int field to be fmapped and the E Bool Char field to be bimapped. But this is not what happens! Instead, this Bifunctor instance would be generated:

instance Bifunctor T where
  bimap f g (MkT x1 x2 x3 x4) = MkT (f x1) (g x2) x3 x4

Because neither field mentions any of the type parameters, they do not get mapped at all. This is a very useful optimization, as we can avoid unnecessary allocations by avoiding redundant fmaps and bimaps. But more importantly, this means that this instance will typecheck even if M does not have a Functor instance or E does not have a Bifunctor instance.

This is all to say: the algorithm that deriveBifunctor uses is already quite sensitive to how type variables occur in each field, and users must be aware of this in order to know exactly which fields will be mapped in which ways. For this reason, the change in #125 is consistent with existing precedent. We should still make sure to document all of this somewhere, as it's rather subtle.

ekmett / bifunctors

Kind mismatch in derived instances #124