google-research / dex-lang

Research language for array processing in the Haskell/ML family
BSD 3-Clause "New" or "Revised" License
1.59k stars 107 forks source link

Support inferred superclass instances #481

Open dan-zheng opened 3 years ago

dan-zheng commented 3 years ago

Motivation

Haskell has fine-grained typeclass hierarchies. Consider:

class Semigroup a where
  (<>) :: a -> a -> a

class Semigroup m => Monoid m where
  mempty :: m

  -- Defining mappend is unnecessary, it copies from Semigroup.
  mappend :: m -> m -> m
  mappend = (<>)

I want to implement Monoid for [a]. To do so, I must provide two instances: one for Monoid, and one for all the superclasses (Semigroup).

instance Semigroup [a] where
  (<>) = (++)

instance Monoid [a] where
  mempty = []

When implementing a typeclass with a long chain of superclasses, it may be nice to define all methods within the instance for the leaf typeclass — instead of writing one instance per typeclass in the chain.

Design

In Dex, we could support inferred superclass instances. Consider the following typeclass hierarchy:

-- Semigroup → Monoid → Group → Abelian
-- Other hierarchies: ... → SemiRing → Ring.

-- Defines "combine" operation.
-- Laws:
-- - Associativity: x <> (y <> z) = (x <> y) <> z
interface Semigroup a
  (<>) : a -> a -> a

-- Defines unit operation.
-- Laws:
-- -  Left identity: munit <> x == x
-- - Right identity: x <> munit == x
interface [Semigroup a] Monoid a
  munit : a

-- Defines inverse operation.
-- Laws:
-- - x <> inverse x == munit
-- - inverse x <> x == munit
interface [Monoid a] Group a
  invert : a -> a

-- Defines commutativity law.
-- Laws:
-- - x <> y == y <> x
interface [Group a] Abelian a

The proposed feature is: we can directly provide an instance for Abelian Float like so, inferring superclass instances:

instance Abelian Float where
  munit = 0.
  (<>) = \x y. x + y
  invert = \x. -x

An extension is to allow explicit declarations of superclass instances if desired, so they can be found in the source code:

-- Tentative syntax: comma-separate explicit superclass instance declarations.
instance Semigroup Float, Monoid Float, Group Float, Abelian Float where
  munit = 0.
  (<>) = \x y. x + y
  invert = \x. -x

Alternatives considered

Don't do it: favor clarity in source code

One downside of inferred superclass instances is that the superclass instances no longer appear in the source code. With instance Abelian Float: we cannot grep for instance Semigroup Float in the source code.

This concern seems mitigated with proper tooling support:

cristianurlea commented 3 years ago

I strongly favour more verbose syntax and would discourage implicit behaviour. Hiding the inferred derivation by default could result in very subtle issues for the end-user. It's worth pointing out Deriving Via or, How to Turn Hand-Written Instances into an Anti-pattern as the closest Haskell equivalent.

duvenaud commented 3 years ago

Nicely written. Just so I'm sure I understand correctly, should your Abelian Float instance also have invert = \x. -x?

dan-zheng commented 3 years ago

Nice catch, thanks @duvenaud.

danieldjohnson commented 3 years ago

An interesting property of inferred superclass instances: it allows refining the class hierarchy without breaking user code. In particular you could lift operators from an interface into a new parent interface, and user definitions would remain valid (now defining two instances).

dougalm commented 1 year ago

This is a good idea and we should do it. Especially now that our class hierarchies are growing deeper, with the recent addition of the Data class.

Let's implement the first version Dan proposed, where the superclass names are implicit. I like Daniel's argument that it lets you refine the class hierarchy without breaking user code.

As for implementation, I think we can do it entirely within Inference.hs. We currently fail in checkInstanceBody if we come across a superclass dictionary we can't synthesize. Instead, we can just emit a DictHole. Then once we're back in inferTopUDecl, before synthInstanceDef, we can create instances for each superclasses we couldn't find, using the same set of methods and instance binders. Those synthetic instances might require their own superclasses so we'll do this recursively, taking care not to visit the same superclass twice through different paths.