google-research / dex-lang

Research language for array processing in the Haskell/ML family
BSD 3-Clause "New" or "Revised" License
1.58k stars 107 forks source link

OOP envy and type-based name spaces #1242

Closed dougalm closed 1 year ago

dougalm commented 1 year ago

Simon Peyton Jones envies OOP and so do I.

Let's say I want to represent a Gaussian parameterized by its mean and variance. I want a type constructor, a data constructor and getters for each field. Here's what I have to do in Dex today:

data Gaussian = MkGaussian Float Float
def gaussian_mean ((MkGaussian x _):Gaussian) : Float = x
def gaussian_variance ((MkGaussian _ x):Gaussian) : Float = x

The boilerplate is annoying but what really bothers me is that I have to come up with four global names: Gaussian, MkGaussian, gaussian_mean, gaussian_variance. And then I need to remember those names every time I interact with a Gaussian object. In OOP land, you expect to be able to define the struct like this:

struct Gaussian = { mean : Float, variance : Float }

and use it like this:

my_gaussian = Gaussian.new(1.0, 4.0)
my_gaussian.mean + x * sqrt g.variance

Much better, right?

The constructor name (Gaussian.new) comes for free given the type name. We still need names for the fields, but they're now local to the Gaussian type. And since they're local they can be short. We don't care if mean and variance are already top-level names or if other objects use these names for their fields. When we create the type Gaussian we get a protected name space for its data and functions. I haven't mentioned functions until now, but OOP lets you bundle functions ("methods") with your object types too. I want that too:

def Gaussian.log_density (self:Gaussian) (x:Float) : Float = ...
my_gaussian.log_density(1.2)

I don't care about any of the other OOP stuff -- inheritance, subtyping, factories -- but I really want this, the "power of the dot".

Isn't this just records?

I don't think so. Dex's records don't create a distinct type unless you further wrap the record type with a newtype constructor. So that means (1) you get worse type errors because type inference gets to "look inside" the record structure and (2) you can't add custom instances. Even if we bundled record declaration with creating a new nominal type (like Haskell does), you still wouldn't be able to add type-namespaced OOP-style methods to that type.

Isn't this just type classes?

I don't think so. Yes, type classes do let you overload names, but they encourage you to do it in a very uniform way. If you're writing an instance for Eq Gaussian, you don't get to write an arbitrary function for (==). It has to be a function of type Gaussian -> Gaussian -> Bool, and it's expected to be associative, commutative, and satisfy (x == x) = True. The benefit of that restriction is that it means you can sensibly use (==) in a polymorphic context. You can write useful functions like has_duplicates :: Eq a => [a] -> Bool where the only thing you know about a is that it implements Eq a. But the cost is that everyone needs to agree on what (==) is supposed to mean. (==) is still a global name.

The dot-style overloading doesn't have any of that uniformity and coordination. Each type just gets its own name space to use however it likes. That does mean you can't use the dot-style overloading in a polymorphic context. Writing foo.bar requires a foo with a concrete type (or at least a concrete head). But there's a benefit to restricting it to concrete types: the rules for resolving the name bar can be anything we want and they don't have to be expressible as type class resolution.

Implementation proposal

Unfortunately the actual dot character . is currently used for indexing. I think we should retire it from that role so it can serve this one but that's a separate discussion. For the initial implementation we can use ~ or something instead of ., but I'll keep using . here for consistency.

We'll want to iterate on the specifics but I suggest we start with the following:

  1. Add concrete syntax <identifier>.<identifier> to UExpr. CoreIR/SimpIR doesn't change at all.
  2. Add an EMap TyConName SourceMap registry to the top-level environment that maps type constructor names to the names of the functions and data they refer to.
  3. Resolve x.y during type inference as follows. If x is the name of a type constructor, look up y in the source map associated with that type constructor and use the value directly. Otherwise try the same thing with the type of x and then apply the function we find (it has to be a function in this case) to x. Otherwise throw a type error. There are questions about the order in which we try to resolve types vs field names. We'll probably eventually want to interleave them like we do with dictionary synthesis.
  4. To populate the registry in the first place let's start by allowing def to use tycon-qualified names, like def Gaussian.log_density. Then we can start adding syntactic sugar like the struct syntax above. We might even want to go OOP-wild and allow a type and all its methods to be defined in one big block like a Python class.
darrenjw commented 1 year ago

I think that you are describing Scala case classes. ;-)

dougalm commented 1 year ago

I think that you are describing Scala case classes. ;-)

Yes, that looks pretty close actually. Thanks!

soraros commented 1 year ago

Lean uses namespace for this. For instance, constructor c of an inductive type T is automatically put into the namespace T.

dougalm commented 1 year ago

Lean uses namespace for this. For instance, constructor c of an inductive type T is automatically put into the namespace T.

Ooh, that's really slick. I like the open command.

axch commented 1 year ago

Is this done now? We don't have def Foo.bar syntax yet for re-entering a namespace, but as of #1261, I believe the spirit of this idea is fulfilled. Maybe refile the remaining task(s) as (a) separate issue(s)?

dougalm commented 1 year ago

Yes, I think this is done for now! My OOP envy has now faded to a very pale green.