google-research / dex-lang

Research language for array processing in the Haskell/ML family
BSD 3-Clause "New" or "Revised" License
1.58k stars 107 forks source link

Strip newtypes during simplification pass [WIP] #1188

Closed dougalm closed 1 year ago

dougalm commented 1 year ago

The basic approach seems to work but I ran into two big problems with this patch. It looks like it's going to be a long slog to work around them and get the tests passing. Instead, I'm going to fix them on main and then apply this patch on top of that.

Problem 1: printing

Printing tables from the Haskell runtime has always been a mess. We rely on our half-baked interpreter to figure out the index set sizes and we call into Imp to calculate offsets and interpret the low-level representation of data. But with this patch, calling back into Imp becomes harder because Imp doesn't handle the rich types that we want to be able to print. Getting the printer working as before will mean reimplementing a lot of the logic in Imp, including Algebra!

But there's a better solution: codegen printing and print from within the Dex runtime instead of Haskell. That also gives us debug-printing for free, which is a badly missing feature. Eventually we want to use Show or similar so that we can have user-defined formatting. But I propose that initially we use a built-in code generator that can print any Dex value (with fallback strings like <function> for some cases). I don't think it'll even be especially hard to implement. It's just writing the equivalent of the Pretty (Atom r n) instance using the Builder monad, on top of a couple of primitives like printStringLit :: String -> Builder CoreIR n () and printBaseTypeValue :: String -> Atom n -> Builder CoreIR n ().

Doing this will cause one regression: we'll have to print lambda terms like <function: Nat -> Nat> rather than \x:Nat. x. But that's how it works in most languages anyway. If you want to see the internals of a function for debugging, there's always %passes.

Printing was the main reason we've had to keep Interpreter and Serialize around. They've always been buggy and hard to maintain. Once we do printing from the Dex runtime we can delete them both! (This will regress one feature: we won't be able to have table literals with non-Fin index sets. While it's a nice feature to have, I don't think it's worth the cost of maintaining the interpreter. You can always achieve the same thing with a runtime-checked coercion.)

2. Problem 2: ProjectElt

We introduced ProjectElt to allow projections to appear in types. That lets us gives a type to the second component of a dependent pair without unpacking the whole pair into two binders at once. We made it n-ary and made its rhs a Var instead of an Atom in order to syntactically enforce normalization -- we don't want something unreduced like fst (1,2) to appear in a type. But there are two problems with ProjectElt as it is today:

  1. It's too inflexible to allow new language features. In the current patch, we needed a way to project SimpInCore atoms, but they're not variables so they can't appear in ProjectElt as it stands. So I had to give SimpInCore its own list of projections. Actually, it needed two of them! (One before the coercion and one after.) When we allow application in types, which we want for associated types, we'll have the same problem.

  2. Naive substitution can denormalize so we need to re-normalize when we substitute. But our current substitution framework doesn't carry the environment we need to query types. So we're forced to do a purely syntactic normalization. We've scraped by so far but with these latest features it's no longer possible. Our RepVal atom constructor that we use in Imp translation had to have its own list of projections because there wasn't enough information to unwrap the newtype.

Eventually I think we want to have a different ANF constraint for atoms in type position. ANF makes it easy to share work but hard to test equality. But for types we want the opposite: "Anti-ANF" -- no decls but arbitrarily nested expressions -- would forbid work-sharing but make equality checking easy.

Before we go that far I think we can manage if we (1) make the RHS of ProjectElt an Atom (and ProjectElt can then be unary) and (2) implement substitution-with-atoms in a way that lets us carry the environment. Purely-renaming substitution can still be done syntactically. So instead of the parametric SubstE v with v = Name and v = AtomSubstVal, we'll just have two classes,SubstE and AtomSubstE, where only the latter requires an environment.

Other small changes that will make this patch easier to land

axch commented 1 year ago

Is it worth code-genning code that writes to an in-memory string buffer rather than literally prints? Pro: more flexible on the user side. Con: Presumably requires O(1) vector push. Or would it be easy to refactor to that after?

Changing away from AtomSubstVal will presumably be pretty big, since those substitutions are used all over place, aren't they? There are presumably also other substitution types, like the one in the inliner (which carries expressions, that either have or have not been inlined into themselves). And I want to introduce a "suspended lambda" constructor to substitutions that can be carried by the simplifier, to make its performance not be quadratic. We should make sure the new plan is generic enough to allow all of those use-cases.

+1 to the small changes. Though I've also been reading about the uniqueness type system in Futhark, and thinking that maybe we want to borrow from there. Such borrowing may have interesting consequences for references and types thereof.

dougalm commented 1 year ago

Is it worth code-genning code that writes to an in-memory string buffer rather than literally prints?

Yes, I agree that would be better but we need the O(1) stack as you say. I'm planning to directly write for now but I think it'll be easy enough to change the implementation underneath once we have a stack.

Changing away from AtomSubstVal will presumably be pretty big, since those substitutions are used all over place, aren't they?

Yes. It's a big change, but a very mechanical one at least. Good point about making sure that we keep enough flexibility for these other use cases.

dougalm commented 1 year ago

Superseded by #1209