goldfirere / singletons

Fake dependent types in Haskell using singletons
284 stars 36 forks source link

Knowing what is (and isn't) defunctionalized can be confusing #429

Open RyanGlScott opened 4 years ago

RyanGlScott commented 4 years ago

While pondering the questions in https://github.com/goldfirere/singletons/pull/427#discussion_r361532779, I realized that trying to document the existing status quo regarding what language constructs are OK to defunctionalize is much, much trickier than I thought. The main source of trickiness is that singletons has an inconsistent ruleset: whether something is defunctionalized or not depends on whether it is quoted (i.e., $(singletons [d| ... |])) or reified (i.e., $(genDefunSymbols [...])):

Quoted

Functions

Legal. Just define the function in quotes, e.g.,

$(singletons [d|
  foo :: Bool -> Bool
  foo = ...
  |])

And singletons will produce FooSym0 and FooSym1.

Classes

Legal, but only in the sense that quoted classes will produce defunctionalization symbols for the promoted class methods and associated type families, e.g.,

λ> ; $(singletons [d| class C a where type T a :: Bool; m :: a -> a |])
<interactive>:6:5-67: Splicing declarations
    singletons
      [d| class C_aejd a_aejg where
            type T_aeje a_aejg :: Bool
            m_aejf :: a_aejg -> a_aejg |]
  ======>
    class C_aejX a_aek0 where
      type T_aejY a_aek0 :: Bool
      m_aejZ :: a_aek0 -> a_aek0
    type TSym1 ...
    data TSym0 ...
    type MSym1 ...
    data MSym0 ...

Defunctionalization symbols for the class itself are not produced.

Class methods

Legal. See the "Classes" section.

Type families and synonyms

Legal. Quoted type families and synonyms are defunctionalized as you would expect. See also the "Classes" section for how associated type families are handled.

Data types

Legal, but only in the sense that quoted data types will produce defunctionalization symbols for the promoted record selectors and data constructors, e.g.,

λ> ; $(singletons [d| data D a = MkD { unD :: a } |])
<interactive>:8:5-49: Splicing declarations
    singletons [d| data D_aelz a_aelC = MkD_aelA {unD :: a_aelC} |]
  ======>
    data D_aema a_aemd = MkD_aemb {unD_aemc :: a_aemd}
    type UnDSym1 ...
    data UnDSym0 ...
    type MkDSym1 ...
    data MkDSym0 ...

Defunctionalization symbols for the data type itself are not produced.

Data constructors

Legal. See the "Data types" section.

Record selectors

Legal. See the "Data types" section.

Reified

Functions

Illegal. Attempting to use genDefunSymbols on a function name will throw an error:

λ> ; $(genDefunSymbols ['id])

<interactive>:9:5: error:
    Building defunctionalization symbols of values not supported

<interactive>:9:5: error: Q monad failure

Classes

Legal. Reified class names will get defunctionalization symbols, but its promoted class methods and associated type families will not be defunctionalized:

λ> class C a where type T a :: Bool; m :: a -> a
λ> ; $(genDefunSymbols [''C])
<interactive>:20:5-25: Splicing declarations
    genDefunSymbols [''C]
  ======>
    type CSym1 ...
    data CSym0 ...

Class methods

Illegal. Attempting to use genDefunSymbols on a class method name will throw an error:

λ> class C a where type T a :: Bool; m :: a -> a
λ> ; $(genDefunSymbols ['m])

<interactive>:22:5: error:
    Building defunctionalization symbols of values not supported

<interactive>:22:5: error: Q monad failure

Reifying the parent class name will not produce defunctionalization symbols for its promoted class methods either.

Type families and synonyms

Legal. Reified type families and synonyms are defunctionalized as you would expect. Associated type families are not automatically defunctionalized when the parent class name is reified, but one can just as well reify the associated type family name directly.

Data types

Legal, but only in the sense that reified data types will produce defunctionalization symbols for the data constructors, e.g.,

λ> data D a = MkD { unD :: a }
λ> ; $(genDefunSymbols [''D])
<interactive>:24:5-25: Splicing declarations
    genDefunSymbols [''D]
  ======>
    type MkDSym1 ...
    data MkDSym0 ...

Data constructors

Illegal. Attempting to use genDefunSymbols on a data constructor name will throw an error:

λ> data D a = MkD { unD :: a }
λ> ; $(genDefunSymbols ['MkD])

<interactive>:26:5: error:
    Building defunctionalization symbols of values not supported

<interactive>:26:5: error: Q monad failure

The only way to defunctionalize data constructor names through reification is to reify the parent data type.

Record selectors

Illegal. Attempting to use genDefunSymbols on a record selector name will throw an error:

λ> data D a = MkD { unD :: a }
λ> ; $(genDefunSymbols ['unD])

<interactive>:28:5: error:
    Building defunctionalization symbols of values not supported

<interactive>:28:5: error: Q monad failure

Reifying the parent data type name will not produce defunctionalization symbols for its promoted record selectors either.


There are multiple inconsistencies between quote-based defunctionalization and reification-based defunctionalization:

  1. Defunctionalizing the names of functions, class methods, data constructors, and record selectors is legal with quoting, but trying to do the same with straightforward reification is illegal.
  2. Quoted data types will produce defunctionalization symbols for its promoted record selectors, but reified data types will not.
  3. Defunctionalization for classes works in completely different ways depending on whether quoting or reification is used:
    • Quoted classes will not produce defunctionalization symbols for the class name itself, but will produce defunctionalization symbols for promoted class method names and associated type family names.
    • Reified classes will produce defunctionalization symbols for the class name itself, but will not produce defunctionalization symbols for promoted class method names and associated type family names.
RyanGlScott commented 4 years ago

Question: what should we do to resolve inconsistencies (1)-(3) above? Here are my thoughts:

  1. My initial gut feeling is that using genDefunSymbols on the name of a function, class method, data constructor, or record selector should produce defunctionalization symbols for the promoted equivalent of that name. For instance, genDefunSymbols ['id] should produce IdSym0 and IdSym1.

    There is one potential sticking point with this idea. Recall that this is the code that gets generated for IdSym1:

    type IdSym1 (x :: a) = Id x

    This code assumes that the promoted Id type family already exists. However, if Id doesn't exist, then this code will error out. Therefore, allowing genDefunSymbols to be invoked on id offers a new way to shoot yourself in the foot.

    It is worth noting, however, that there is already a footgun of a similar variety on the quoting side. This code will also error out if Id does not exist:

    $(singletonsOnly [d| type Id x = x |])
  2. If the answer to (1) is "allow it", then I would advocate for having genDefunSymbols [''D] (where D is the name of a data type) produce defunctionalization symbols for D's promoted record selectors. Again, this is a footgun if any of the promoted record selector type families do not yet exist, but it is exactly the same footgun as in (1).
  3. This is a tricky one. Personally, I find it quite strange that genDefunSymbols generates defunctionalization symbols for class names. If C is the name of a class, then one doesn't need to generate CSym0 at all, since you can just as well use TyCon C to achieve the same thing. But alas, someone specifically asked for this feature, so it might be a bit reckless to change this convention.

    Having just written this, I wonder if we should explore an alternative design where genDefunSymbols doesn't exhibit the "bundling" behavior that it currently has. For example, given data D a = MkD { unD :: a }, then genDefunSymbols [''D] won't just generate DSym0 et al.. It will also bundle along MkDSym0 and UnDSym0 et al. Perhaps genDefunSymbols [''D] should simply generate DSym0 et al. and nothing else. If you want MkDSym0 et al., you will have to write genDefunSymbols ['MkD] explicitly, and similarly for UnDSym0 et al.

    As far as classes go, this would mean that genDefunSymbols [''C] would continue to produce defunctionalization symbols. If C had any associated type families or methods, you would have to call genDefunSymbols on them separately.

    The downside to this approach is that this code:

    $(genDefunSymbols [''D, 'MkD, 'unD])

    And this code:

    $(singletons [d| data D = MkD { unD :: a } |])

    Would no longer do the same thing, as the former would produce defunctionalization symbols for D, but the latter would not. Perhaps this inconsistency is acceptable, however, since this hypothetical version of genDefunSymbols is explicitly designed to work at a more fine-grained level than quoting is.

goldfirere commented 4 years ago

Thanks for this thorough analysis. Here is what I propose as a way forward -- but my opinion here is fairly weak, and I am happy with many possible directions (including simply documenting the current strange setup, which I don't believe is actively hurting anyone).

These ideas broadly follow yours, but with the possibility that some find the bundling behavior useful.

RyanGlScott commented 4 years ago

I like your suggested design of having both genDefunSymbols and genAllDefunSymbols. I wonder if it's worth adding another variant that generates defunctionalization symbol names using the same approach as quoting does? After all, my plan for implementing this idea was to add an extra argument to D.S.Promote.Defun.defunctionalize of type GenSymbolsDepth, where:

data GenSymbolsDepth
  = GenNameOnlyDefunSymbols
  | GenNonDataOrClassDefunSymbols
  | GenAllDefunSymbols

With this approach, genDefunSymbols becomes (roughly) an alias for defunctionalize GenNameOnlyDefunSymbols and genAllDefunSymbols becomes an alias for defunctionalize GenAllDefunSymbols. I use GenNonDataOrClassDefunSymbols when invoking defunctionalize from functions that deal with quoted declarations (such as promote and singletons), but there is no genNonDataOrClassDefunSymbols alias for defunctionalize GenNonDataOrClassDefunSymbols. Should there be?

goldfirere commented 4 years ago

I don't feel the need for exposing this much flexibility, per se. Is anyone asking for it? But I'm also not actively against exposing it.

RyanGlScott commented 4 years ago

I don't feel the need for exposing this much flexibility, per se. Is anyone asking for it?

It's hard to answer that question since this new design is still hypothetical. But it is worth noting that anyone who relies on the current behavior of $(genDefunSymbols [''D]) (given data D = D_1 | ... D_n) won't have a clear migration path in the new design, since neither the new genDefunSymbols nor genAllDefunSymbols will do quite the same thing. A user could conceivably write out each individual constructor to be defunctionalized (i.e., write $(genDefunSymbols [''D_1, ..., ''D_n]), but I could foresee this being annoying if n is large.

Of course, it could be the case that my worrying is completely unfounded and that no one will miss the current behavior of genDefunSymbols. But I don't know how to gauge how users would react to this idea.

goldfirere commented 4 years ago

Given that there are about 20 usages of genDefunSymbols in all of hackage (not counting singletons), I'm not very worried about the migration story. Futhermore, generating all symbols is rarely troublesome. In other words, do whatever is easiest.

RyanGlScott commented 4 years ago

I briefly looked into implementing this, but it turned out to be much more work than I originally had expected. For now, I'll just document the various corner cases of genDefunSymbols in the README and leave this issue open as a reminder to implement the better design in https://github.com/goldfirere/singletons/issues/429#issuecomment-569340217 (or to just wait until UnsaturatedTypeFamilies is a thing and makes defunctionalization obsolete, I suppose). I've taken care of the documentation side of things in #441.