dhall-lang / dhall-haskell

Maintainable configuration files
https://dhall-lang.org/

"Tracing" normalisation? #1117

Open ocharles opened 5 years ago

ocharles commented 5 years ago

I'm currently exploring using Dhall for a new pricing service at CircuitHub. My rough plan is that our latest pricing model is a Dhall expression with access to a common prelude of functions that do things like pull resources out of a PostgreSQL database, but the actual pricing logic is consolidated in the Dhall file. A huge win from this is that we could theoretically produce an "evidence log" of how a price was derived - simply output the original Dhall expression, and then the new Dhall Expr after each reduction step. I'm not sure dhall will actually let me write that, though. Is this something we could add?

ocharles commented 5 years ago

One could imagine normalize : Expr -> Normalisation where

data Normalisation = VarNF Expr | ... | IfTrue Expr Normalisation | ...

In this form, Normalisation is an ADT where each constructor corresponds to a beta reduction rule. Each constructor takes the input Expr and may produce a new Normalisation computation (e.g., IfTrue produces a new Normalisation which is the continuation of normalisation after applying the if True then x else y rule). Values already in normal form carry no further Normalisation (e.g., VarNF).

This is just a suggestion; I don't necessarily need this exact approach. For my requirements, having the entire new Expr after each reduction step would be sufficient (it's just for logging, and I can use Dhall.Diff to provide a UI for people to step through evaluation).
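
To make the shape concrete, here is a rough sketch (constructor names are illustrative only, and I'm not proposing an exact API):

-- Rough sketch only: each constructor records which reduction rule fired on
-- which input Expr, plus the continuation of the normalisation.
import Dhall.Core (Expr)

data Normalisation s a
  = VarNF (Expr s a)                        -- already in normal form, nothing left to do
  | IfTrue (Expr s a) (Normalisation s a)   -- `if True then x else y` fired on this Expr
  | IfFalse (Expr s a) (Normalisation s a)  -- `if False then x else y` fired on this Expr
  -- ... one constructor per reduction rule

-- The whole-expression trace I'm after is just the sequence of recorded inputs:
inputs :: Normalisation s a -> [Expr s a]
inputs (VarNF e)        = [e]
inputs (IfTrue e rest)  = e : inputs rest
inputs (IfFalse e rest) = e : inputs rest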

ari-becker commented 5 years ago

You can't just split up the expression into multiple expressions which are evaluated independently? After the evaluation of each expression, drop into the calling context, report the numbers, proceed to evaluate the next expression (i.e. the next level of normalization), repeat as necessary. Possibly some kind of expression which exposes all of the intermediate steps:

let step1 = expr_1
let step2 = function_takes_step1_as_param step1
let final = final_function step2
in
{ step1 = step1
, step2 = step2
, final = final
}

I've explored the use of Dhall as a foundation for a pricing tool and the main gotcha I've run into at this time is the lack of arithmetic for Doubles, which forces arithmetic to be calculated outside the Dhall context.

ocharles commented 5 years ago

> You can't just split up the expression into multiple expressions which are evaluated independently?

A crucial requirement of my work is that I shouldn't have to think about what is logged and what isn't, so anything that requires me to choose where the cuts are fails to satisfy this requirement. I really do think "reduction steps" is the right level of detail, though maybe filtered down to just variable substitutions and the results of function application.

> I've explored the use of Dhall as a foundation for a pricing tool and the main gotcha I've run into at this time is the lack of arithmetic for Doubles, which forces arithmetic to be calculated outside the Dhall context.

I'm fine postulating a Money type and some known functions to transform it.

f-f commented 5 years ago

@ocharles how related is this with https://github.com/dhall-lang/dhall-lang/issues/626?

ocharles commented 5 years ago

@f-f Yeah, I guess it's somewhat related - that issue might go a bit further than mine, in that someone could highlight a subexpression in the output and see its derivation tree or something. I'm happy to go "forwards" only, while that one talks about going backwards.

Gabriella439 commented 5 years ago

This is probably going to be a bit of work for us to maintain. We're barely able to maintain something as simple as isNormalized and keep it in sync with the current normalizer. We are also currently maintaining normalizeWithM and normalize, which are two entirely different normalizers.

This also doesn't bring us closer to addressing our main use case (displacing YAML).

ocharles commented 5 years ago

Why does it have to be something different to maintain? Can't normalize = fix normalizeSingleStep? Or is this too naive a view of normalisation?

Edit: also, isNormalized e = e == normalizeSingleStep e, or something, right?
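
Roughly what I have in mind (normalizeSingleStep is hypothetical here - it's not an existing dhall-haskell function):

import Dhall.Core (Expr)

-- Hypothetical: perform one reduction step, or return Nothing if the
-- expression is already in normal form.
normalizeSingleStep :: Expr s a -> Maybe (Expr s a)
normalizeSingleStep = undefined

-- The full normalizer is the single-step normalizer iterated to a fixed point...
normalize :: Expr s a -> Expr s a
normalize e = maybe e normalize (normalizeSingleStep e)

-- ...and isNormalized just means "no step applies".
isNormalized :: Expr s a -> Bool
isNormalized e = case normalizeSingleStep e of
    Nothing -> True
    Just _  -> False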

AndrasKovacs commented 5 years ago

@ocharles I believe that a tracing normalizer is something which has to be implemented specifically to support tracing, and is necessarily slower than a non-tracing one. Concretely for eval, it relies heavily on GHC thunks and non-observable sharing, and implementing tracing on top of it seems pretty hard - probably not easier than tracing evaluation for arbitrary GHC heap objects. To trace it without getting an unreadable deluge of unfolded closures, we would need observable sharing and explicit thunking in a graph structure, which is a lot slower than native GHC thunks.

Tracing for naive substitution is much easier, but of course that's not "production strength" performance. So I think we need at least two normalizers if we want tracing.
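
To give a flavour of what "observable sharing and explicit thunking" means, here is a rough, purely illustrative sketch (it has nothing to do with the actual dhall-haskell evaluator): thunks become mutable cells, so forcing one is an event a tracer could observe and log.

import Data.IORef (IORef, newIORef, readIORef, writeIORef)

-- A thunk is a mutable cell holding either a delayed computation or its
-- cached result.
newtype Thunk a = Thunk (IORef (Either (IO a) a))

mkThunk :: IO a -> IO (Thunk a)
mkThunk act = Thunk <$> newIORef (Left act)

-- Forcing is now observable: a tracing evaluator could log every call.
force :: Thunk a -> IO a
force (Thunk ref) = do
    cell <- readIORef ref
    case cell of
        Right v  -> pure v
        Left act -> do
            v <- act
            writeIORef ref (Right v)  -- cache the result so sharing is preserved
            pure v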

Gabriella439 commented 5 years ago

I think what @ocharles is getting at is that we've already committed to supporting at least two normalizers (i.e. isNormalized and normalize), and if we implemented normalizeSingleStep like he suggested then isNormalized could be trivially derived from it, so that we would still have the same number of normalizers after the proposed change (assuming that normalizeSingleStep is not much more complicated to implement than the current isNormalized function).

ocharles commented 5 years ago

@AndrasKovacs Glad I got your attention! I was curious how this relates to your new normalisation strategy. It seems easy enough to add this to the "old" normalizer - whenever you recurse, simply tell the thing you're recursing into how it is embedded in your calling context. E.g., if you're normalising BoolAnd x y, you would recurse into normalize x and wrap anything that normalizer logs with \e -> BoolAnd e y to produce x', and then when you recurse into y you would wrap anything it logs with \e -> BoolAnd x' e. I have a little proof of concept of this that seems to work.
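
A minimal sketch of the idea (using a tiny stand-in expression type rather than Dhall's real Expr, and not my actual proof of concept):

import Control.Monad.Writer (Writer, censor, runWriter, tell)

-- Tiny stand-in expression type, just to illustrate the wrapping idea.
data Expr = BoolLit Bool | BoolAnd Expr Expr
    deriving (Eq, Show)

-- Log the whole expression after every reduction step; when recursing into a
-- subterm, re-wrap everything it logs with the surrounding context.
normalizeTraced :: Expr -> Writer [Expr] Expr
normalizeTraced (BoolAnd x y) = do
    x' <- censor (map (\e -> BoolAnd e y)) (normalizeTraced x)
    y' <- censor (map (\e -> BoolAnd x' e)) (normalizeTraced y)
    case (x', y') of
        (BoolLit a, BoolLit b) -> do
            let r = BoolLit (a && b)
            tell [r]                    -- record the result of the && rule
            pure r
        _ -> pure (BoolAnd x' y')
normalizeTraced e = pure e

-- runWriter (normalizeTraced (BoolAnd (BoolAnd (BoolLit True) (BoolLit True)) (BoolLit False)))
--   == (BoolLit False, [BoolAnd (BoolLit True) (BoolLit False), BoolLit False])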

However, it seems that with the production-grade normalizer this approach is not viable. I guess this is related to what you said about "not rewriting machine code" - you can't easily reflect back into the original syntax tree?

ocharles commented 5 years ago

@Gabriel439 I wasn't really commenting on how many normalization strategies we currently have, more that we could take the strategy we already have and "unpick" the recursive call. Then I can simply use a richer fixed-point function, one that logs every step it takes.

AndrasKovacs commented 5 years ago

@ocharles in your simple traced normalizer, it is a sensible choice to only trace the state of a single expression. eval works more like a normal functional program, with state consisting of a reference to immutable code, an environment (of values in scope), a heap (with graph structure, not just tree structure) and a stack. Tracing here would be similar to tracing or stepping through regular functional programs. I don't know much about implementing these. The heap and stack are implicit in my implementation, since they reuse the GHC implementations, but for tracing they would probably need to be explicit, unless GHC debugging can be feasibly reused, which I doubt.

ocharles commented 5 years ago

Thanks, maybe I should take a super simple STLC or something and try to write a normaliser in the eval style to understand this better.

f-f commented 5 years ago

@ocharles if I understand correctly, the issue here is that when you transform the AST from "source syntax" to "core representation" in NbE, you translate the Lam constructor to an actual Haskell lambda, and tracing those would require either:

For a simple thing to look at, you can read Andras' barebones implementation of NbE, and in particular the place where the conversion to lambdas happens.
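
For a self-contained picture of that conversion, here is a minimal NbE evaluator for the untyped lambda calculus (an illustration of the general technique, not Dhall's actual core representation): Lam in the syntax becomes a real Haskell function in the value domain, and quote reads values back into syntax.

-- Minimal NbE for the untyped lambda calculus (illustration only).
data Term
  = Var Int            -- de Bruijn index
  | Lam Term
  | App Term Term
  deriving Show

data Value
  = VLam (Value -> Value)     -- the syntactic Lam becomes a Haskell closure
  | VNeutral Neutral

data Neutral
  = NVar Int                  -- de Bruijn level
  | NApp Neutral Value

type Env = [Value]

eval :: Env -> Term -> Value
eval env (Var i)   = env !! i
eval env (Lam t)   = VLam (\v -> eval (v : env) t)   -- <- the conversion to lambdas
eval env (App f x) =
    case eval env f of
        VLam g     -> g (eval env x)
        VNeutral n -> VNeutral (NApp n (eval env x))

-- Reading a Value back into a Term ("quoting") is what you would need at each
-- point where you want to print or diff an intermediate result.
quote :: Int -> Value -> Term
quote d (VLam f)     = Lam (quote (d + 1) (f (VNeutral (NVar d))))
quote d (VNeutral n) = quoteNeutral d n

quoteNeutral :: Int -> Neutral -> Term
quoteNeutral d (NVar l)   = Var (d - l - 1)   -- convert level back to index
quoteNeutral d (NApp n v) = App (quoteNeutral d n) (quote d v)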

ocharles commented 5 years ago

@f-f yep, that sounds right to me. It would be slower, but I was hoping there would still be a "single step" normalizer, even if it uses NbE, and I could quote the result of each step into a Writer or something.

MonoidMusician commented 5 years ago

I might actually be able to do this in dhall-purescript. I have abstracted the algorithm for normalization a bit, so it should be more configurable. I just don't have a pretty-printer yet, so you can't exactly view the resulting expression 😂 🙃.

How would you want it to be incrementalized? Would you like to see one complete pass over the tree with one "layer" normalized, and then print that, and then do another pass (if it isn't done)? I think I could do that fairly easily – I just configure it so it does not recurse. If you want literally every intermediate normalization printed, that would probably also be doable but more work … in particular, I'm not sure I really know what that would mean, in terms of tree-traversal semantics. I suppose I could make the recursion stop applying after one normalization step succeeds …

ocharles commented 5 years ago

> If you want literally every intermediate normalization printed

I want this. The reason is I may have some complex expression that contains

...
   if foo > 5 then x else y

and I would want to see the reduction

  if 6 > 5 then x else y
  if True then x else y
  x

sjakobi commented 5 years ago

I think this would be a very attractive feature!

I think this could make Dhall an even more popular choice for the kinds of problems where I believe it is already a pretty good candidate: large, complex systems, where debuggability would, I imagine, be very valuable too.

Regarding the maintenance cost of supporting multiple normalizers – it's certainly there, but I don't perceive it to be a big problem. The property tests really help keep everything in sync, and I'm trying to make them catch even more issues.

I also wonder whether the tracing normalizer could be built to subsume normalizeWithM. That should certainly keep the maintenance cost at an acceptable level.

Gabriella439 commented 5 years ago

@sjakobi: Yeah, I think that this is actually not as much maintenance as I thought. The only change I would suggest is that rather than replace normalizeWithM this should replace isNormalized. As @ocharles noted above, you can define isNormalized in terms of this functionality as:

isNormalized e = e == normalizeSingleStep e

sjakobi commented 5 years ago

Here's an attempt at what I suspect is the easiest approach that could work:

{-# LANGUAGE LambdaCase #-}

import Control.Applicative (empty, (<|>))
import Data.Foldable (asum)
import qualified Data.Sequence
import Dhall.Core (Expr (..))

-- | Perform a single normalization step or return 'Nothing'
step :: Expr s a -> Maybe (Expr s a)
step = \case
    NaturalPlus x y ->
            (NaturalPlus <$> step x <*> pure y)
        <|> (NaturalPlus <$> pure x <*> step y)
        <|> decide x y
      where
        decide (NaturalLit 0)  r             = pure r
        decide  l             (NaturalLit 0) = pure l
        decide (NaturalLit m) (NaturalLit n) = pure (NaturalLit (m + n))
        decide  _              _             = empty
    ListLit Nothing xs ->
        ListLit Nothing <$> xs'
      where
        xs' = asum (Data.Sequence.mapWithIndex stepIx xs)
        stepIx ix x = fmap (\x' -> Data.Sequence.update ix x' xs) (step x)
    ListLit (Just t) Data.Sequence.Empty ->
        ListLit <$> fmap Just (step t) <*> pure Data.Sequence.Empty
    ListLit (Just _) xs ->
        pure (ListLit Nothing xs)
    _ -> empty -- TODO

To get a full trace, you'd use steps:

steps :: Expr s a -> [Expr s a]
steps e = case step e of
    Nothing -> []
    Just e' -> e' : steps e'

In action:

λ> Right e = exprFromText "" "[1 + (2 + 3), 4 + 5]"
λ> denote e
ListLit Nothing 
    ( fromList 
        [ NaturalPlus ( NaturalLit 1 ) 
            ( NaturalPlus ( NaturalLit 2 ) ( NaturalLit 3 ) )
        , NaturalPlus ( NaturalLit 4 ) ( NaturalLit 5 )
        ] 
    )
λ> steps (denote e)
[ ListLit Nothing 
    ( fromList 
        [ NaturalPlus ( NaturalLit 1 ) ( NaturalLit 5 )
        , NaturalPlus ( NaturalLit 4 ) ( NaturalLit 5 )
        ] 
    )
, ListLit Nothing 
    ( fromList 
        [ NaturalLit 6
        , NaturalPlus ( NaturalLit 4 ) ( NaturalLit 5 )
        ] 
    )
, ListLit Nothing 
    ( fromList 
        [ NaturalLit 6
        , NaturalLit 9
        ] 
    )
] 

The problem with this approach is that at each step, we have to traverse the AST to find the next possible simplification. If performance matters, this will probably be too slow.

A more efficient approach would be to record all the steps for a subexpression before addressing the next one:

trace :: Expr s a -> [Expr s a]
trace = \case
    NaturalPlus x y ->
             (NaturalPlus <$> xs <*> pure y)
         <|> (NaturalPlus <$> pure x' <*> ys)
         <|> decide x' y'
      where
        y' = lastDef y ys
        ys = trace y
        x' = lastDef x xs
        xs = trace x

        decide (NaturalLit 0)  r             = pure r
        decide  l             (NaturalLit 0) = pure l
        decide (NaturalLit m) (NaturalLit n) = pure (NaturalLit (m + n))
        decide  _              _             = empty
    _ -> empty -- TODO
  where
    lastDef x [] = x
    lastDef _ xs = last xs

A limitation of both approaches above is that at each step we record only the full expression, so you can't easily see what changed from step to step. It would be nice if we could represent the surrounding "context" of an expression on its own, to enable better visualizations. I wonder whether we could add a new constructor Focus to Expr to mark the subexpression that is being normalized. I'll give this a try.
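
Roughly the shape I have in mind (a toy expression type purely for illustration, not the real Expr from Dhall.Core):

import Numeric.Natural (Natural)

-- Toy expression type, just to illustrate the idea.
data Expr
  = NaturalLit Natural
  | NaturalPlus Expr Expr
  | Focus Expr              -- marks the subexpression that was just rewritten
  deriving Show

-- Strip the marker again before performing the next step.
unfocus :: Expr -> Expr
unfocus (Focus e)         = unfocus e
unfocus (NaturalPlus x y) = NaturalPlus (unfocus x) (unfocus y)
unfocus e                 = e

-- A single logged step for "1 + (2 + 3)" could then look like
--   NaturalPlus (NaturalLit 1) (Focus (NaturalLit 5))
-- i.e. the whole expression, with the part that just changed highlighted.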