Increase sharing of in-memory proof terms

UlfNorell commented 10 years ago

From pumpkingod@gmail.com on November 12, 2011 20:24:51

This is more of a hypothetical feature request, but I've heard that the reason Agda is so memory-hungry on some proofs is that it doesn't always share identical terms, in memory. I have no idea how much work it would be to help it do so a bit more, but I figured I'd put this up as a feature request in case someone else knew more.

Original issue: http://code.google.com/p/agda/issues/detail?id=514

UlfNorell commented 10 years ago

From nils.anders.danielsson on November 14, 2011 04:56:17

it doesn't always share identical terms

I'm not sure if what you suggest (some kind of hash-consing) is the best possible strategy. However, the term representation that is currently used actively destroys sharing (due to reindexing when we go under binders), and this seems really bad.

Another potentially important change would be to make the sharing explicit so that we could quickly tell if we're comparing something with itself. However, Ulf did a limited experiment a couple of years ago, and these comparisons didn't seem to be extremely common. More experiments might be a good idea.

Owner: ulf.nor...@gmail.com
Labels: Type-Enhancement

UlfNorell commented 10 years ago

From pumpkingod@gmail.com on November 14, 2011 07:53:25

I'm not necessarily suggesting hash-consing, but that would definitely help the situation. Are there any situations in which sharing is actually preserved? For example, is there any point to writing let x = ... in ... x ... to share the x term, or to use where blocks to increase sharing, or does that get discarded? If it does currently get discarded, it seems like it'd be fairly low-hanging fruit to allow sharing there, and then we might have a few more options when we're stuck with huge terms.

Also, when implicit arguments get solved behind our backs, does Agda regenerate whole terms rather than reusing existing ones?

For example, if I have f : {A : Set} -> List A -> A, and pass it a List , will A get a duplicate expression?

UlfNorell commented 10 years ago

From andreas....@gmail.com on June 05, 2012 08:41:48

Currently, let x = u in t is interpreted as t[u/x], no sharing is preserved here.

Introducing let into the Internal syntax amounts to having explicit substitutions. I think that would be a sensible change. We should think about how to do it.

Parallel vs. single substitution

On paper (e.g. LFMTP 2010) I have studied parallel explicit substitutions. Basically, this is the following:

Gamma |- sigma : Delta Delta |- t : A

Gamma |- let sigma in t : let sigma in A

sigma provides a binding for every free variable of t and A. The new values make sense in context Gamma.

There are rules for composing explicit substitutions, and there are special substitutions like "shift every index by n" in order to not having to spell out boring substitutions for every variable. Twelf and Coq both use this representation.

If you want to expose the head constructor of a term, you need to push the let binding into the subterms.

One could also think of single substitutions, to implement something like term graphs. However, I have no clue how to implement this cleverly with de Bruijn indices. It seems important to be able to gather several bindings into one...

Labels: Priority-High Performance

UlfNorell commented 10 years ago

From jmchap...@gmail.com on October 08, 2012 08:25:23

Status: Accepted

UlfNorell commented 10 years ago

From k...@cas.mcmaster.ca on November 03, 2012 20:32:07

I saw significant activity on sharing and explicit substitution during and shortly after the last AIM, but recently, I have seen no indications of further activity in this direction --- what is the current roadblock, and where could I look for doing something about it?

I have two processes running now for over seven weeks on a supercomputer, each with 256GB of heap, running with RTS +S, and currently showing (development version of Agda, GHC-7.6.1):

16456597872 1256931664 135504846752 2350.04 2589.03 6546932.59 4588943.43 0 0 (Gen: 1) 16452855992 1256931680 135532289360 2340.56 2585.66 6549295.20 4591551.14 0 0 (Gen: 1) 16449185768 1256931664 135554065584 2339.91 2650.41 6551657.28 4594225.57 89 0 (Gen: 1) 16447017800 1256936416 135578662808 2324.10 2616.94 6554003.50 4596867.40 76 0 (Gen: 1) 16443835976 1256931664 135604714424 2314.45 2646.42 6556340.22 4599536.70 31 0 (Gen: 1) 16439938808 1256931720 135621224160 2307.49 2681.24 6558669.66 4602243.17 0 0 (Gen: 1)

(332G VIRT and 274G RES)

and 5486306736 1036179616 154117288472 1376.18 1595.11 10238772.53 4579233.86 6 0 (Gen: 1) 8009731936 551226504 153632334488 1378.75 1624.84 10240161.98 4580870.36 292 170 (Gen: 1) 9526316920 1568003768 154649112016 1364.92 1559.05 10241542.00 4582446.51 171 0 (Gen: 1) 1874421176 1235496344 154867835168 1364.39 1566.30 10242909.57 4584017.10 341 0 (Gen: 1) 3611890352 236935320 154890949128 1364.92 1640.39 10244280.00 4585663.94 464 0 (Gen: 1)

(271G VIRT and 234 RES)

respectively, i.e., they make seconds of progress per hour of GC, and both are just putting a couple of field values imported from previously typechecked modules into a record... In both cases, the developments they build upon typecheck in at most a couple of hours, and in 11G of heap.

I would expect these to typecheck quite quickly with appropriate sharing...

UlfNorell commented 10 years ago

From nils.anders.danielsson on November 05, 2012 02:18:16

If you want to experiment with sharing you can modify some lines in Agda.Syntax.Internal: ignoreSharing, ignoreSharingType, shared and sharedType should be "activated".

If you activate these functions, then Agda is currently likely to use /more/ memory, because every pointer is wrapped in an extra constructor Shared.

At the moment we have the problem that sharing is lost when something is substituted under a lambda (due to shifting of de Bruijn indices). One could consider switching to a different representation of variables, or perhaps using full hash-consing in order to recover sharing.

I don't know too much about hash-consing, but it would be interesting to see how the use of hash-consing affects Agda's performance. If you want to experiment with this, then you could perhaps use the technique described in Conchon and Filliâtre's "Type-Safe Modular Hash-Consing".

UlfNorell commented 10 years ago

From k...@cas.mcmaster.ca on December 18, 2012 10:12:33

I just ``darcs pull''ed benchmark/categories/Language.agda, where the header comment says that it does not typecheck in 2.5G of heap; I can confirm that it does not typecheck with 10G of heap, either (with Agda-2.3.2).

UlfNorell / agda-test

Increase sharing of in-memory proof terms #514