Open jwiegley opened 4 years ago
I've taken a little look at this, and discovered that Agda is generating a 162093 line long Haskell file when it compiles Categories.Category.Cartesian
.
There's a few identifiers that get compiled to over 10000 lines of Haskell each:
Categories.Category.Cartesian.Cartesian.monoidal
is 24015 linesand all of
Categories.Category.Cartesian.Cartesian.monoidal.pentagon
Categories.Category.Cartesian.Cartesian._.Symmetric.pentagon
Categories.Category.Cartesian.Cartesian._.Symmetric.braided.pentagon
Categories.Category.Cartesian.Cartesian.symmetric.pentagon
Categories.Category.Cartesian.Cartesian.symmetric.braided.pentagon
are 19669 lines. These six identifiers account for over three quarters of the compiled Haskell!
Categories.Category.Cartesian
isn't the only problem module. Here's all the modules in current master that compile to more than 100000 lines of Haskell
1539839 src/MAlonzo/Code/Categories/Theory/Lawvere.hs
954948 src/MAlonzo/Code/Categories/Category/CartesianClosed.hs
856929 src/MAlonzo/Code/Categories/Category/Instance/Properties/Posets.hs
693640 src/MAlonzo/Code/Categories/Category/Instance/Properties/Cats.hs
523664 src/MAlonzo/Code/Categories/Category/CartesianClosed/Canonical.hs
398934 src/MAlonzo/Code/Categories/Category/Cocomplete/Finitely.hs
382421 src/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Pentagon.hs
374535 src/MAlonzo/Code/Categories/Category/Cocartesian.hs
338521 src/MAlonzo/Code/Categories/Category/Complete/Finitely.hs
222713 src/MAlonzo/Code/Categories/Functor/Monoidal/Properties.hs
176396 src/MAlonzo/Code/Categories/Category/Construction/Properties/Presheaves/Cartesian.hs
174522 src/MAlonzo/Code/Categories/Category/Monoidal/Instance/Setoids.hs
172860 src/MAlonzo/Code/Categories/Category/Cartesian/Structure.hs
165838 src/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Diagonal.hs
162093 src/MAlonzo/Code/Categories/Category/Cartesian.hs
147289 src/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/L.hs
125701 src/MAlonzo/Code/Categories/Bicategory/Bigroupoid.hs
111602 src/MAlonzo/Code/Categories/Functor/Cartesian/Properties.hs
Notably, Categories.Theory.Lawvere
is 68 lines long but compiles to over a million and a half lines of Haskell!
Just curious: what made you wanna compile the library?
I don't know what causes this code explosion, whether that should be considered a bug and, if so, whether it's an Agda bug or a bug of the library. In any case, I'm not too surprised that some of the code behaves strangely when compiled to Haskell. For one thing, it's not really designed to be run. The particular file you mention, which mechanizes CCCs actually represents CCCs in a very inefficient way. The representation for conceptual reasons: it closely models a very general definition of exponentials that can be found in the literature. The module also contains big proof terms of e.g. equations, which are probably even less 'optimal'. (Some of the definitions that you noted above as producing very big Haskell terms are such proof terms.) I'm not saying that the inefficient representation is the source of the code explosion, it's likely something else, only that this code isn't really supposed to be compiled (IMO).
If one wanted to use CCCs to structure actual code, one would use e.g. the alternative but equivalent 'canonical' representation of CCCs (from the CartesianClosed/Canonical
module), and without the equivalence proofs.
All that being said, I'm also curious what exactly causes the explosion. Maybe it's the syntactic sugar in the pentagon diagrams that's somehow expanded badly or there's something that triggers normalization of terms involving records. Hard to know. Probably some work is needed to find a minimal Agda example that reproduces the issue.
I would not be surprised if there were another issue also compounding this: loss of sharing. We use Agda's modules a lot, and they are notorious for causing loss of sharing. So there might be huge amounts of duplication - and this gets worst when you are at the "top of the heap", i.e. CCC. (What's going on with Lawvere Theories is quite mysterious though).
I wonder how much of this is related to proof relevance? Beyond the sizes of these terms, knowing just a bit more about what's really in them would be interesting. Also, this issue is more likely an Agda issue (though if we knew what triggered it and it is possible to avoid library-side, I'd like to know). So it would make sense to file something there too.
I've taken a little look at this, and discovered that Agda is generating a 162093 line long Haskell file when it compiles
Categories.Category.Cartesian
.There's a few identifiers that get compiled to over 10000 lines of Haskell each: (...)
Categories.Category.Cartesian.Cartesian.monoidal.pentagon
(...) are 19669 lines. These six identifiers account for over three quarters of the compiled Haskell!
Interestingly, these all seem to be instances of the same pentagon proof, the one at Cartesian.agda, L346
The only other non-trivial instances of such proofs I could find were those in Categories.Bicategory.Instance.EnrichedCats
and Categories.Bicategory.Instance.Cats
. Do these by any chance result in huge Haskell files as well?
Categories.Bicategory.Instance.EnrichedCats.EnrichedCats
compiles to something over 10000 lines long. Categoires.Bicategory.Instance.Cats.Cats
compiles to over 40000 lines.
I can appreciate the answer being "don't do that". :) I more wondered because one objective of my Coq version of category theory is that it be reasonably computable, for use as an intermediate language in "transpiling" from lambda calculus to other categories by way of BiCCCs (per Conal's work). From what I've seen, the agda-categories library is quite attractive, so I was evaluating to see whether my work should shift here. Not having an efficient computable representation concerns me a little bit, as I then wonder if it will make it difficult to base a compilation toolchain on categories using this approach.
Thanks for elaborating on that @jwiegley!
Like John, I have also been intending to use agda-categories as a practical platform for computation, replacing and improving on the Haskell version of compiling-to-categories. I didn't realize that the library is "not really designed to be run".
Something is definitely up with the pentagon
equations.
Sorting by file size, here are the top 10 generated Haskell files that contain the string "pentagon":
115504 build/MAlonzo/Code/Categories/Theory/Lawvere.hs
81992 build/MAlonzo/Code/Categories/Category/CartesianClosed.hs
81984 build/MAlonzo/Code/Categories/Category/Instance/Properties/Posets.hs
55688 build/MAlonzo/Code/Categories/Category/Instance/Properties/Cats.hs
38992 build/MAlonzo/Code/Categories/Category/CartesianClosed/Canonical.hs
33376 build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Pentagon.hs
29376 build/MAlonzo/Code/Categories/Category/Cocomplete/Finitely.hs
27216 build/MAlonzo/Code/Categories/Category/Cocartesian.hs
24888 build/MAlonzo/Code/Categories/Category/Complete/Finitely.hs
13928 build/MAlonzo/Code/Categories/Category/Construction/Properties/Presheaves/Cartesian.hs
And here are the top 10 generated Haskell files not containing that string:
16168 build/MAlonzo/Code/Categories/Functor/Monoidal/Properties.hs
12488 build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Diagonal.hs
11088 build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/L.hs
9320 build/MAlonzo/Code/Categories/Functor/Cartesian/Properties.hs
6696 build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Dinatural.hs
5184 build/MAlonzo/Code/Categories/Bicategory/Instance/Cats.hs
5144 build/MAlonzo/Code/Categories/Category/Construction/Grothendieck.hs
4680 build/MAlonzo/Code/Categories/Category/Monoidal/Properties.hs
4232 build/MAlonzo/Code/Categories/CoYoneda.hs
4064 build/MAlonzo/Code/Categories/Yoneda/Continuous.hs
All sizes are in kilobytes.
I'm not sure what causes this yet though.
I will point my finger at proof relevance. another reason could be code extraction in Agda is not good at dealing with some particular Agda feature which we use extensively.
Is it possible to see what's in the generated haskell code to know what is taking so many lines?
technically, every line of code in this library can be run, due to proof relevance. it is a question, however, whether running it is efficient. many constructions here are relatively straightforward, but it is very difficult to evaluate that for proofs.
Is it possible to see what's in the generated haskell code to know what is taking so many lines?
@HuStmpHrrr agda -c --no-main --ghc-dont-call-ghc src/Categories/Theory/Lawvere.agda
will put the generated Haskell in src/MAlonzo/Code/Categories/Theory/Lawvere.hs
, which you can inspect in your text editor of choice, although I certainly wouldn't call it "readable".
This was also discussed in the agda-categories Slack. [Though I think we should, like for Agda itself, move to Zulip.]
As another data point, Conal has a definition of biproduct categories here, which is 71 lines long:
https://github.com/conal/agda-cat-linear/blob/master/src/Biproduct.agda
It generates a Haskell file that is 1,480,724 lines long, and generates a total of 2,143,057 lines of Haskell in the MAlonzo source directory. It takes 3.7 hours on my iMac Pro for this Haskell code to compile.
Wow. I'm pretty sure that Cartesian is partially at fault here.
There are some pretty significant blowups in the sizes of .agdai files too (which I worked on reducing some months ago, it was much worse). It might be related.
It might has something to do with patterns of nested modules, things like
field
T : CartesianCategory o ℓ e
module T = CartesianCategory T
I played around a bit more with the declaration of the pentagon
field of the Monoidal
record in Categories.Category.Monoidal.Core. None of it made any difference. In particular, I tried
Commutation
module (using explicit compositions instead),but none of this changed the size of the file build/MAlonzo/Code/Categories/Category/CartesianClosed.hs
one byte.
This suggests that the issue isn't with the declaration (i.e. the type signature) but with the proof of the pentagon identity in Categories.Category.Cartesian
https://github.com/agda/agda-categories/blob/7672b7a3185ae77467cc30e05dbe50b36ff2af8a/src/Categories/Category/Cartesian.agda#L346-L372
It might has something to do with patterns of nested modules, things like
field T : CartesianCategory o ℓ e module T = CartesianCategory T
I also suspect that's part of the problem. At the very least it explains why that one big pentagon proof leads to code explosions everywhere. I suspect it's because it is re-exported over and over again, and the re-exports basically create copies of the term. For example, there's a local module definition in the Cartesian
module right after the proof:
https://github.com/agda/agda-categories/blob/7672b7a3185ae77467cc30e05dbe50b36ff2af8a/src/Categories/Category/Cartesian.agda#L375
which is then re-exported e.g. in Categories.Category.CartesianClosed and in Categories.Category.Monoidal.Instance.Cats, and from there it spreads to Bicategories (via enrichment), etc. This might also explain why some of the proofs there take such a long time to complete.
Case in point: The following lines from Categories.Category.CartesianClosed.Canonical
re-export the contents of Cartesian
, which indirectly re-exports the nested monoidal
module contained in Cartesian
and with it the pentagon
proof.
https://github.com/agda/agda-categories/blob/7672b7a3185ae77467cc30e05dbe50b36ff2af8a/src/Categories/Category/CartesianClosed/Canonical.agda#L82-L84
This results in a copy of the pentagon proof in the generated Canonical.hs
.
Only when I explicitly hide the proof and all the local modules in Cartesian
that re-export the proof does the corresponding term disappear from the Haskell file.
open Cartesian isCartesian public
hiding (_×_; π₁; π₂; ⟨_,_⟩; monoidal; braided; symmetric; pentagon)
renaming (⟨⟩-cong₂ to ⟨,⟩-resp-≈)
EDIT: I checked again and the following turned out to be wrong. The private module application did not cause the pentagon proof to be included in the Haskell file. It seems I missed to hide one of the other public (but indirect) re-exports.
This is maybe not too surprising because this
https://github.com/agda/agda-categories/blob/7672b7a3185ae77467cc30e05dbe50b36ff2af8a/src/Categories/Category/CartesianClosed/Canonical.agda#L159
(Removed rest of this comment about the private module application of open
statement re-exports the contents of Cartesian
, but apparently it's sufficient to simply open an applied instance of CartesianClosed
for the same to happen. Consider the following line from the same file, which result in further copies of the pentagon proof in the generated Canonical.hs
:CartesianClosed
)
I'd love to know how module applications and re-exports are actually implemented. Does anybody know if this is documented somewhere? If not, maybe we should ask one of the devs on Zulip or so.
In the meantime, a fix would be to move the nested module definitions of monoidal
and symmetric
out of the Cartesian
record, making them into separate top-level definitions that can be imported only when necessary.
Of course, none of this explains why the pentagon
proof gets so enormous when compiled to Haskell. I'll investigate that further once I find some more time.
EDIT: Private module applications do not seem the culprit (see edited comment above).
The bit about module applications really worries me. If these are costly irrespective of whether they are opened publicly or privately, then the library is basically doomed to suffer from code explosion.
After double-checking, the private module applications turned out not to cause code bloat. Only the public ones (which makes more sense). Sorry for the confusion.
Still, it is somewhat strange to me that public re-exports actually result in a copy of the proof term (which seems to cause the code bloat) and not just in a call to the definition in the original module.
Is this a bug in Agda? Should we file a bug report? If anyone has insights about this, I'd be curious to know the rationale.
I have refactored the Cartesian
module/record so that the monoidal
field is now part of a separate module, and all monoidal properties are re-exported in that module instead (see PR #218).
This seems to help. There are fewer modules that contain the term pentagon
now, and those that still do are smaller. The updated tallies (in kilobytes) are as follows.
Top 10 largest Haskell files containing "pentagon":
51264 build/MAlonzo/Code/Categories/Category/CartesianClosed.hs
32832 build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Pentagon.hs
12352 build/MAlonzo/Code/Categories/Category/Cartesian.hs
10304 build/MAlonzo/Code/Categories/Bicategory/Bigroupoid.hs
4160 build/MAlonzo/Code/Categories/Category/Monoidal/Instance/Setoids.hs
2112 build/MAlonzo/Code/Categories/Enriched/NaturalTransformation/NaturalIsomorphism.hs
384 build/MAlonzo/Code/Categories/Category/Monoidal/Closed.hs
256 build/MAlonzo/Code/Categories/Category/RigCategory.hs
192 build/MAlonzo/Code/Categories/Category/Monoidal/Traced.hs
128 build/MAlonzo/Code/Categories/Category/Monoidal/Utilities.hs
Top 10 largest Haskell files not containing "pentagon":
16448 build/MAlonzo/Code/Categories/Functor/Monoidal/Properties.hs
12352 build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Diagonal.hs
11328 build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/L.hs
9280 build/MAlonzo/Code/Categories/Functor/Cartesian/Properties.hs
6208 build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Dinatural.hs
5184 build/MAlonzo/Code/Categories/Category/Construction/Grothendieck.hs
4160 build/MAlonzo/Code/Categories/Functor/Slice.hs
3136 build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Identity.hs
2112 build/MAlonzo/Code/Categories/Category/Equivalence/Properties.hs
1088 build/MAlonzo/Code/Categories/Category/Instance/Properties/Cats.hs
Clearly, some of these are still ridiculously big (there shouldn't be any ~10 MB sized source files) but at least the situation has improved.
Hi @sstucki, were you able to find out if this is an Agda bug?
Unfortunately not.
I wrote up a few small snippets with nested modules, module applications, etc. but I could never reproduce the odd behavior seen here, namely that a term is basically copied instead of just referenced when it is re-exported.
Without a minimal example, it's really hard to tell whether it's a bug, even for the Agda devs (I suspect). But I might be wrong, so feel free to file an issue in the Agda repo yourself.
As far as I know (from comments that both @UlfNorell and @WolframKahl made), 1) module parameters are instantiated by copying, and 2) private modules are also done by copying, with no real attempt to maintain sharing. That can definitely cause blowups. Whether that's what's going on here, I can't tell.
https://github.com/agda/agda2hs is probably a rescue.
agda2hs
is something completely different. The goal of that project is to let you write "Haskell" code in Agda and prove things about it, not compiling arbitrary Agda code to Haskell.
Weird progress: by taking the pentagon
proof out of being defined inside the record, but instead to a private variable outside of it, this cuts the resulting Haskell file from 12M to 5M. Still huge, but a non-trivial gain.
Even more progress: by changing the Cartesian
module to not create a module with all the BinaryProducts
stuff and then exporting it publicly as well, this goes from 5M to a little less than 2M. I'll push that soon.
The "winners" in the insanely large compiled files (as documented in the 2nd issue on the agda issues) are now:
MacBook-Air-2:Code carette$ find . -xdev -type f -size +10M -print | xargs ls -lh | sort -k5,5 -h -r
-rw-r--r-- 1 carette staff 222M 1 Aug 09:28 ./Categories/Bicategory/Construction/Spans.hs
-rw-r--r-- 1 carette staff 60M 1 Aug 09:22 ./Categories/Category/Monoidal/Interchange.hs
-rw-r--r-- 1 carette staff 47M 1 Aug 09:18 ./Categories/Category/CartesianClosed.hs
-rw-r--r-- 1 carette staff 30M 1 Aug 09:20 ./Categories/Category/Monoidal/Closed/IsClosed/Pentagon.hs
-rw-r--r-- 1 carette staff 15M 1 Aug 09:21 ./Categories/Functor/Monoidal/Properties.hs
-rw-r--r-- 1 carette staff 13M 1 Aug 09:24 ./Categories/Adjoint/Monadic/Properties.hs
-rw-r--r-- 1 carette staff 12M 1 Aug 09:20 ./Categories/Category/Monoidal/Closed/IsClosed/Diagonal.hs
So CartesianClosed
has actually gone down a tiny bit (it used to be 51M).
Interestingly, if one continues below 10M, it doesn't explode, i.e. the number of cases is not bad:
-rw-r--r-- 1 carette staff 9.9M 1 Aug 09:19 ./Categories/Category/Monoidal/Closed/IsClosed/L.hs
-rw-r--r-- 1 carette staff 9.3M 1 Aug 09:29 ./Categories/Pseudofunctor/Composition.hs
-rw-r--r-- 1 carette staff 8.4M 1 Aug 09:23 ./Categories/Functor/Cartesian/Properties.hs
-rw-r--r-- 1 carette staff 6.5M 1 Aug 09:30 ./Categories/Bicategory/Bigroupoid.hs
-rw-r--r-- 1 carette staff 5.5M 1 Aug 09:23 ./Categories/Category/Monoidal/Closed/IsClosed/Dinatural.hs
-rw-r--r-- 1 carette staff 4.2M 1 Aug 09:29 ./Categories/Category/Construction/Grothendieck.hs
-rw-r--r-- 1 carette staff 4.0M 1 Aug 09:22 ./Categories/Functor/Slice.hs
-rw-r--r-- 1 carette staff 3.5M 1 Aug 09:21 ./Categories/Bicategory/Instance/Cats.hs
-rw-r--r-- 1 carette staff 3.3M 1 Aug 09:19 ./Categories/Category/Monoidal/Closed/IsClosed/Identity.hs
-rw-r--r-- 1 carette staff 3.2M 1 Aug 09:23 ./Categories/Category/Construction/Adjoints.hs
-rw-r--r-- 1 carette staff 3.1M 1 Aug 09:19 ./Categories/Yoneda/Continuous.hs
-rw-r--r-- 1 carette staff 3.0M 1 Aug 09:18 ./Categories/Category/Monoidal/Properties.hs
-rw-r--r-- 1 carette staff 2.7M 1 Aug 09:29 ./Categories/CoYoneda.hs
-rw-r--r-- 1 carette staff 2.7M 1 Aug 09:28 ./Categories/Enriched/NaturalTransformation.hs
-rw-r--r-- 1 carette staff 2.5M 1 Aug 09:21 ./Categories/NaturalTransformation/NaturalIsomorphism/Monoidal.hs
-rw-r--r-- 1 carette staff 2.5M 1 Aug 09:17 ./Categories/Category/Cocartesian.hs
-rw-r--r-- 1 carette staff 2.3M 31 Jul 11:40 ./Categories/Adjoint/Equivalents.hs
-rw-r--r-- 1 carette staff 2.0M 1 Aug 09:21 ./Categories/Pseudofunctor/Hom.hs
Categories.Category.Cartesian
itself is down to 1.6M. (It's going to go down further, but that will be a bit of a cheat, since the bulk will be moving to Categories.Category.Cartesian.Monoidal
.)
I am working on a PR for the bicategory of monads in a bicategory (from Street's "formal theory of monads" paper), and I stumbled upon this exact problem (up to the fact that I am not compiling: I am just trying to type-check); I am at the point where I see no way to go on writing, because filling a single hole in a morphism-reasoning and reloading takes forever (~20 minutes, in the fortunate cases; in the unfortunate cases, my machine runs out of RAM -and it has a respectable amount).
Any way out of the problem since @JacquesCarette 's last comment?
@tetrapharmakon probably best to open a separate issue for this. I've encountered different problems in agda-categories whose root cause were quite different.
If your repo is public, then a link to the WIP code (or a gist) for reproducibility would be appreciated. I have had some success in the past tracking down slowness issues (and fixing them).
Thanks. I am currently editing a single file in a branch of the fork me and @iwilare usually use. It's here
In Bicategories.Extras
I added these lines
id₂-comm : α ∘ᵥ id₂ ≈ id₂ ∘ᵥ α
id₂-comm = Bicategory.hom.Equiv.trans Bicat identity₂ʳ (Bicategory.hom.Equiv.sym Bicat identity₂ˡ)
id₂-comm-sym : id₂ ∘ᵥ α ≈ α ∘ᵥ id₂
id₂-comm-sym = Bicategory.hom.Equiv.sym Bicat id₂-comm
that I still have to refactor a bit.
Those look very useful - worth a tiny PR on their own.
@JacquesCarette Can you give me/us some advice on how to chase the problematic piece of code that is slowing down the type-checking so much?
In the past, @yellowsquid managed to get some speed-up by replacing nested records with co-pattern matching. This makes some sense because the unifier need not search for types for the intermediate records.
Maybe that can help?
It helped a little bit at first, but as soon as I try to define a Bicategory
the type-checking time is around 2 minutes each time.
I'm going to write a small guide to efficiency for agda-categories. This week. Not sure if I'll manage to do it today though (there's a fair amount to say).
An incomplete guide to speed in agda-categories
is now on the wiki.
Today I was looking at https://github.com/conal/agda-cat-linear with @conal , and I was working on getting the Nix-based build working. It uses the current master versions of the
standard-library
andagda-categories
.This caused me to do something now, which is to actually compile the code. :-) Nearly all the dependencies compiled fairly quickly, but when it reached
MAlonzo.Code.Categories.Category.Cartesian
, this file alone took more than 30 minutes to compile.It is fairly easy to reproduce this, it requires a directory with all three projects Git cloned, and then running
nix-build --pure
insideagda-cat-linear
. Or, enternix-shell
, cd intosrc
, and typeagda --compile AlgebraicCats.agda
. You'll notice it pause when it reaches that file, and GHC will take 100% of a CPU core for as much of the time as I was willing to watch.