agda / agda-categories

A new Categories library for Agda
https://agda.github.io/agda-categories
MIT License
369 stars 67 forks source link

Compiling Categories.Category.Cartesian is very slow #215

Open jwiegley opened 4 years ago

jwiegley commented 4 years ago

Today I was looking at https://github.com/conal/agda-cat-linear with @conal , and I was working on getting the Nix-based build working. It uses the current master versions of the standard-library and agda-categories.

This caused me to do something now, which is to actually compile the code. :-) Nearly all the dependencies compiled fairly quickly, but when it reached MAlonzo.Code.Categories.Category.Cartesian, this file alone took more than 30 minutes to compile.

It is fairly easy to reproduce this, it requires a directory with all three projects Git cloned, and then running nix-build --pure inside agda-cat-linear. Or, enter nix-shell, cd into src, and type agda --compile AlgebraicCats.agda. You'll notice it pause when it reaches that file, and GHC will take 100% of a CPU core for as much of the time as I was willing to watch.

Taneb commented 4 years ago

I've taken a little look at this, and discovered that Agda is generating a 162093 line long Haskell file when it compiles Categories.Category.Cartesian.

There's a few identifiers that get compiled to over 10000 lines of Haskell each:

and all of

are 19669 lines. These six identifiers account for over three quarters of the compiled Haskell!

Taneb commented 4 years ago

Categories.Category.Cartesian isn't the only problem module. Here's all the modules in current master that compile to more than 100000 lines of Haskell

  1539839 src/MAlonzo/Code/Categories/Theory/Lawvere.hs
   954948 src/MAlonzo/Code/Categories/Category/CartesianClosed.hs
   856929 src/MAlonzo/Code/Categories/Category/Instance/Properties/Posets.hs
   693640 src/MAlonzo/Code/Categories/Category/Instance/Properties/Cats.hs
   523664 src/MAlonzo/Code/Categories/Category/CartesianClosed/Canonical.hs
   398934 src/MAlonzo/Code/Categories/Category/Cocomplete/Finitely.hs
   382421 src/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Pentagon.hs
   374535 src/MAlonzo/Code/Categories/Category/Cocartesian.hs
   338521 src/MAlonzo/Code/Categories/Category/Complete/Finitely.hs
   222713 src/MAlonzo/Code/Categories/Functor/Monoidal/Properties.hs
   176396 src/MAlonzo/Code/Categories/Category/Construction/Properties/Presheaves/Cartesian.hs
   174522 src/MAlonzo/Code/Categories/Category/Monoidal/Instance/Setoids.hs
   172860 src/MAlonzo/Code/Categories/Category/Cartesian/Structure.hs
   165838 src/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Diagonal.hs
   162093 src/MAlonzo/Code/Categories/Category/Cartesian.hs
   147289 src/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/L.hs
   125701 src/MAlonzo/Code/Categories/Bicategory/Bigroupoid.hs
   111602 src/MAlonzo/Code/Categories/Functor/Cartesian/Properties.hs

Notably, Categories.Theory.Lawvere is 68 lines long but compiles to over a million and a half lines of Haskell!

sstucki commented 4 years ago

Just curious: what made you wanna compile the library?

I don't know what causes this code explosion, whether that should be considered a bug and, if so, whether it's an Agda bug or a bug of the library. In any case, I'm not too surprised that some of the code behaves strangely when compiled to Haskell. For one thing, it's not really designed to be run. The particular file you mention, which mechanizes CCCs actually represents CCCs in a very inefficient way. The representation for conceptual reasons: it closely models a very general definition of exponentials that can be found in the literature. The module also contains big proof terms of e.g. equations, which are probably even less 'optimal'. (Some of the definitions that you noted above as producing very big Haskell terms are such proof terms.) I'm not saying that the inefficient representation is the source of the code explosion, it's likely something else, only that this code isn't really supposed to be compiled (IMO).

If one wanted to use CCCs to structure actual code, one would use e.g. the alternative but equivalent 'canonical' representation of CCCs (from the CartesianClosed/Canonical module), and without the equivalence proofs.

All that being said, I'm also curious what exactly causes the explosion. Maybe it's the syntactic sugar in the pentagon diagrams that's somehow expanded badly or there's something that triggers normalization of terms involving records. Hard to know. Probably some work is needed to find a minimal Agda example that reproduces the issue.

JacquesCarette commented 4 years ago

I would not be surprised if there were another issue also compounding this: loss of sharing. We use Agda's modules a lot, and they are notorious for causing loss of sharing. So there might be huge amounts of duplication - and this gets worst when you are at the "top of the heap", i.e. CCC. (What's going on with Lawvere Theories is quite mysterious though).

I wonder how much of this is related to proof relevance? Beyond the sizes of these terms, knowing just a bit more about what's really in them would be interesting. Also, this issue is more likely an Agda issue (though if we knew what triggered it and it is possible to avoid library-side, I'd like to know). So it would make sense to file something there too.

sstucki commented 4 years ago

I've taken a little look at this, and discovered that Agda is generating a 162093 line long Haskell file when it compiles Categories.Category.Cartesian.

There's a few identifiers that get compiled to over 10000 lines of Haskell each: (...)

  • Categories.Category.Cartesian.Cartesian.monoidal.pentagon (...) are 19669 lines. These six identifiers account for over three quarters of the compiled Haskell!

Interestingly, these all seem to be instances of the same pentagon proof, the one at Cartesian.agda, L346

The only other non-trivial instances of such proofs I could find were those in Categories.Bicategory.Instance.EnrichedCats and Categories.Bicategory.Instance.Cats. Do these by any chance result in huge Haskell files as well?

Taneb commented 4 years ago

Categories.Bicategory.Instance.EnrichedCats.EnrichedCats compiles to something over 10000 lines long. Categoires.Bicategory.Instance.Cats.Cats compiles to over 40000 lines.

jwiegley commented 4 years ago

I can appreciate the answer being "don't do that". :) I more wondered because one objective of my Coq version of category theory is that it be reasonably computable, for use as an intermediate language in "transpiling" from lambda calculus to other categories by way of BiCCCs (per Conal's work). From what I've seen, the agda-categories library is quite attractive, so I was evaluating to see whether my work should shift here. Not having an efficient computable representation concerns me a little bit, as I then wonder if it will make it difficult to base a compilation toolchain on categories using this approach.

sstucki commented 4 years ago

Thanks for elaborating on that @jwiegley!

conal commented 4 years ago

Like John, I have also been intending to use agda-categories as a practical platform for computation, replacing and improving on the Haskell version of compiling-to-categories. I didn't realize that the library is "not really designed to be run".

sstucki commented 4 years ago

Something is definitely up with the pentagon equations.

Sorting by file size, here are the top 10 generated Haskell files that contain the string "pentagon":

115504  build/MAlonzo/Code/Categories/Theory/Lawvere.hs
81992   build/MAlonzo/Code/Categories/Category/CartesianClosed.hs
81984   build/MAlonzo/Code/Categories/Category/Instance/Properties/Posets.hs
55688   build/MAlonzo/Code/Categories/Category/Instance/Properties/Cats.hs
38992   build/MAlonzo/Code/Categories/Category/CartesianClosed/Canonical.hs
33376   build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Pentagon.hs
29376   build/MAlonzo/Code/Categories/Category/Cocomplete/Finitely.hs
27216   build/MAlonzo/Code/Categories/Category/Cocartesian.hs
24888   build/MAlonzo/Code/Categories/Category/Complete/Finitely.hs
13928   build/MAlonzo/Code/Categories/Category/Construction/Properties/Presheaves/Cartesian.hs

And here are the top 10 generated Haskell files not containing that string:

16168   build/MAlonzo/Code/Categories/Functor/Monoidal/Properties.hs
12488   build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Diagonal.hs
11088   build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/L.hs
9320    build/MAlonzo/Code/Categories/Functor/Cartesian/Properties.hs
6696    build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Dinatural.hs
5184    build/MAlonzo/Code/Categories/Bicategory/Instance/Cats.hs
5144    build/MAlonzo/Code/Categories/Category/Construction/Grothendieck.hs
4680    build/MAlonzo/Code/Categories/Category/Monoidal/Properties.hs
4232    build/MAlonzo/Code/Categories/CoYoneda.hs
4064    build/MAlonzo/Code/Categories/Yoneda/Continuous.hs

All sizes are in kilobytes.

I'm not sure what causes this yet though.

HuStmpHrrr commented 4 years ago

I will point my finger at proof relevance. another reason could be code extraction in Agda is not good at dealing with some particular Agda feature which we use extensively.

Is it possible to see what's in the generated haskell code to know what is taking so many lines?

HuStmpHrrr commented 4 years ago

technically, every line of code in this library can be run, due to proof relevance. it is a question, however, whether running it is efficient. many constructions here are relatively straightforward, but it is very difficult to evaluate that for proofs.

Taneb commented 4 years ago

Is it possible to see what's in the generated haskell code to know what is taking so many lines?

@HuStmpHrrr agda -c --no-main --ghc-dont-call-ghc src/Categories/Theory/Lawvere.agda will put the generated Haskell in src/MAlonzo/Code/Categories/Theory/Lawvere.hs, which you can inspect in your text editor of choice, although I certainly wouldn't call it "readable".

JacquesCarette commented 4 years ago

This was also discussed in the agda-categories Slack. [Though I think we should, like for Agda itself, move to Zulip.]

jwiegley commented 4 years ago

As another data point, Conal has a definition of biproduct categories here, which is 71 lines long:

https://github.com/conal/agda-cat-linear/blob/master/src/Biproduct.agda

It generates a Haskell file that is 1,480,724 lines long, and generates a total of 2,143,057 lines of Haskell in the MAlonzo source directory. It takes 3.7 hours on my iMac Pro for this Haskell code to compile.

JacquesCarette commented 4 years ago

Wow. I'm pretty sure that Cartesian is partially at fault here.

There are some pretty significant blowups in the sizes of .agdai files too (which I worked on reducing some months ago, it was much worse). It might be related.

HuStmpHrrr commented 4 years ago

It might has something to do with patterns of nested modules, things like

  field
    T : CartesianCategory o ℓ e

  module T = CartesianCategory T
sstucki commented 4 years ago

I played around a bit more with the declaration of the pentagon field of the Monoidal record in Categories.Category.Monoidal.Core. None of it made any difference. In particular, I tried

but none of this changed the size of the file build/MAlonzo/Code/Categories/Category/CartesianClosed.hs one byte.

This suggests that the issue isn't with the declaration (i.e. the type signature) but with the proof of the pentagon identity in Categories.Category.Cartesian https://github.com/agda/agda-categories/blob/7672b7a3185ae77467cc30e05dbe50b36ff2af8a/src/Categories/Category/Cartesian.agda#L346-L372

It might has something to do with patterns of nested modules, things like

  field
    T : CartesianCategory o ℓ e

  module T = CartesianCategory T

I also suspect that's part of the problem. At the very least it explains why that one big pentagon proof leads to code explosions everywhere. I suspect it's because it is re-exported over and over again, and the re-exports basically create copies of the term. For example, there's a local module definition in the Cartesian module right after the proof: https://github.com/agda/agda-categories/blob/7672b7a3185ae77467cc30e05dbe50b36ff2af8a/src/Categories/Category/Cartesian.agda#L375

which is then re-exported e.g. in Categories.Category.CartesianClosed and in Categories.Category.Monoidal.Instance.Cats, and from there it spreads to Bicategories (via enrichment), etc. This might also explain why some of the proofs there take such a long time to complete.

Case in point: The following lines from Categories.Category.CartesianClosed.Canonical re-export the contents of Cartesian, which indirectly re-exports the nested monoidal module contained in Cartesian and with it the pentagon proof. https://github.com/agda/agda-categories/blob/7672b7a3185ae77467cc30e05dbe50b36ff2af8a/src/Categories/Category/CartesianClosed/Canonical.agda#L82-L84

This results in a copy of the pentagon proof in the generated Canonical.hs. Only when I explicitly hide the proof and all the local modules in Cartesian that re-export the proof does the corresponding term disappear from the Haskell file.

  open Cartesian isCartesian public
    hiding (_×_; π₁; π₂; ⟨_,_⟩; monoidal; braided; symmetric; pentagon)
    renaming (⟨⟩-cong₂ to ⟨,⟩-resp-≈)

EDIT: I checked again and the following turned out to be wrong. The private module application did not cause the pentagon proof to be included in the Haskell file. It seems I missed to hide one of the other public (but indirect) re-exports. This is maybe not too surprising because this open statement re-exports the contents of Cartesian, but apparently it's sufficient to simply open an applied instance of CartesianClosed for the same to happen. Consider the following line from the same file, which result in further copies of the pentagon proof in the generated Canonical.hs: https://github.com/agda/agda-categories/blob/7672b7a3185ae77467cc30e05dbe50b36ff2af8a/src/Categories/Category/CartesianClosed/Canonical.agda#L159 (Removed rest of this comment about the private module application of CartesianClosed)

I'd love to know how module applications and re-exports are actually implemented. Does anybody know if this is documented somewhere? If not, maybe we should ask one of the devs on Zulip or so.

In the meantime, a fix would be to move the nested module definitions of monoidal and symmetric out of the Cartesian record, making them into separate top-level definitions that can be imported only when necessary.

Of course, none of this explains why the pentagon proof gets so enormous when compiled to Haskell. I'll investigate that further once I find some more time.

sstucki commented 4 years ago

EDIT: Private module applications do not seem the culprit (see edited comment above). The bit about module applications really worries me. If these are costly irrespective of whether they are opened publicly or privately, then the library is basically doomed to suffer from code explosion.

sstucki commented 4 years ago

After double-checking, the private module applications turned out not to cause code bloat. Only the public ones (which makes more sense). Sorry for the confusion.

Still, it is somewhat strange to me that public re-exports actually result in a copy of the proof term (which seems to cause the code bloat) and not just in a call to the definition in the original module.

Is this a bug in Agda? Should we file a bug report? If anyone has insights about this, I'd be curious to know the rationale.

sstucki commented 4 years ago

I have refactored the Cartesian module/record so that the monoidal field is now part of a separate module, and all monoidal properties are re-exported in that module instead (see PR #218).

This seems to help. There are fewer modules that contain the term pentagon now, and those that still do are smaller. The updated tallies (in kilobytes) are as follows.

Top 10 largest Haskell files containing "pentagon":

51264   build/MAlonzo/Code/Categories/Category/CartesianClosed.hs
32832   build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Pentagon.hs
12352   build/MAlonzo/Code/Categories/Category/Cartesian.hs
10304   build/MAlonzo/Code/Categories/Bicategory/Bigroupoid.hs
4160    build/MAlonzo/Code/Categories/Category/Monoidal/Instance/Setoids.hs
2112    build/MAlonzo/Code/Categories/Enriched/NaturalTransformation/NaturalIsomorphism.hs
384 build/MAlonzo/Code/Categories/Category/Monoidal/Closed.hs
256 build/MAlonzo/Code/Categories/Category/RigCategory.hs
192 build/MAlonzo/Code/Categories/Category/Monoidal/Traced.hs
128 build/MAlonzo/Code/Categories/Category/Monoidal/Utilities.hs

Top 10 largest Haskell files not containing "pentagon":

16448   build/MAlonzo/Code/Categories/Functor/Monoidal/Properties.hs
12352   build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Diagonal.hs
11328   build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/L.hs
9280    build/MAlonzo/Code/Categories/Functor/Cartesian/Properties.hs
6208    build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Dinatural.hs
5184    build/MAlonzo/Code/Categories/Category/Construction/Grothendieck.hs
4160    build/MAlonzo/Code/Categories/Functor/Slice.hs
3136    build/MAlonzo/Code/Categories/Category/Monoidal/Closed/IsClosed/Identity.hs
2112    build/MAlonzo/Code/Categories/Category/Equivalence/Properties.hs
1088    build/MAlonzo/Code/Categories/Category/Instance/Properties/Cats.hs

Clearly, some of these are still ridiculously big (there shouldn't be any ~10 MB sized source files) but at least the situation has improved.

jwiegley commented 4 years ago

Hi @sstucki, were you able to find out if this is an Agda bug?

sstucki commented 4 years ago

Unfortunately not.

I wrote up a few small snippets with nested modules, module applications, etc. but I could never reproduce the odd behavior seen here, namely that a term is basically copied instead of just referenced when it is re-exported.

Without a minimal example, it's really hard to tell whether it's a bug, even for the Agda devs (I suspect). But I might be wrong, so feel free to file an issue in the Agda repo yourself.

JacquesCarette commented 4 years ago

As far as I know (from comments that both @UlfNorell and @WolframKahl made), 1) module parameters are instantiated by copying, and 2) private modules are also done by copying, with no real attempt to maintain sharing. That can definitely cause blowups. Whether that's what's going on here, I can't tell.

HuStmpHrrr commented 3 years ago

https://github.com/agda/agda2hs is probably a rescue.

UlfNorell commented 3 years ago

agda2hs is something completely different. The goal of that project is to let you write "Haskell" code in Agda and prove things about it, not compiling arbitrary Agda code to Haskell.

JacquesCarette commented 3 years ago

Weird progress: by taking the pentagon proof out of being defined inside the record, but instead to a private variable outside of it, this cuts the resulting Haskell file from 12M to 5M. Still huge, but a non-trivial gain.

JacquesCarette commented 3 years ago

Even more progress: by changing the Cartesian module to not create a module with all the BinaryProducts stuff and then exporting it publicly as well, this goes from 5M to a little less than 2M. I'll push that soon.

JacquesCarette commented 3 years ago

The "winners" in the insanely large compiled files (as documented in the 2nd issue on the agda issues) are now:

MacBook-Air-2:Code carette$ find . -xdev -type f -size +10M -print | xargs ls -lh | sort -k5,5 -h -r
-rw-r--r--  1 carette  staff   222M  1 Aug 09:28 ./Categories/Bicategory/Construction/Spans.hs
-rw-r--r--  1 carette  staff    60M  1 Aug 09:22 ./Categories/Category/Monoidal/Interchange.hs
-rw-r--r--  1 carette  staff    47M  1 Aug 09:18 ./Categories/Category/CartesianClosed.hs
-rw-r--r--  1 carette  staff    30M  1 Aug 09:20 ./Categories/Category/Monoidal/Closed/IsClosed/Pentagon.hs
-rw-r--r--  1 carette  staff    15M  1 Aug 09:21 ./Categories/Functor/Monoidal/Properties.hs
-rw-r--r--  1 carette  staff    13M  1 Aug 09:24 ./Categories/Adjoint/Monadic/Properties.hs
-rw-r--r--  1 carette  staff    12M  1 Aug 09:20 ./Categories/Category/Monoidal/Closed/IsClosed/Diagonal.hs

So CartesianClosed has actually gone down a tiny bit (it used to be 51M).

Interestingly, if one continues below 10M, it doesn't explode, i.e. the number of cases is not bad:

-rw-r--r--  1 carette  staff   9.9M  1 Aug 09:19 ./Categories/Category/Monoidal/Closed/IsClosed/L.hs
-rw-r--r--  1 carette  staff   9.3M  1 Aug 09:29 ./Categories/Pseudofunctor/Composition.hs
-rw-r--r--  1 carette  staff   8.4M  1 Aug 09:23 ./Categories/Functor/Cartesian/Properties.hs
-rw-r--r--  1 carette  staff   6.5M  1 Aug 09:30 ./Categories/Bicategory/Bigroupoid.hs
-rw-r--r--  1 carette  staff   5.5M  1 Aug 09:23 ./Categories/Category/Monoidal/Closed/IsClosed/Dinatural.hs
-rw-r--r--  1 carette  staff   4.2M  1 Aug 09:29 ./Categories/Category/Construction/Grothendieck.hs
-rw-r--r--  1 carette  staff   4.0M  1 Aug 09:22 ./Categories/Functor/Slice.hs
-rw-r--r--  1 carette  staff   3.5M  1 Aug 09:21 ./Categories/Bicategory/Instance/Cats.hs
-rw-r--r--  1 carette  staff   3.3M  1 Aug 09:19 ./Categories/Category/Monoidal/Closed/IsClosed/Identity.hs
-rw-r--r--  1 carette  staff   3.2M  1 Aug 09:23 ./Categories/Category/Construction/Adjoints.hs
-rw-r--r--  1 carette  staff   3.1M  1 Aug 09:19 ./Categories/Yoneda/Continuous.hs
-rw-r--r--  1 carette  staff   3.0M  1 Aug 09:18 ./Categories/Category/Monoidal/Properties.hs
-rw-r--r--  1 carette  staff   2.7M  1 Aug 09:29 ./Categories/CoYoneda.hs
-rw-r--r--  1 carette  staff   2.7M  1 Aug 09:28 ./Categories/Enriched/NaturalTransformation.hs
-rw-r--r--  1 carette  staff   2.5M  1 Aug 09:21 ./Categories/NaturalTransformation/NaturalIsomorphism/Monoidal.hs
-rw-r--r--  1 carette  staff   2.5M  1 Aug 09:17 ./Categories/Category/Cocartesian.hs
-rw-r--r--  1 carette  staff   2.3M 31 Jul 11:40 ./Categories/Adjoint/Equivalents.hs
-rw-r--r--  1 carette  staff   2.0M  1 Aug 09:21 ./Categories/Pseudofunctor/Hom.hs

Categories.Category.Cartesian itself is down to 1.6M. (It's going to go down further, but that will be a bit of a cheat, since the bulk will be moving to Categories.Category.Cartesian.Monoidal.)

tetrapharmakon commented 1 year ago

I am working on a PR for the bicategory of monads in a bicategory (from Street's "formal theory of monads" paper), and I stumbled upon this exact problem (up to the fact that I am not compiling: I am just trying to type-check); I am at the point where I see no way to go on writing, because filling a single hole in a morphism-reasoning and reloading takes forever (~20 minutes, in the fortunate cases; in the unfortunate cases, my machine runs out of RAM -and it has a respectable amount).

Any way out of the problem since @JacquesCarette 's last comment?

JacquesCarette commented 1 year ago

@tetrapharmakon probably best to open a separate issue for this. I've encountered different problems in agda-categories whose root cause were quite different.

If your repo is public, then a link to the WIP code (or a gist) for reproducibility would be appreciated. I have had some success in the past tracking down slowness issues (and fixing them).

tetrapharmakon commented 1 year ago

Thanks. I am currently editing a single file in a branch of the fork me and @iwilare usually use. It's here

In Bicategories.Extras I added these lines

id₂-comm : α ∘ᵥ id₂ ≈ id₂ ∘ᵥ α
id₂-comm = Bicategory.hom.Equiv.trans Bicat identity₂ʳ (Bicategory.hom.Equiv.sym Bicat identity₂ˡ)

id₂-comm-sym : id₂ ∘ᵥ α ≈ α ∘ᵥ id₂
id₂-comm-sym = Bicategory.hom.Equiv.sym Bicat id₂-comm

that I still have to refactor a bit.

JacquesCarette commented 1 year ago

Those look very useful - worth a tiny PR on their own.

tetrapharmakon commented 1 year ago

@JacquesCarette Can you give me/us some advice on how to chase the problematic piece of code that is slowing down the type-checking so much?

ohad commented 1 year ago

In the past, @yellowsquid managed to get some speed-up by replacing nested records with co-pattern matching. This makes some sense because the unifier need not search for types for the intermediate records.

Maybe that can help?

tetrapharmakon commented 1 year ago

It helped a little bit at first, but as soon as I try to define a Bicategory the type-checking time is around 2 minutes each time.

JacquesCarette commented 1 year ago

I'm going to write a small guide to efficiency for agda-categories. This week. Not sure if I'll manage to do it today though (there's a fair amount to say).

JacquesCarette commented 1 year ago

An incomplete guide to speed in agda-categories is now on the wiki.