AndrasKovacs / normalization-bench

Lambda normalization and conversion checking benchmarks for various implementations
MIT License

Throwing more memory at Coq #2

Open JasonGross opened 4 years ago

JasonGross commented 4 years ago

According to https://github.com/coq/coq/issues/11277, the right way to tweak Coq's GC is to use the environment variable OCAMLRUNPARAM (see https://caml.inria.fr/pub/docs/manual-ocaml/runtime.html). You can also tweak Coq's stack limits via ulimit -S -s, and perhaps get rid of some of the stack overflows that way.
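A minimal sketch of how the two knobs might be combined before a coqc run. The parameter values and the bench.v filename are illustrative, not taken from the thread; k/M/G suffixes are accepted by recent OCaml runtimes:

```shell
# Raise the soft stack limit to the hard ceiling (may help with stack
# overflows from deep recursion during normalization).
ulimit -S -s "$(ulimit -H -s)"

# OCAMLRUNPARAM tunes the OCaml runtime of any compiled OCaml program,
# coqc included: s = minor heap size (in words), o = space_overhead
# percentage, v=0x400 = print GC statistics on exit.
export OCAMLRUNPARAM='s=1G,o=200,v=0x400'
echo "$OCAMLRUNPARAM"

# Then run the benchmark file as usual, e.g.:
#   coqc bench.v
```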

AndrasKovacs commented 4 years ago

Thanks, I was silly not to use ulimit when I already used it for F#. However, setting OCAMLRUNPARAM seems to have no effect at all. Would this only work for native_compute?

JasonGross commented 4 years ago

Hm, I've never used it myself, but it should work for Coq as a whole, since Coq is an OCaml program. Does OCAMLRUNPARAM="v=0x400" make coqc print GC statistics at the end of a run, or does that do nothing as well?

cc @gasche (Maybe OCAMLRUNPARAM only works when using ocamlrun and not when using compiled OCaml programs?)

AndrasKovacs commented 4 years ago

It does print. It seems I just used the wrong parameter syntax.

Edit: it probably also seemed like nothing was happening because I can't avoid the stack overflow by setting l, even when I set it to my physical memory size.
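If I read the runtime documentation correctly, this is expected: the l parameter only bounds the stack of bytecode programs run under ocamlrun, while a native-code coqc gets its stack from the OS, so the soft rlimit is the relevant knob. A hedged sketch of checking and raising it:

```shell
# OCAMLRUNPARAM's l parameter sets the *bytecode* stack limit (in words),
# i.e. it only matters for programs run by ocamlrun. A native-code binary
# uses the OS stack, governed by the soft rlimit:
ulimit -S -s          # current soft stack limit (KiB, or "unlimited")
ulimit -H -s          # hard ceiling the soft limit can be raised to

# So to address native-code stack overflows, raise the soft limit, e.g.:
#   ulimit -S -s unlimited && coqc bench.v
```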

AndrasKovacs commented 4 years ago

It appears that throwing more memory at Coq barely affects benchmark performance, even though the GC stats clearly indicate that the number of collections is very small once I set large minor/major heap sizes.
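For context, a "big heap" configuration along these lines (values illustrative, not the ones used in the benchmark) is what makes collections rare; note there is no direct major-heap-size parameter, so space_overhead and the heap increment stand in for it:

```shell
# Large minor heap (s, in words), large major heap increment (i), and a
# high space_overhead (o) so the major GC runs rarely; v=0x400 prints the
# collection counts on exit so the effect can be verified.
export OCAMLRUNPARAM='s=512M,i=256M,o=1000,v=0x400'
echo "$OCAMLRUNPARAM"
```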

JasonGross commented 4 years ago

This doesn't surprise me much. The only thing I've seen dominated by GC time is defining a record with an enormously large number of fields. I expect that conversion doesn't allocate many long-lived values that later need collecting, and the VM's normalization is written in C with careful manual memory management. (No idea why there's no impact on cbv, lazy, and native.) By the way, I expect that some of the native_compute times may be dominated by the large constant overhead of running the OCaml compiler, and I want to note that the readback pass of vm_compute is quadratic in the number of nested binders (so you may get different results if you vm_compute a consumer function that produces something small).