michael-schwarz opened 2 weeks ago
Turns out almost all the time is spent constructing and printing pretty_diffs:
(if D.leq v1 v2 then nil else dprintf "diff: %a\n" D.pretty_diff (v1, v2))
++
(if D.leq v2 v1 then nil else dprintf "reverse diff: %a\n" D.pretty_diff (v2, v1))
This takes several orders of magnitude more time than just doing the comparisons.
Surprisingly, the culprit does not seem to be the actual implementation of pretty_diff inside apronDomain.ml though: that can be commented out to return Pretty.nil and the slowdown still stays.
The culprit in fact seems to be the D.pretty () calls, which in turn come from Apron, so there must be something very slow happening in Apron.
The flamegraph, on the other hand, hints at the majority of time (90%) being spent in camlGoblintCil__Pretty__breakString_148, with almost all of that time being spent in camlStdlib__Bytes__sub_341.
Perf data in Firefox format: https://gigamove.rwth-aachen.de/en/download/039038f988c7dfbc813e63447a35a0b1 (accessible until 07.10).
This does not seem to be a blocker for my thesis, as I will generate missing reports manually in non-verbose mode, but we should look at it.
Apparently the Pretty.text primitive uses that directly, but does something involved under the hood to break the text up further (based on \n). It's just that the massive strings of Apron constraints take so long to split.
I suspect the Apron printer is using Format with some small output width, causing line breaks where we don't want them anyway. I think I've noticed this in tracing output before as well, but never looked into it.
Maybe something can be done here by using String.split_on_char?
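For reference, a minimal sketch of what String.split_on_char would do here: it makes a single pass over the string and only allocates the resulting pieces. The constraint string below is a made-up stand-in for Apron output, not real output:

```ocaml
(* Split a newline-separated constraint dump into its lines.
   The string s is a hypothetical example of Apron-style output. *)
let () =
  let s = "x >= 0\ny - x <= 10\n2x + y <= 42" in
  String.split_on_char '\n' s
  |> List.iter print_endline
```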
See e.g. the performance numbers reported here: https://gist.github.com/mooreryan/220b47feea6b253630dab09c4b6ed18c
https://github.com/ocaml/ocaml/blob/107e8d3851f840e00c9c94118d70b74c06995d56/stdlib/string.ml#L225
this seems to avoid all of the intermediate allocation.
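To make the allocation difference concrete, here is a hypothetical micro-benchmark (the name naive_split and the input size are mine, not from the codebase): a splitter that re-copies the remaining suffix on every match, which is roughly what repeated Bytes.sub amounts to, versus the single-pass String.split_on_char:

```ocaml
(* Naive splitter: on each separator, copies both the head piece and the
   entire remaining suffix, so total copying is quadratic in the string
   length. This approximates the cost pattern of repeated Bytes.sub. *)
let naive_split c s =
  let rec go s acc =
    match String.index_opt s c with
    | None -> List.rev (s :: acc)
    | Some i ->
      let hd = String.sub s 0 i in
      let tl = String.sub s (i + 1) (String.length s - i - 1) in
      go tl (hd :: acc)
  in
  go s []

let () =
  (* A large newline-separated string of fake constraints. *)
  let s = String.concat "\n"
      (List.init 20_000 (fun i -> Printf.sprintf "x%d <= %d" i i)) in
  let time f =
    let t0 = Sys.time () in
    ignore (f ());
    Sys.time () -. t0
  in
  Printf.printf "naive: %.3fs  split_on_char: %.3fs\n"
    (time (fun () -> naive_split '\n' s))
    (time (fun () -> String.split_on_char '\n' s))
```

Both functions produce the same list; only the allocation behavior differs.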
let custom_text (s : string) =
  let lines = String.split_on_char '\n' s in
  let rec doit = function
    | [] -> nil
    | [x1] -> text x1
    | x1 :: xs -> text x1 ++ line ++ doit xs
  in
  doit lines

let pretty () (x : t) = custom_text (show x)
instead of
let pretty () (x:t) = text (show x)
This alone already goes from me running out of patience and killing the run (while not even 1/4 done) after
real 6m20,054s
user 6m18,380s
sys 0m1,252s
to
real 2m25,576s
user 2m13,037s
sys 0m2,895s
That solution obviously is not tail-recursive, but even if we allow some overhead for that, it still seems to be the clearly superior approach. My guess would be that the performance characteristics here changed when OCaml switched to immutable strings with 4.04 (?).
This is actually almost a case of tail_mod_cons, but it's not clear to the compiler because the Cons is hidden behind (++).
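If tail recursion ever becomes a concern, the recursion over the lines can carry an accumulator instead. A sketch with a stand-in doc type (the real Pretty combinators differ; this only illustrates the shape, and it relies on (++) denoting an associative concatenation):

```ocaml
(* Stand-in for GoblintCil.Pretty's abstract doc type, for illustration only. *)
type doc = Nil | Text of string | Line | Concat of doc * doc

let nil = Nil
let text s = Text s
let line = Line
let ( ++ ) a b = Concat (a, b)

(* Tail-recursive custom_text: fold over the tail lines with an accumulator.
   This left-associates the (++), which is fine as long as concatenation
   is associative. *)
let custom_text (s : string) : doc =
  match String.split_on_char '\n' s with
  | [] -> nil
  | x :: xs -> List.fold_left (fun acc l -> acc ++ line ++ text l) (text x) xs

(* Flatten back to a string, to check that the round trip is the identity. *)
let rec render = function
  | Nil -> ""
  | Text s -> s
  | Line -> "\n"
  | Concat (a, b) -> render a ^ render b
```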
I let it run to the end:
real 20m10,677s
user 19m51,909s
sys 0m4,897s
So the difference is almost an order of magnitude (1210.7s / 145.6s ≈ 8.3x).
With verbose mode, we observe terrible slowdowns that seem to be due to very slow pretty-printing. Currently, they are not considered, which is bad for long-running benchmarks where privPrecCompare may take a long time or even fail to terminate.