Open boqwxp opened 4 years ago
Thanks. We have experimented with tcmalloc in the past. I didn't know the difference was that large.
Indeed, I was very surprised to see such a large difference. If I still had access to a beefy machine I would love to see what it does for some of my larger problems that have previously taken tens of GB of RAM and 24+h on a Xeon E5-2698v4. If the performance improvement persists with bigger problem sizes and more difficult problems, it's a really huge boost in scalability for model-driven program synthesis techniques in hardware design.
Background
I noticed that, for one of my example BV exists-forall problems, `time` reported that yices spent a full third of its wall-clock time in the kernel. When I ran `strace -T`, I saw that after an initial phase with a normal assortment of syscalls, during the solving phase, system calls were exclusively `brk()`, presumably called by libc `malloc()`, and in turn by the `safe_malloc()`s I see sprinkled throughout the code base.
Experiments
I decided to try an alternative `malloc()` implementation, because a third of all time in the kernel allocating memory is rather excessive.
My experiments all ran on a Debian 10 VM, which should use the GNU `malloc()` implementation in libc by default. Here's the time info with the default GNU implementation:
With the default allocator, Yices used ~365MB-405MB, with spikes over 400MB.
When I tried linking with jemalloc, which is the one used by FreeBSD libc, I got the following time info:
In this case, memory usage ranged generally between 325-395MB, with spikes up to 450MB.
And when I tried linking with tcmalloc, included in gperftools, I got the following time info:
In this case, Yices used ~365MB of memory with very little change and only slight growth. So it uses less memory at peak and is almost 100% faster.
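For anyone who wants to compare allocators without a full solver build, here is a minimal sketch of a churn-style micro-benchmark in C. It is not taken from the Yices code base: the function names `churn` and `time_churn` and the workload shape (batches of small, short-lived blocks) are assumptions chosen only to imitate an allocation-heavy phase. `clock()` reports process CPU time, so it will not separate user time from kernel time the way `time` does.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Hypothetical churn workload: repeatedly allocate and free a batch of
 * small blocks, loosely imitating many short-lived objects.
 * Returns the number of successful allocations. */
size_t churn(size_t rounds, size_t batch, size_t block_size)
{
    void **blocks = malloc(batch * sizeof *blocks);
    size_t done = 0;

    if (blocks == NULL)
        return 0;

    for (size_t r = 0; r < rounds; r++) {
        size_t n = 0;
        for (; n < batch; n++) {
            blocks[n] = malloc(block_size);
            if (blocks[n] == NULL)
                break;
            /* Touch the block so the memory is actually used. */
            ((unsigned char *)blocks[n])[0] = (unsigned char)n;
            done++;
        }
        /* Free everything allocated in this round, newest first. */
        while (n > 0)
            free(blocks[--n]);
    }

    free(blocks);
    return done;
}

/* Time one churn run using process CPU time; returns seconds.
 * The allocation count is stored in *allocs_out when it is non-NULL. */
double time_churn(size_t rounds, size_t batch, size_t block_size,
                  size_t *allocs_out)
{
    clock_t start = clock();
    size_t allocs = churn(rounds, batch, block_size);
    clock_t end = clock();

    if (allocs_out != NULL)
        *allocs_out = allocs;
    return (double)(end - start) / CLOCKS_PER_SEC;
}
```

Calling `time_churn()` from a small `main()` and running the same binary under different allocators (for example by preloading an allocator's shared library with `LD_PRELOAD`, where the library path depends on the system) should give a rough comparison. A synthetic loop like this is only a sanity check, and real solver runs may behave quite differently.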
Next steps
I know different platforms have different `malloc()` implementations, and some are more efficient for this application than others. I also know that this is only one example exercising only one part of the solver that apparently does a lot of context creation and destruction. But I see `safe_malloc()` sprinkled liberally throughout the code base, and I strongly suspect a serious benchmark suite will show significant performance improvement across the board.
There are also potential licensing issues. But I think at least for some platforms (especially those using GNU libc) the default `malloc()` implementation should be changed if at all possible. This is not a measly 3% speedup, it's a very juicy fruit dangling low enough to practically just reach up and pick.