Open rohany opened 11 months ago
I think it's fine to document instructions on how to do this.
If we're going to make this a formal option, I think we should at least try to evaluate some different options. I'm aware of at least:
There are probably others; it's been a while since I looked.
The other thing to think about is workloads: these results are often workload-specific, so it would be good to have other people evaluate the available options and report their results.
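To make reports from different people comparable, a small timing harness along these lines might help. This is only a sketch: the function names `time_command` and `compare_allocators` are made up for illustration, the library paths in the usage comment are placeholders, and wall-clock time of the whole run is a crude stand-in for whatever metric matters for a given workload. It also counts non-zero exit codes, since a crash at shutdown would otherwise go unnoticed.

```python
import os
import statistics
import subprocess
import time


def time_command(cmd, preload=None, repeats=5):
    """Run cmd `repeats` times and return (wall-clock samples, failure count).

    If `preload` is given, it is placed in LD_PRELOAD for the child process.
    """
    env = dict(os.environ)
    if preload is not None:
        env["LD_PRELOAD"] = preload
    samples, failures = [], 0
    for _ in range(repeats):
        start = time.perf_counter()
        proc = subprocess.run(cmd, env=env)
        samples.append(time.perf_counter() - start)
        # A non-zero (or signal-induced negative) return code counts as a failure.
        if proc.returncode != 0:
            failures += 1
    return samples, failures


def compare_allocators(cmd, allocators, repeats=5):
    """Map each allocator name to median/min/max seconds and failure count."""
    results = {}
    for name, lib in allocators.items():
        samples, failures = time_command(cmd, preload=lib, repeats=repeats)
        results[name] = {
            "median": statistics.median(samples),
            "min": min(samples),
            "max": max(samples),
            "failures": failures,
        }
    return results


# Hypothetical usage; the binary name and library paths are placeholders:
# compare_allocators(
#     ["./my_legion_app"],
#     {"default": None,
#      "tcmalloc": "/path/to/tcmalloc.so",
#      "jemalloc": "/path/to/jemalloc.so"},
# )
```

The spread between `min` and `max` gives a rough idea of run-to-run variation, which is worth reporting alongside the median.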
I tried jemalloc as well, and saw performance identical to tcmalloc. Unfortunately, my jobs would segfault on shutdown with jemalloc, so some work would have to go into figuring out the cause.
Another malloc to maybe try: https://github.com/microsoft/snmalloc
While debugging some Legate applications, I found a significant reduction in jitter between Legion ranks when using tcmalloc for all allocations within the stack, resulting in improvements of around 20% at 16 nodes on a CG solve.
We think that there should be an option to package Legion (through a CMake flag, maybe) to use tcmalloc instead of the default malloc. Doing this as part of the build would require support from the build maintainers, as well as work within Legion/Realm to remove places where Legion/Realm try to free an allocation produced by the user, since that allocation may have come from a different malloc.
In the meantime, trying to use tcmalloc with a Legion application is very simple: install tcmalloc (sudo apt-get install -y google-perftools), and then run your application with LD_PRELOAD=/path/to/tcmalloc.so.
cc'ing relevant parties who may or may not see an improvement in their applications using tcmalloc: @dzhang314 @mariodirenzo @jiazhihao
cc @lightsighter
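For anyone launching jobs from a script rather than a shell, the LD_PRELOAD approach described above could be wrapped roughly as follows. This is a minimal sketch, not part of Legion: `preload_env` and `run_with_allocator` are hypothetical helper names, and the library path is left as the same placeholder used above. The dynamic linker reads LD_PRELOAD when the process starts, so the variable has to be set in the child's environment before it is launched.

```python
import os
import subprocess


def preload_env(lib_path, base_env=None):
    """Return a copy of the environment with lib_path prepended to LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # Keep any libraries that were already being preloaded.
    env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    return env


def run_with_allocator(cmd, lib_path):
    """Run cmd (a list of arguments) with lib_path preloaded; return its exit code."""
    if not os.path.isfile(lib_path):
        raise FileNotFoundError(f"allocator library not found: {lib_path}")
    return subprocess.run(cmd, env=preload_env(lib_path)).returncode


# Hypothetical usage; the binary name is a placeholder:
# run_with_allocator(["./my_legion_app"], "/path/to/tcmalloc.so")
```

Checking that the file exists first is a deliberate choice in this sketch: with a wrong path the dynamic linker only prints a warning and the application silently runs with the default malloc.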