StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0

packaging Legion with `tcmalloc` #1556

Open rohany opened 11 months ago

rohany commented 11 months ago

While debugging some Legate applications, I found that using tcmalloc for all allocations within the stack significantly reduced jitter between Legion ranks, resulting in improvements like 20% at 16 nodes on a CG solve.

We think there should be an option to package Legion (through a CMake flag, maybe) to use tcmalloc instead of the default malloc. Doing this as part of the build would require support from the build maintainers, as well as work within Legion/Realm to remove places where Legion/Realm tries to free an allocation produced by the user, since that allocation may have come from a different malloc.

In the meantime, trying tcmalloc with a Legion application is very simple: install tcmalloc (`sudo apt-get install -y google-perftools`), and then run your application with `LD_PRELOAD=/path/to/tcmalloc.so`.
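For anyone who wants to script the preload step rather than type it each time, here is a minimal sketch. The helper names `env_with_preload` and `run_with_preload` are hypothetical, and the library path is whatever your system actually installs; nothing here is part of Legion or gperftools.

```python
import os
import subprocess


def env_with_preload(lib_path, base_env=None):
    """Return a copy of the environment with lib_path first in LD_PRELOAD."""
    env = dict(os.environ if base_env is None else base_env)
    existing = env.get("LD_PRELOAD", "")
    # Keep any libraries that were already being preloaded, after ours.
    env["LD_PRELOAD"] = f"{lib_path}:{existing}" if existing else lib_path
    return env


def run_with_preload(lib_path, argv):
    """Run argv with lib_path preloaded and return the process exit code."""
    # Fail early with a clear error if the allocator library is missing.
    if not os.path.isfile(lib_path):
        raise FileNotFoundError(f"allocator library not found: {lib_path}")
    return subprocess.run(argv, env=env_with_preload(lib_path)).returncode
```

Usage would look like `run_with_preload("/path/to/tcmalloc.so", ["./my_app"])`, where `./my_app` stands in for your application. For multi-node jobs, the job launcher may need to be told to forward `LD_PRELOAD` to every rank; that depends on the launcher and is not handled here.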

cc'ing relevant parties who may or may not see an improvement in their applications using tcmalloc @dzhang314 @mariodirenzo @jiazhihao

cc @lightsighter

elliottslaughter commented 11 months ago

I think it's fine to document instructions on how to do this.

If we're going to make this a formal option, I think we should at least try to evaluate some different options. I'm aware of at least:

There are probably others; it's been a while since I looked.

The other thing to think about is workloads: these results are often workload-specific. So it would be good to have some other people evaluate the available options and report their results.
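One way people could collect comparable numbers is a small timing harness that runs the same command under each candidate allocator. The sketch below is only an assumption about how such an evaluation might be set up: the function names are hypothetical, the allocator paths must be supplied by the user, and run-to-run standard deviation of wall time is a rough proxy, not the cross-rank jitter measured in the original report.

```python
import os
import statistics
import subprocess
import time


def time_command(argv, preload=None, repeats=5):
    """Run argv `repeats` times, optionally under LD_PRELOAD; return wall times (s)."""
    env = dict(os.environ)
    if preload is not None:
        env["LD_PRELOAD"] = preload
    times = []
    for _ in range(repeats):
        start = time.perf_counter()
        # check=True aborts the comparison if a run fails; stdout is discarded.
        subprocess.run(argv, env=env, check=True, stdout=subprocess.DEVNULL)
        times.append(time.perf_counter() - start)
    return times


def summarize(times):
    """Return (mean, sample standard deviation) of a list of wall times."""
    spread = statistics.stdev(times) if len(times) > 1 else 0.0
    return statistics.mean(times), spread


def compare(argv, allocators, repeats=5):
    """Map each label to (mean, stdev); a path of None means the default malloc."""
    return {
        label: summarize(time_command(argv, path, repeats))
        for label, path in allocators.items()
    }
```

A call such as `compare(["./my_app"], {"default": None, "tcmalloc": "/path/to/tcmalloc.so"})` would return one `(mean, stdev)` pair per allocator, which is easy to paste into a reply alongside the workload and node count.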

rohany commented 11 months ago

I tried jemalloc as well, and saw performance identical to tcmalloc. Unfortunately, my jobs would segfault on shutdown with jemalloc, so some work would have to go into figuring out the cause.

elliottslaughter commented 11 months ago

Another malloc to maybe try: https://github.com/microsoft/snmalloc