IntelligentSoftwareSystems / Galois

Galois: C++ library for multi-core and multi-node parallelization
http://iss.ices.utexas.edu/?p=projects/galois

Memory allocator #361

Open LAhmos opened 4 years ago

LAhmos commented 4 years ago

Is there a clean way to force Galois to use the glibc memory allocator (malloc/new) instead of the Galois memory allocators?

insertinterestingnamehere commented 4 years ago

Which parts of Galois? Do you have more details about what problem you're having or what you're trying to do? The gstl containers are aliases to the standard library containers set up to use the Galois allocators instead, so you can use the standard containers instead. That said, be warned that malloc/free and new/delete have internal locks, so using them to allocate memory inside a Galois parallel loop will likely kill performance.

LAhmos commented 4 years ago

I am trying to analyze the memory allocation patterns of different apps from Galois. My analysis tool relies on tracing calls to the new/malloc functions.
So my question is: is there a place where I can un-alias all the gstl containers so that they use malloc/free instead of the Galois allocator?

insertinterestingnamehere commented 4 years ago

Many of our apps don't allocate memory at all inside parallel sections. Depending on which ones you're interested in, you may not need to do anything to our source code.

If you want to mess with the gstl containers, they are all in https://github.com/IntelligentSoftwareSystems/Galois/blob/master/libgalois/include/galois/gstl.h. I don't think we have any switch in our API, but you can just modify the header.

Is there some way to instrument the analysis tool to work properly with the custom allocators? You'll get more reliable data with that approach. The high performance cost of using the built-in allocators inside parallel loops will change how things like load balancing and work stealing play out in a given problem, so you may get data that isn't representative of what happens during normal execution. In other words, measuring things by hooking in the default allocators may skew the very data you're trying to collect.

roshandathathri commented 4 years ago

To add to what @insertinterestingnamehere said, our allocators are not only concurrent and scalable under parallel allocation, but also NUMA-aware (for certain data structures), which matters for scalable performance.

In addition to the gstl containers, look at LargeArray (for NUMA-aware allocations) and PerThreadStorage (for thread-specific allocation): https://github.com/IntelligentSoftwareSystems/Galois/blob/master/libgalois/include/galois/LargeArray.h https://github.com/IntelligentSoftwareSystems/Galois/blob/master/libgalois/include/galois/substrate/PerThreadStorage.h

Note that modifying any of this affects performance.

LAhmos commented 4 years ago

Thanks a lot for your response.

1. Can you point out the apps that do allocate memory inside parallel sections? (This is the focus of my work.)
2. I tried to analyze your custom allocator by tracing calls to the allocate and deallocate functions; would that be enough? I am afraid that the compiler is inlining some of the calls.

roshandathathri commented 4 years ago
  1. Check the scientific CPU apps Delaunay refinement and Delaunay triangulation, and the analytics CPU apps bipart and gmetis.

  2. Yes, that might be enough, but it is possible that some of the calls get inlined.

LAhmos commented 4 years ago

Thanks a lot to both of you.