Macaulay2 / M2

The primary source code repository for Macaulay2, a system for computing in commutative algebra, algebraic geometry and related fields.
https://macaulay2.com

thread memory usage #1124

Open: DanGrayson opened this issue 4 years ago

DanGrayson commented 4 years ago

Our threads must be made more lightweight; see commit aa30e61cdb178298bdb9f90b7af349bbfb880cfd.

commit aa30e61cdb178298bdb9f90b7af349bbfb880cfd
Author: Daniel R. Grayson <dan@math.uiuc.edu>
Date:   Wed May 6 15:58:00 2020 -0500

    run fewer threads

    ... to save memory.

    The eigen branch somehow increased virtual memory usage per thread from 28MB to
    42MB, so we temporarily lower the number of threads until we figure that out.

mahrud commented 4 years ago

Also, the number of concurrent threads as well as the virtual memory limit should probably scale with the number of CPU cores instead of being hardcoded.

DanGrayson commented 4 years ago

I read somewhere recently that setting the number of threads to twice the number of cores (or virtual cores?) is reasonable.

DanGrayson commented 4 years ago

I no longer think virtual memory is a problem, since it doesn't use any real resource until the memory is touched. But still, it would be good to understand why it's so large. The stack size per thread defaults to 2MB, and is overridden by the resource limit on stack size, which on my Arch system is 9788K (?). That doesn't explain much of the 42MB.

mahrud commented 4 years ago

It depends on where and how each specific project uses parallelism or concurrency, so I don't think there's a universally reasonable choice. Do we currently use more than one or two threads for any algorithms, or anywhere at toplevel? We should experiment and measure. Making examples and running tests are easy places to start, and perhaps for ... list ... statements, apply, any, select, etc.

DanGrayson commented 4 years ago

We've actually never managed to get any appreciable speed-up from using threads. Perhaps libgc is the bottleneck.

DanGrayson commented 4 years ago

If our threads are compute-bound, it makes no sense to have more of them than the number of cores and pseudocores, because then we'll be doing context switches needlessly, slowing things down.

rz501 commented 4 years ago

@mahrud One cannot exploit parallelism in a for statement at runtime, since there's no guarantee about what the body does. Functions on lists (apply, any, select) are a better example, but one needs to make sure there aren't any side effects causing a data race either. So it makes more sense to just let the user decide what to dispatch, and maybe give them a shortcut for that, the way execution policies work in C++.

@DanGrayson Don't forget one core for the OS! :)