chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

Reduce Compiler Memory Pressure #20845

Open Ethan-DeBandi99 opened 2 years ago

Ethan-DeBandi99 commented 2 years ago

Submitting this issue after a conversation with @ronawho and @bmcdonald3.

Arkouda has been experiencing intermittent CI issues during the gasnet testing phase of the GitHub CI. The error appears as:

```
Killed
make: *** [Makefile:276: arkouda_server] Error 137
Error: Process completed with exit code 2.
```
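For reference, exit code 137 is the shell's encoding of a SIGKILL death (128 + 9), which is what the Linux OOM killer delivers when a process exhausts memory. A small Python sketch of that decoding:

```python
# Sketch: decoding a shell exit status. Codes above 128 mean
# "killed by signal (code - 128)"; 137 = 128 + 9 (SIGKILL), the
# signal the Linux OOM killer sends.
import signal

def decode_exit_status(code: int) -> str:
    """Return a human-readable description of a shell exit status."""
    if code > 128:
        sig = signal.Signals(code - 128)
        return f"killed by signal {sig.value} ({sig.name})"
    return f"exited normally with status {code}"

print(decode_exit_status(137))  # killed by signal 9 (SIGKILL)
```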

Here is an example action producing this: action link.

After speaking with @ronawho, it seems that Arkouda is very close to the memory limits of GitHub CI resources, which can cause this in some cases. As a workaround, we can remove gasnet testing from our CI, but this significantly reduces our CI coverage and would not be ideal.

Tagging @pierce314159 and @hokiegeek2 for their awareness on this issue.

bradcray commented 2 years ago

Elliot has probably mentioned this, and it's not terribly satisfying, but one way other groups have worked around memory limits is to use the C back-end and to run the Chapel compilation and C compilation steps separately, reducing the amount of memory required at any one time. The reason I say this isn't particularly satisfying is that it uses the C back-end rather than LLVM, which isn't the default/preferred choice, so the CI wouldn't be checking the typical user experience. And beyond that, it's obviously unfortunate to have to do these sorts of workarounds. But if the other near-term possibility is to disable gasnet CI altogether, it may still be preferable.
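An illustrative, unverified sketch of what that split build might look like (the directory name and target compiler setting are assumptions; the exact mechanics depend on the Chapel version, and Arkouda's old approach in PR #208 is the authoritative reference):

```shell
# Sketch only: use the C back-end and keep the generated C sources,
# so the back-end C compilation can be driven as a separate make
# invocation rather than holding everything in one process tree.
export CHPL_TARGET_COMPILER=gnu            # select the C back-end, not LLVM
chpl --savec ./gen-c arkouda_server.chpl   # front-end; saves generated C + Makefile
make -C ./gen-c                            # (re)run the back-end C compile on its own
```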

Longer-term, the dyno compiler refactoring project should help with these memory requirements, but that's a ways off. I know that Elliot has posed a question to the team to ask whether there are things we could do in the shorter-term, and I think that's a line of inquiry we should continue to pursue.

ronawho commented 2 years ago

Arkouda used to do separate chapel + backend invocations in https://github.com/Bears-R-Us/arkouda/pull/208, but we were able to get rid of that when switching to GitHub Actions in https://github.com/Bears-R-Us/arkouda/pull/238. Over time, as Arkouda has grown in size, we've come to exceed the memory+swap capacity of GitHub Actions.

Using the C backend may be a workaround, but note that so far as I know we don't build the docker image with the C backend, so that's not a trivial change to make.

Arkouda has a "quick compile" mode (ARKOUDA_QUICK_COMPILE=true) that already throws all the flags I know of to reduce code size (and thus AST/memory size). This includes --no-checks and --no-fast-followers. A quick test with --no-auto-local-access didn't show any changes and I don't know of any other flags that would matter.


Just to have a ballpark for memory figures, I collected some peak memory usage results w/ and w/o this quick compile mode, and with the full compile vs. just the frontend, for a gasnet build similar to the CI config. So a matrix of ARKOUDA_QUICK_COMPILE set/unset crossed with full build vs. "--stop-after-pass denormalize". I'm gathering these results with /usr/bin/time -v and plucking out "Maximum resident set size (kbytes):". I'm not sure if there are other means of collecting this or how reliable it is, but it's what I found in https://stackoverflow.com/questions/774556/peak-memory-usage-of-a-linux-unix-process.
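A small Python sketch of plucking that value out of /usr/bin/time -v output (the label string is what GNU time prints; the sample command line is illustrative):

```python
# Sketch: extract peak RSS from `/usr/bin/time -v` output, which
# prints "Maximum resident set size (kbytes): <N>" among its stats.
import re

def peak_rss_kb(time_v_output: str) -> int:
    """Return the 'Maximum resident set size' value in kbytes."""
    m = re.search(r"Maximum resident set size \(kbytes\): (\d+)",
                  time_v_output)
    if m is None:
        raise ValueError("no 'Maximum resident set size' line found")
    return int(m.group(1))

sample = """\
    Command being timed: "chpl arkouda_server.chpl"
    Maximum resident set size (kbytes): 9458536
"""
print(peak_rss_kb(sample))  # 9458536
```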

| config | Frontend (kbytes) | Full Build (kbytes) |
| --- | --- | --- |
| quick compile | 8_545_820 | 9_458_536 |
| optimized build | 10_052_840 | 11_970_391 |

So ~9.5G for a full build with quick compile. GitHub Actions should have 7G of memory and 4G of swap, but some of that is required by the OS, so I'm not sure how much is actually available to users (and I'm not sure how comprehensive these values from time -v are).
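Back-of-the-envelope arithmetic on that budget, as a sketch (figures taken from the table above; time -v's "kbytes" are 1024-byte units, so the binary-GiB figure comes out a bit lower than the ~9.5 quoted when read loosely as decimal GB):

```python
# Sketch: compare the measured peak RSS against the nominal GitHub
# Actions budget of 7G RAM + 4G swap, before any OS overhead.
KIB_PER_GIB = 1024 * 1024

peak_kib = 9_458_536     # full build, quick compile (from the table above)
budget_gib = 7 + 4       # nominal RAM + swap on a GitHub Actions runner

peak_gib = peak_kib / KIB_PER_GIB
print(f"peak ~{peak_gib:.1f} GiB vs ~{budget_gib} GiB nominal budget")
```

The point being that the headroom between ~9 GiB of peak usage and an 11 GiB nominal ceiling is thin once the OS's share is subtracted, which matches the intermittent OOM kills.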

bradcray commented 2 years ago

Related: #20871