Open Ethan-DeBandi99 opened 2 years ago
Elliot has probably mentioned this, and it's not terribly satisfying, but one way other groups have worked around memory limits is to use the C back-end and to do the Chapel compilation and C compilation steps distinctly to reduce the amount of memory required. The reason I say this isn't particularly satisfying is that it uses the C, rather than LLVM, back-end, which isn't the default/preferred choice, s.t. the CI wouldn't be checking the typical user experience. And beyond that, it's obviously unfortunate to have to do these sorts of workarounds. But if the other near-term possibility is to disable gasnet CI altogether, it may still be preferable.
Longer-term, the dyno compiler refactoring project should help with these memory requirements, but that's a ways off. I know that Elliot has posed a question to the team to ask whether there are things we could do in the shorter-term, and I think that's a line of inquiry we should continue to pursue.
Arkouda used to do separate chapel + backend invocations in https://github.com/Bears-R-Us/arkouda/pull/208, but we were able to get rid of that when switching to GitHub Actions in https://github.com/Bears-R-Us/arkouda/pull/238. Over time, as Arkouda has grown in size, we've come to exceed the memory+swap capacity of GitHub Actions.
Using the C backend may be a workaround, but note that so far as I know we don't build the docker image with the C backend, so that's not a trivial change to make.
Arkouda has a "quick compile" mode (`ARKOUDA_QUICK_COMPILE=true`) that already throws all the flags I know of to reduce code size (and thus AST/memory size). This includes `--no-checks` and `--no-fast-followers`. A quick test with `--no-auto-local-access` didn't show any changes and I don't know of any other flags that would matter.
Just to have a ballpark for memory figures, I collected some peak memory usage results w/ and w/o this quick compile mode, and with the full compile vs. just the frontend, for a gasnet build similar to the CI config. So a matrix of `ARKOUDA_QUICK_COMPILE` set and unset, with and without `--stop-after-pass denormalize`. I'm gathering these results with `/usr/bin/time -v` and plucking out "Maximum resident set size (kbytes):". I'm not sure if there are other means of collecting this or how reliable it is, but it's what I found in https://stackoverflow.com/questions/774556/peak-memory-usage-of-a-linux-unix-process.
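For anyone wanting to reproduce this, a minimal sketch of plucking that line out of captured `/usr/bin/time -v` output could look like the following. The helper name `parse_max_rss_kb` and the sample text are illustrative assumptions, not taken from the actual CI logs; only the "Maximum resident set size (kbytes):" label comes from the description above.

```python
import re

# Hypothetical helper (name is an assumption): extract the peak RSS, in
# kbytes, from the report that `/usr/bin/time -v` writes to stderr.
_MAX_RSS = re.compile(r"Maximum resident set size \(kbytes\):\s*(\d+)")


def parse_max_rss_kb(time_output: str) -> int:
    """Return the 'Maximum resident set size (kbytes)' value as an int."""
    match = _MAX_RSS.search(time_output)
    if match is None:
        raise ValueError("no 'Maximum resident set size' line found")
    return int(match.group(1))


# Illustrative excerpt only; the number reuses one figure reported in
# this thread rather than a real captured log.
sample = (
    "\tMaximum resident set size (kbytes): 9458536\n"
    "\tExit status: 0\n"
)

peak_kb = parse_max_rss_kb(sample)
print(f"peak RSS: {peak_kb} kbytes")
```

In practice the report would be captured from stderr of the timed build command and passed to the helper as a string.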
| config | Frontend (kbytes) | Full Build (kbytes) |
|---|---|---|
| quick compile | 8_545_820 | 9_458_536 |
| optimized build | 10_052_840 | 11_970_391 |
So ~9.5G for a full build with quick compile. GitHub Actions should have 7G in memory and 4G in swap, but some of that is required by the OS, so I'm not sure how much is actually available to users (and I'm not sure how comprehensive these values from `time -v` are).
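To make the comparison concrete, here is a rough sketch of the arithmetic. It assumes the 7G + 4G figures are decimal GB, subtracts nothing for the OS (so the budget is an optimistic ceiling), and uses the same kbytes / 1e6 conversion that gives the ~9.5G estimate. The names `BUDGET_GB`, `kb_to_gb`, and `headroom_gb` are made up for illustration.

```python
# Peak RSS figures (kbytes) copied from the table in this thread.
PEAK_KB = {
    ("quick compile", "frontend"): 8_545_820,
    ("quick compile", "full build"): 9_458_536,
    ("optimized build", "frontend"): 10_052_840,
    ("optimized build", "full build"): 11_970_391,
}

# Assumption: 7G of RAM plus 4G of swap, treated as decimal GB, with no
# allowance for what the OS itself needs.
BUDGET_GB = 7 + 4


def kb_to_gb(kb: int) -> float:
    """Rough conversion matching the ~9.5G ballpark: kbytes / 1e6."""
    return kb / 1_000_000


def headroom_gb(kb: int, budget_gb: float = BUDGET_GB) -> float:
    """Budget minus peak usage; negative means the build does not fit."""
    return budget_gb - kb_to_gb(kb)


for (config, stage), kb in PEAK_KB.items():
    print(
        f"{config:16s} {stage:10s} "
        f"{kb_to_gb(kb):6.2f} GB  headroom {headroom_gb(kb):+.2f} GB"
    )
```

Under these assumptions the optimized full build is already over the RAM+swap ceiling, and the quick-compile full build has only about 1.5 GB to spare before any OS overhead is counted.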
Related: #20871
Submitting this issue after a conversation with @ronawho and @bmcdonald3.
Arkouda has been experiencing intermittent CI issues during the `gasnet` testing phase of the GitHub CI. The error returns as [...]. Here is a link to an example action producing this: action link.
After speaking with @ronawho, it seems that Arkouda is very close to the memory limits of GitHub CI resources, which can cause this failure in some cases. As a workaround, we could remove gasnet testing from our CI, but that would significantly reduce our CI coverage and would not be ideal.
Tagging @pierce314159 and @hokiegeek2 for their awareness on this issue.