Open ShreyasKhandekar opened 1 month ago
> We should have a discussion about whether this initCopy should be made to use the throwing interface so that the problem can be caught instead of just crashing.
Is this the main point of this issue? I see the `bug` and `compiler` labels, but this feels more like a feature request to me. Anyway, if this is the main point, maybe we can give it a different title to focus on that?
The language has a bunch of features that can request a copy of an array. We try to optimize these to some extent, so an `in` formal won't necessarily cause a copy. (Of course, it did in the problematic case here.) There are quite a few more of these than just `in` intent (`var x = y;` and returning by value come to mind). I can't imagine removing these (besides, they are stable), and at the end of the day, any language that allows allocating memory will also allow one to write programs that run out of memory. But we can make tools, warnings, linters, etc.
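For readers less familiar with these copy sites, here is a minimal sketch of the ones mentioned above. All names (`scale`, `x`, `y`, `z`) are hypothetical and not taken from Arkouda or the internal modules, and whether a copy is actually made in a given case depends on the optimizations mentioned above.

```chapel
// Illustrative only: all procedure and variable names are hypothetical.

// `in` intent: the formal `a` is initialized as a copy of the actual argument.
proc scale(in a: [] real) {
  a *= 2.0;     // modifies the local copy, not the caller's array
  return a;     // default return intent: the array is returned by value
}

var y: [1..4] real = 1.0;

var x = y;          // copy-initialization: x is a new array with y's contents
var z = scale(y);   // the `in` formal requests a copy of y for the call

x[1] = 5.0;         // changes x only

writeln(y);         // y is unaffected by the assignment to x and by scale()
```

Each of these sites may allocate a second array as large as the first, so each is a place where a large array can push a program over its memory limit.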
Ah, GitHub added the Bug label on its own when I filed it. I changed it to feature request. I think I agree with you that, in general, people can always write programs that run out of memory. I think this issue is just some fodder to think about it in terms of 'should we allow programs to recover from such errors instead of just crashing?' and 'What are the cases where we would want to support this?'
Summary of Problem
`in` intents on functions can cause out-of-memory issues that crash Chapel programs.

Description: This issue is prompted by a bug reported by one of the Arkouda devs here: https://github.com/Bears-R-Us/arkouda/issues/3089.
In the case where it was reported, we had the following line in the Arkouda source code:
The `makeDistArray` function is part of the `tryCreateArray` interface, which `throws` an `ArrayOomError` when the allocation fails because we're out of memory.

However, the Arkouda issue above reported that this interface was not working as expected and was causing a server crash even when setting `CHPL_GASNET_SEGMENT=fast` (which is the configuration in which this interface for throwing on OOM works best).

Here the `makeDistArray(a)` has the following function header:

Now, because of this `in` intent, the array is copied when the function is called. This invokes our `chpl__initCopy` function in the `ChapelArray.chpl` module. https://github.com/chapel-lang/chapel/blob/e0d0e9db11d04725cb10396bd00f0acf8eac24b9/modules/internal/ChapelArray.chpl#L3013-L3018

This function does NOT use the `tryCreateArray` interface, and therefore the Arkouda server crashes when the program runs out of memory here.

We should have a discussion about whether this initCopy should be made to use the throwing interface so that the problem can be caught instead of just crashing.
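To make the distinction concrete, here is a minimal sketch contrasting the two allocation paths. It is not the Arkouda or internal module code: `n`, `makeOrReport`, `consume`, and `big` are hypothetical names. `tryCreateArray` is an unstable domain method, so the call form used here (`D.tryCreateArray(int)`) is an assumption that should be checked against the Chapel version in use.

```chapel
// Sketch only; `n`, `makeOrReport`, `consume`, and `big` are hypothetical names.
config const n = 1_000_000;

// Explicit allocation through the throwing interface: a failure can be caught.
proc makeOrReport(size: int) {
  const D = {0..<size};
  try {
    var a = D.tryCreateArray(int);  // unstable API; call form is an assumption
    writeln("allocated ", a.size, " elements");
  } catch e {
    writeln("allocation failed: ", e.message());
  }
}

// Implicit allocation through an `in` formal: the copy is made as part of the
// call, before the body runs, and does not go through tryCreateArray.
proc consume(in a: [] int) throws {
  writeln("body entered with ", a.size, " elements");
}

makeOrReport(n);

var big: [0..<n] int;
try {
  consume(big);  // per this issue, an OOM during the copy halts instead of throwing
} catch e {
  writeln("caught: ", e.message());
}
```

With a small `n` both calls succeed. The point is only where the allocation happens: the `try` around `consume(big)` cannot help today, because the copy for the `in` formal is not made through the throwing interface. A formal that does not request a copy (for example, one declared `const ref`) avoids that allocation entirely.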
Is this a blocking issue with no known work-arounds? The original issue was resolved by using a different version of `makeDistArray` which avoided the array being copied using the `in` intent. Thanks a lot to @jeremiah-corrado for suggesting the workaround and helping resolve this issue!

Steps to Reproduce
I haven't found time to come up with a small reproducer which does not involve Arkouda. But with Arkouda we can do the following in a GASNet configuration:
Launch the Arkouda server:
(TODO, investigate single locale reproducer)
Run the following commands in the Python client:
Now the array size is a tricky argument. In order to get the failure here, the array size needs to be:
- small enough that the `overMemLimit` function in Arkouda estimates that it will take less memory than it actually does. This function is how we checked for OOM errors before the `tryCreateArray` interface, and its estimates aren't always accurate.
- large enough to run out of memory at the `chpl__initCopy` step (i.e. before the `makeDistArray` function body is actually entered)

On my Mac this size was `493000000`, but on ChapDL it was `1393000000`.

Configuration Information
- Output of `chpl --version`:
```
chpl version 2.1.0 pre-release (7e181b1d05)
  built with LLVM version 15.0.4
  available LLVM targets: nvptx64, nvptx, x86-64, x86, amdgcn, r600
Copyright 2020-2024 Hewlett Packard Enterprise Development LP
Copyright 2004-2019 Cray Inc.
(See LICENSE file for more details)
```
- Output of `$CHPL_HOME/util/printchplenv --anonymize`:
```
CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: llvm
CHPL_TARGET_ARCH: x86_64
CHPL_TARGET_CPU: none *
CHPL_LOCALE_MODEL: flat
CHPL_COMM: gasnet *
CHPL_COMM_SUBSTRATE: udp
CHPL_GASNET_SEGMENT: fast *
CHPL_TASKS: qthreads
CHPL_LAUNCHER: amudprun
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: cstdlib
CHPL_NETWORK_ATOMICS: none
CHPL_GMP: bundled
CHPL_HWLOC: bundled
CHPL_RE2: bundled
CHPL_LLVM: system
CHPL_AUX_FILESYS: none
```
- `gcc --version`:
```
gcc (SUSE Linux) 7.5.0
Copyright (C) 2017 Free Software Foundation, Inc.
This is free software; see the source for copying conditions. There is NO
warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
```