Open cassella opened 1 year ago
I think you have found a simplification of a problem I have seen. Consider this partial reduction
proc partial
{
var A : [1..8, 1..8] real;
var x : [1..8] real;
for i in 1..8 do // fill A with some data
{
for j in 1..8 do
{
A[i, j] = (4 - i):real + ((5 - j) * 2):real;
}
}
x = + scan (A[1, ..]); // likewise with 'x'
writeln(x);
// 3 IMPORTANT LINES
const u = + reduce [i in 1..8] x[i] * A[i, ..]; // partial reduction
var y : [1..8] real = 0.0; // build reduction using reduction intent
var z = y; // build reduction into 'z' manually
for r in 1..8 do
z += x[r] * A[r, ..];
[ (r, c) in A.domain with (+ reduce y) ] y[c] += x[r] * A[r, c];
writeln('partial reduction:');
writeln('- manual: ', z);
writeln('- intent: ', y);
writeln('- direct: ', u);
}
This crashes with the error
m.chpl:16: error: halt reached - argument to ! is nil
Moving the partial reduction two lines down within the THREE IMPORTANT LINES and the identical reduction code now works!
var y : [1..8] real = 0.0; // build reduction using reduction intent
var z = y; // build reduction into 'z' manually
const u = + reduce [i in 1..8] x[i] * A[i, ..]; // partial reduction
Or removing from the reduction the complexity of the scalar multiplication of the row of the matrix also works!!
inline proc row(m, v) return m * v;
const u = + reduce [r in 1..8] row(x[r], A[r, ..]); // partial reduction
var y : [1..8] real = 0.0; // build reduction using reduction intent
var z = y; // build reduction into 'z' manually
You have a slightly different test case to help you track down (and correct) what looks like much the same bug.
I suggest you try the code with the same (later) version of the compiler that you are using. I have vanilla 1.29.0 with clang14.
It's not directly related to reductions: I also get a dereference nil error from this loop body:
writeln([xy in {-1..1,-1..1}] xy);
My example is also sensitive to the continue
-- without that I don't get errors for at least that last form or for the uncommented-out loop body in the initial comment.
With this body
writeln(forall xy in {-1..1,-1..1} do xy); // nil
$CHPL_HOME/modules/internal/ChapelDomain.chpl:1069: error: attempt to dereference nil
which is the inst.remove()
in
proc _do_destroy () {
if ! _unowned {
on _instance {
// Count the number of arrays that refer to this domain,
// and mark the domain to be freed when that number reaches 0.
// Additionally, if the number is 0, remove the domain from
// the distribution and possibly get the distribution to free.
const inst = _instance;
var (domToFree, distToRemove) = inst.remove();
var distToFree:unmanaged BaseDist? = nil;
if distToRemove != nil {
distToFree = distToRemove!.remove();
}
if domToFree != nil then
_delete_dom(inst, _isPrivatized(inst));
if distToFree != nil then
_delete_dist(distToFree!, _isPrivatized(inst.dist));
}
(That's the same line as some of the earlier cases hit, but which I didn't look into before.)
Initializing A = true
so that the continue is never taken makes the problem go away.
Making each of the loop body options depend on a config param
also makes the problem go away.
Last year it was day 23, but this year I hit it on aoc day 22. This is more than a little different, but similar enough I'll just note it here for now.
This similarly involves a segfault after a taken continue from a forall loop.
use Set;
var sortedBlocks = [ i in 1..7 ] i;
config var skip = false;
forall b in 1..10 {
if skip then continue;
var todo: set(int);
}
Run without --skip
, no segfault.
If todo
is just an int, no segfault.
If the todo
declaration is ahead of the continue, no segfault.
Would you believe without the sortedBlocks
line the segfault goes
away?
Thread 2 "segfault" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff61ff640 (LWP 3878427)]
chpl_je_arena_mapbits_get (pageind=509, chunk=<optimized out>) at /home/fortytwo/src/chapel/third-party/jemalloc/jemalloc-src/include/jemalloc/internal/arena.h:809
809 return (arena_mapbitsp_read(arena_mapbitsp_get_const(chunk, pageind)));
(gdb) i s
#0 chpl_je_arena_mapbits_get (pageind=509, chunk=<optimized out>) at /home/fortytwo/src/chapel/third-party/jemalloc/jemalloc-src/include/jemalloc/internal/arena.h:809
#1 chpl_je_arena_dalloc (ptr=0x3ebdfdee8, tcache=0x7ffff6252000, slow_path=false, tsdn=<optimized out>)
at /home/fortytwo/src/chapel/third-party/jemalloc/jemalloc-src/include/jemalloc/internal/arena.h:1434
#2 chpl_je_idalloctm (ptr=0x3ebdfdee8, tcache=0x7ffff6252000, slow_path=false, tsdn=<optimized out>, is_metadata=<optimized out>) at include/jemalloc/internal/jemalloc_internal.h:1170
#3 chpl_je_iqalloc (ptr=0x3ebdfdee8, tcache=0x7ffff6252000, slow_path=false, tsd=<optimized out>) at include/jemalloc/internal/jemalloc_internal.h:1187
#4 ifree (tsd=<optimized out>, ptr=0x3ebdfdee8, tcache=0x7ffff6252000, slow_path=false) at /home/fortytwo/src/chapel/third-party/jemalloc/jemalloc-src/src/jemalloc.c:1896
#5 0x0000555555600c72 in chpl_free ()
#6 0x00005555555ff5fd in chpl_mem_array_free ()
#7 0x00005555555870c0 in _freeData_chpl ()
#8 0x0000555555588279 in deinit_chpl16 ()
#9 0x00005555555fe374 in coforall_fn_chpl15 ()
#10 0x00005555555fe4b9 in wrapcoforall_fn_chpl15 ()
#11 0x000055555560e97a in chapel_wrapper (arg=0x7ffff5762040) at tasks-qthreads.c:819
#12 0x0000555555696a7e in qthread_wrapper (ptr=0x7ffff5762000) at /home/fortytwo/src/chapel/third-party/qthread/qthread-src/src/qthread.c:2194
#13 0x0000000000000000 in ?? ()
chpl version 1.34.0 pre-release (73c352c89e)
built with LLVM version 14.0.0
gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0
Ubuntu clang version 14.0.0-1ubuntu1.1
Target: x86_64-pc-linux-gnu
CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: llvm
CHPL_TARGET_ARCH: x86_64
CHPL_TARGET_CPU: native *
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none *
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: cstdlib
CHPL_GMP: bundled
CHPL_HWLOC: bundled
CHPL_RE2: none
CHPL_LLVM: system
CHPL_AUX_FILESYS: none
Moving the partial reduction two lines down within the THREE IMPORTANT LINES and the identical reduction code now works!
FWIW, I get the same error as you reported with the partial reduction line in either location.
As a snapshot, the following simpler program hits a nil dereference error:
proc main {
forall Exy in these2() {
if Exy == 0 then continue; // nil dereference is reported for this line
var D = {-1..1};
var sum = 0;
[y in ([x in these4()] x) with (+ reduce sum)] {
sum += y;
}
}
}
iter these2() do yield 5;
iter these2(param tag: iterKind)
{
// this gives ASAN error:
//coforall chunk in these3()
// this gives a nil dereference:
for chunk in these3()
{
yield 0;
yield 1;
}
}
iter these3() do yield 0;
iter these4() do yield 0;
On that I get the nil dereference on that same inst.remove()
line as I saw last January.
FYI, in case you skimmed past it, the reproducer from this last December may be a simpler starting point.
I don't know why I didn't try this before. With CHPL_LLVM=none chpl --savec savec foo.chpl
, the reproducer from this past December has this really suspect bit
static void coforall_fn_chpl13(int64_t len_chpl,
int64_t numChunks_chpl,
range_int64_t_both_one_chpl this_chpl7,
chpl___EndCount_AtomicT_int64_t_int64_t _coforallCount_chpl,
int64_t chunk_chpl,
chpl_bool skip_chpl2) {
...
for (i_chpl = tmp_x0_chpl; ((i_chpl <= _ic__F1_high_chpl)); i_chpl += INT64(1)) {
if (skip_chpl2) {
goto _continueLabel_chpl;
}
init_chpl100(&todo_chpl, local_defaultHashTableResizeThreshold_chpl, INT64(16));
_continueLabel_chpl:;
i_x_chpl = &todo_chpl;
_field_destructor_tmp__chpl = &((i_x_chpl)->_htb);
deinit_chpl16(_field_destructor_tmp__chpl);
}
return;
}
where the continue
's goto
skips todo
's initialization but not its deinit, if I'm following
This is probably a better test case, since it doesn't rely on the memory error happening to cause trouble down the line,
var count: atomic int;
record R {
proc init() { count.add(1); }
proc deinit() { count.add(-1); }
}
config var skip = true;
forall b in 1..10 {
if skip then continue;
var todo: R;
}
writeln(count.read());
Currently:
$ ./foo -nl 1
-10
$ ./foo -nl 1 --skip=false
0
Thanks Paul!
I think this would be a more complete test, showing the right inits+deinits are skipped:
var ic, dc: atomic int;
var iv, dv: atomic int;
record R {
var myval: int;
proc init(x) { myval = x; iv.add(myval); ic.add(1); }
proc deinit() { dv.add(myval); dc.add(1); }
}
config var skip = true;
forall i in 0..8 {
var r0 = new R(1);
if i == 0 then continue;
var r1 = new R(10);
if i == 1 then continue;
var r2 = new R(100);
if i == 2 then continue;
var r3 = new R(1000);
if i == 3 then continue;
var r4 = new R(10000);
}
writeln(ic);
writeln(dc);
writeln(iv);
writeln(dv);
Should output
35
35
56789
56789
but with the bug the second line is 45 and the last line is nondeterministic.
@vasslitvinov : Should/could we be using a defer
for those deinits to help make sure they're executed whether a continue is taken or not? Or is the case that we are using a defer
and its implementation is the thing that's broken?
@bradcray this is what I am investigating.
@cassella - thanks for trimming down the original example! At this point it is easier for me to look at the internal representation directly and simplify the source program to the point where it makes no sense. For example:
record R {
inline proc init() { writeln(12345678); }
inline proc deinit() { writeln(87654321); }
}
config const skip = true;
proc main {
forall b in these2()
{
if skip then continue;
var todo: R;
}
}
iter these2() do yield 5;
// this iterator is invoked by the above 'forall'
iter these2(param tag: iterKind) do
for chunk in these3() do
yield 222222;
iter these3() do yield 333333;
produces something like
static void chpl_user_main() {
int these3_idx = 333333;
int these2_idx = 222222;
if (skip) {
goto _continueLabel;
}
writeln(12345678); // initializer
_continueLabel:;
writeln(87654321); // deinitializer
return;
}
At this point it is easier for me to look at the internal representation directly and simplify the source program to the point where it makes no sense.
Sure. I just meant that last case with the 5 R
's seems like a good test to add when the bug is fixed.
Snapshot: I am looking at the code in https://github.com/chapel-lang/chapel/issues/21292#issuecomment-1368194922 . It is a different issue - heap use after free. Here is a simple reproducer.
// reproducer updated 2/10
proc IndexType() type {
var AA: [1..8] real;
return AA.type;
// this returns AA's runtime type, which contains AA.domain
// however, AA and AA.domain get deallocated upon return
}
var BB: IndexType();
@vasslitvinov - it looks like you may have fixed this in #24340 and removed the .future file in #24394. Should this issue be closed?
@lydia-duncan I have resolved the issue in the OP. The separate issue brought up by Damian is still not addressed, see my most recent comment from Feb 6. Therefore I suggest leaving this issue open
Summary of Problem
The code below nondeterministically attempts to dereference nil, or hits a LocaleModel error, or just segfaults.
Some of the other variants hit just the dereference nil error, or even work, as noted.
Modifying the code to pick one of the variants based on a config var makes the problem go away.
Steps to Reproduce
Source Code:
Execution command:
[edit: this is repeated invocation of the same binary. ]
Associated Future Test(s):
test/parallel/taskPar/nested/forall-with-continue-forall-bracket-expr.chpl
#21411test/parallel/taskPar/nested/forall-with-continue-forall-expr.chpl
#21411test/parallel/taskPar/nested/forall-with-continue-reduction-over-forall-bracket-expr.chpl
#21411Configuration Information
This is with
CHPL_COMM=none
for clarity. It also hit the problem withCHPL_COMM=gasnet
.