chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org

within a forall loop containing a taken continue, a forall loop over a domain literal dereferences nil or hits a LocaleModel error or just segfaults #21292

Open cassella opened 1 year ago

cassella commented 1 year ago

Summary of Problem

The code below nondeterministically attempts to dereference nil, or hits a LocaleModel error, or just segfaults.

Some of the other variants hit just the dereference nil error, or even work, as noted.

Modifying the code to pick one of the variants based on a config var makes the problem go away.

Steps to Reproduce

Source Code:

var D = {-1..1,-1..1};
var A: [D] bool;

A[(0,0)] = true;

//  writeln(+ reduce ([x in {-1..1}] x)); // works here

forall Exy in D {
  if !A[Exy] then continue;
  var neighbours = 0;

  //writeln(+ reduce ([x in {-1..1}] x)); // nil

  //  writeln(+ reduce ([xy in D] xy)); // works

  //writeln(+ reduce ([xy in {-1..1}] xy)); // nil

  writeln(+ reduce ([xy in {-1..1,-1..1}] xy)); // nil, or child from flat

  //  neighbours = + reduce ([xy in {-1..1,-1..1}] A[Exy+xy]); // nil

  // forall xy in {-1..1,-1..1} do writeln(A[Exy+xy]); // works

  // neighbours = + reduce ([xy in D] A[Exy+xy]);      // works

  //  writeln(+ reduce ({-1..1,-1..1})); // works

  writeln(neighbours);
}

Execution command:

[edit: these are repeated invocations of the same binary.]

fortytwo@enodia:~/src/chapel (main)$ ./foo 
$CHPL_HOME/modules/internal/ChapelDomain.chpl:1069: error: attempt to dereference nil
Segmentation fault (core dumped)

fortytwo@enodia:~/src/chapel (main)$ ./foo 
Segmentation fault (core dumped)

fortytwo@enodia:~/src/chapel (main)$ ./foo 
$CHPL_HOME/modules/internal/localeModels/flat/LocaleModel.chpl:122: error: halt reached - requesting a child from a flat LocaleModel locale
Segmentation fault (core dumped)

$CHPL_HOME/modules/internal/ChapelDomain.chpl:1069: error: attempt to dereference nil
[Switching to Thread 0x7fffeffff640 (LWP 526700)]

Thread 4 "foo" hit Breakpoint 1, gdbShouldBreakHere () at gdb.c:28
28      void gdbShouldBreakHere(void) {printf("%s", "");}
(gdb) i s
#0  gdbShouldBreakHere () at gdb.c:28
#1  0x00005555555ee74b in chpl_exit_common (status=1, all=0) at chplexit.c:38
#2  0x00005555555ee7a5 in chpl_exit_any (status=1) at chplexit.c:60
#3  0x00005555555ec18c in chpl_error_explicit (message=0x5555556924a4 "attempt to dereference nil", lineno=1069, filename=0x5555556900a8 "$CHPL_HOME/modules/internal/ChapelDomain.chpl")
    at error.c:366
#4  0x00005555555ec280 in chpl_error (message=0x5555556924a4 "attempt to dereference nil", lineno=1069, filenameIdx=58) at error.c:440
#5  0x00005555555e4c3f in chpl_check_nil ()
#6  0x000055555558585d in _do_destroy_chpl ()
#7  0x0000555555585aee in deinit_chpl18 ()
#8  0x000055555558063e in chpl__autoDestroy3 ()
#9  0x00005555555e3486 in coforall_fn_chpl11 ()
#10 0x00005555555e3618 in wrapcoforall_fn_chpl11 ()
#11 0x00005555555f436d in chapel_wrapper (arg=0x7ffff5763710) at tasks-qthreads.c:800
#12 0x000055555565b1ae in qthread_wrapper (ptr=0x7ffff57636d0) at /home/fortytwo/src/chapel/third-party/qthread/qthread-src/src/qthread.c:2184

$CHPL_HOME/modules/internal/localeModels/flat/LocaleModel.chpl:122: error: halt reached - requesting a child from a flat LocaleModel locale
[Switching to Thread 0x7ffff61ff640 (LWP 526718)]

Thread 2 "foo" hit Breakpoint 1, gdbShouldBreakHere () at gdb.c:28
28      void gdbShouldBreakHere(void) {printf("%s", "");}
(gdb) i s
#0  gdbShouldBreakHere () at gdb.c:28
#1  0x00005555555ee74b in chpl_exit_common (status=1, all=0) at chplexit.c:38
#2  0x00005555555ee7a5 in chpl_exit_any (status=1) at chplexit.c:60
#3  0x00005555555ec18c in chpl_error_explicit (message=0x7ffff5754190 "halt reached - requesting a child from a flat LocaleModel locale", lineno=122, 
    filename=0x55555568fa30 "$CHPL_HOME/modules/internal/localeModels/flat/LocaleModel.chpl") at error.c:366
#4  0x00005555555ec280 in chpl_error (message=0x7ffff5754190 "halt reached - requesting a child from a flat LocaleModel locale", lineno=122, filenameIdx=25) at error.c:440
#5  0x00005555555c7080 in halt_chpl14 ()
#6  0x00005555555c647d in halt_chpl ()
#7  0x00005555555a991b in _getChild_chpl3 ()
#8  0x000055555558268d in remove_chpl2 ()
#9  0x000055555558586c in _do_destroy_chpl ()
#10 0x0000555555585aee in deinit_chpl18 ()
#11 0x000055555558063e in chpl__autoDestroy3 ()
#12 0x00005555555e3486 in coforall_fn_chpl11 ()
#13 0x00005555555e3618 in wrapcoforall_fn_chpl11 ()
#14 0x00005555555f436d in chapel_wrapper (arg=0x7ffff5761040) at tasks-qthreads.c:800
#15 0x000055555565b1ae in qthread_wrapper (ptr=0x7ffff5761000) at /home/fortytwo/src/chapel/third-party/qthread/qthread-src/src/qthread.c:2184
#16 0x0000000000000000 in ?? ()

$CHPL_HOME/modules/internal/ChapelDomain.chpl:1069: error: attempt to dereference nil

Thread 3 "foo" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff4dff640 (LWP 526823)]
0x0000555555582670 in remove_chpl2 ()
(gdb) i s
#0  0x0000555555582670 in remove_chpl2 ()
#1  0x000055555558586c in _do_destroy_chpl ()
#2  0x0000555555585aee in deinit_chpl18 ()
#3  0x000055555558063e in chpl__autoDestroy3 ()
#4  0x00005555555e3486 in coforall_fn_chpl11 ()
#5  0x00005555555e3618 in wrapcoforall_fn_chpl11 ()
#6  0x00005555555f436d in chapel_wrapper (arg=0x7ffff5761490) at tasks-qthreads.c:800
#7  0x000055555565b1ae in qthread_wrapper (ptr=0x7ffff5761450) at /home/fortytwo/src/chapel/third-party/qthread/qthread-src/src/qthread.c:2184

Associated Future Test(s):
test/parallel/taskPar/nested/forall-with-continue-forall-bracket-expr.chpl #21411
test/parallel/taskPar/nested/forall-with-continue-forall-expr.chpl #21411
test/parallel/taskPar/nested/forall-with-continue-reduction-over-forall-bracket-expr.chpl #21411

Configuration Information

chpl version 1.30.0 pre-release (b'179b81edd3')
  built with LLVM version 14.0.0

CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: llvm
CHPL_TARGET_ARCH: x86_64
CHPL_TARGET_CPU: native *
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none *
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: cstdlib
CHPL_GMP: bundled
CHPL_HWLOC: bundled
CHPL_RE2: none
CHPL_LLVM: system
CHPL_AUX_FILESYS: none

gcc (Ubuntu 11.3.0-1ubuntu1~22.04) 11.3.0

Ubuntu clang version 14.0.0-1ubuntu1

This is with CHPL_COMM=none for clarity. It also hit the problem with CHPL_COMM=gasnet.

damianmoz commented 1 year ago

I think you have found a simplification of a problem I have seen. Consider this partial reduction:

proc partial
{
    var A : [1..8, 1..8] real;
    var x : [1..8] real; 

    for i in 1..8 do // fill A with some data
    { 
        for j in 1..8 do
        {
            A[i, j] = (4 - i):real + ((5 - j) * 2):real;
        } 
    }
    x = + scan (A[1, ..]); // likewise with 'x'
    writeln(x);

    // 3 IMPORTANT LINES
    const u = + reduce [i in 1..8] x[i] * A[i, ..]; // partial reduction
    var y : [1..8] real = 0.0; // build reduction using reduction intent
    var z = y; // build reduction into 'z' manually

    for r in 1..8 do
        z += x[r] * A[r, ..];

    [ (r, c) in A.domain with (+ reduce y) ] y[c] += x[r] * A[r, c];

    writeln('partial reduction:');
    writeln('- manual: ', z);
    writeln('- intent: ', y);
    writeln('- direct: ', u);
}

This crashes with the error

m.chpl:16: error: halt reached - argument to ! is nil

Moving the partial reduction two lines down within the THREE IMPORTANT LINES and the identical reduction code now works!

    var y : [1..8] real = 0.0; // build reduction using reduction intent
    var z = y; // build reduction into 'z' manually
    const u = + reduce [i in 1..8] x[i] * A[i, ..]; // partial reduction

Alternatively, moving the scalar multiplication of the matrix row out of the reduction expression and into a helper proc also works!!

    inline proc row(m, v) return m * v;
    const u = + reduce [r in 1..8] row(x[r], A[r, ..]); // partial reduction
    var y : [1..8] real = 0.0; // build reduction using reduction intent
    var z = y; // build reduction into 'z' manually

You have a slightly different test case to help you track down (and correct) what looks like much the same bug.

I suggest you try the code with the same (later) version of the compiler that you are using. I have vanilla 1.29.0 with clang14.

cassella commented 1 year ago

It's not directly related to reductions: I also get a dereference nil error from this loop body:

  writeln([xy in {-1..1,-1..1}] xy);

cassella commented 1 year ago

My example is also sensitive to the continue -- without it I get no errors for at least that last form, nor for the loop body that is left uncommented in the initial comment.

cassella commented 1 year ago

With this body

  writeln(forall xy in {-1..1,-1..1} do xy); // nil

$CHPL_HOME/modules/internal/ChapelDomain.chpl:1069: error: attempt to dereference nil

which is the inst.remove() in

    proc _do_destroy () {
      if ! _unowned {
        on _instance {
          // Count the number of arrays that refer to this domain,
          // and mark the domain to be freed when that number reaches 0.
          // Additionally, if the number is 0, remove the domain from
          // the distribution and possibly get the distribution to free.
          const inst = _instance;
          var (domToFree, distToRemove) = inst.remove();
          var distToFree:unmanaged BaseDist? = nil;
          if distToRemove != nil {
            distToFree = distToRemove!.remove();
          }
          if domToFree != nil then
            _delete_dom(inst, _isPrivatized(inst));
          if distToFree != nil then
            _delete_dist(distToFree!, _isPrivatized(inst.dist));
        }

(That's the same line that some of the earlier cases hit, though I didn't look into it before.)

Initializing A = true so that the continue is never taken makes the problem go away.

Making each of the loop body options depend on a config param also makes the problem go away.

cassella commented 8 months ago

Last year it was day 23, but this year I hit it on Advent of Code (AoC) day 22. This case is more than a little different, but similar enough that I'll just note it here for now.

This similarly involves a segfault after a taken continue from a forall loop.

use Set;

var sortedBlocks = [ i in 1..7 ] i;

config var skip = false;

forall b in 1..10 {

  if skip then continue;

  var todo: set(int);
}

Run without --skip, no segfault.

If todo is just an int, no segfault.

If the todo declaration is ahead of the continue, no segfault.

Would you believe without the sortedBlocks line the segfault goes away?

Thread 2 "segfault" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7ffff61ff640 (LWP 3878427)]
chpl_je_arena_mapbits_get (pageind=509, chunk=<optimized out>) at /home/fortytwo/src/chapel/third-party/jemalloc/jemalloc-src/include/jemalloc/internal/arena.h:809
809             return (arena_mapbitsp_read(arena_mapbitsp_get_const(chunk, pageind)));

(gdb) i s
#0  chpl_je_arena_mapbits_get (pageind=509, chunk=<optimized out>) at /home/fortytwo/src/chapel/third-party/jemalloc/jemalloc-src/include/jemalloc/internal/arena.h:809
#1  chpl_je_arena_dalloc (ptr=0x3ebdfdee8, tcache=0x7ffff6252000, slow_path=false, tsdn=<optimized out>)
    at /home/fortytwo/src/chapel/third-party/jemalloc/jemalloc-src/include/jemalloc/internal/arena.h:1434
#2  chpl_je_idalloctm (ptr=0x3ebdfdee8, tcache=0x7ffff6252000, slow_path=false, tsdn=<optimized out>, is_metadata=<optimized out>) at include/jemalloc/internal/jemalloc_internal.h:1170
#3  chpl_je_iqalloc (ptr=0x3ebdfdee8, tcache=0x7ffff6252000, slow_path=false, tsd=<optimized out>) at include/jemalloc/internal/jemalloc_internal.h:1187
#4  ifree (tsd=<optimized out>, ptr=0x3ebdfdee8, tcache=0x7ffff6252000, slow_path=false) at /home/fortytwo/src/chapel/third-party/jemalloc/jemalloc-src/src/jemalloc.c:1896
#5  0x0000555555600c72 in chpl_free ()
#6  0x00005555555ff5fd in chpl_mem_array_free ()
#7  0x00005555555870c0 in _freeData_chpl ()
#8  0x0000555555588279 in deinit_chpl16 ()
#9  0x00005555555fe374 in coforall_fn_chpl15 ()
#10 0x00005555555fe4b9 in wrapcoforall_fn_chpl15 ()
#11 0x000055555560e97a in chapel_wrapper (arg=0x7ffff5762040) at tasks-qthreads.c:819
#12 0x0000555555696a7e in qthread_wrapper (ptr=0x7ffff5762000) at /home/fortytwo/src/chapel/third-party/qthread/qthread-src/src/qthread.c:2194
#13 0x0000000000000000 in ?? ()

chpl version 1.34.0 pre-release (73c352c89e)
  built with LLVM version 14.0.0

gcc (Ubuntu 11.4.0-1ubuntu1~22.04) 11.4.0

Ubuntu clang version 14.0.0-1ubuntu1.1
Target: x86_64-pc-linux-gnu

CHPL_TARGET_PLATFORM: linux64
CHPL_TARGET_COMPILER: llvm
CHPL_TARGET_ARCH: x86_64
CHPL_TARGET_CPU: native *
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none *
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: cstdlib
CHPL_GMP: bundled
CHPL_HWLOC: bundled
CHPL_RE2: none
CHPL_LLVM: system
CHPL_AUX_FILESYS: none

cassella commented 8 months ago

> Moving the partial reduction two lines down within the THREE IMPORTANT LINES and the identical reduction code now works!

FWIW, I get the same error as you reported with the partial reduction line in either location.

vasslitvinov commented 7 months ago

As a snapshot, the following simpler program hits a nil dereference error:

proc main {
  forall Exy in these2() {
    if Exy == 0 then continue;  // nil dereference is reported for this line

    var D = {-1..1};
    var sum = 0;
    [y in ([x in these4()] x) with (+ reduce sum)] {
      sum += y;
    }
  }
}

iter these2() do yield 5;

iter these2(param tag: iterKind)
{
  // this gives ASAN error:
  //coforall chunk in these3()
  // this gives a nil dereference:
  for chunk in these3()
  {
    yield 0;
    yield 1;
  }
}

iter these3() do yield 0;

iter these4() do yield 0;

cassella commented 7 months ago

With that program I get the nil dereference on the same inst.remove() line that I saw last January.

FYI, in case you skimmed past it, the reproducer from this last December may be a simpler starting point.

cassella commented 7 months ago

I don't know why I didn't try this before. With CHPL_LLVM=none chpl --savec savec foo.chpl, the generated C for the reproducer from this past December has this really suspect bit:

static void coforall_fn_chpl13(int64_t len_chpl,
                               int64_t numChunks_chpl,
                               range_int64_t_both_one_chpl this_chpl7,
                               chpl___EndCount_AtomicT_int64_t_int64_t _coforallCount_chpl,
                               int64_t chunk_chpl,
                               chpl_bool skip_chpl2) {
...
  for (i_chpl = tmp_x0_chpl; ((i_chpl <= _ic__F1_high_chpl)); i_chpl += INT64(1)) {
    if (skip_chpl2) {
      goto _continueLabel_chpl;
    }
    init_chpl100(&todo_chpl, local_defaultHashTableResizeThreshold_chpl, INT64(16));
    _continueLabel_chpl:;
    i_x_chpl = &todo_chpl;
    _field_destructor_tmp__chpl = &((i_x_chpl)->_htb);
    deinit_chpl16(_field_destructor_tmp__chpl);
  }
  return;
}

where the continue's goto jumps over todo's initialization (init_chpl100) but still falls through to its deinit (deinit_chpl16), if I'm following.
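
For readers who want to poke at that control flow outside the compiler, here is a minimal C sketch of it. Everything in it is an assumption made for illustration: `Rec`, `rec_init`, `rec_deinit`, and the two `run_*` functions are hypothetical stand-ins, not Chapel runtime or generated code, and the second function is only one plausible shape for correct output, not the actual fix.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a record with init/deinit (not Chapel runtime code). */
typedef struct { int live; } Rec;

/* +1 for every init, -1 for every deinit; 0 means they are balanced. */
static int balance = 0;

static void rec_init(Rec *r)   { r->live = 1; balance += 1; }
static void rec_deinit(Rec *r) { r->live = 0; balance -= 1; }

/* Same shape as the generated loop quoted above: the continue label sits
   BEFORE the deinit, so a taken continue skips init but not deinit. */
int run_buggy_shape(int iterations, bool skip) {
  assert(iterations >= 0);
  balance = 0;
  for (int i = 0; i < iterations; i++) {
    Rec todo = {0};
    if (skip) goto continue_label;
    rec_init(&todo);
  continue_label:;
    rec_deinit(&todo);
  }
  return balance;
}

/* One possible correct shape (an assumption, not the real fix): the label
   sits AFTER the deinit, so a taken continue skips both. */
int run_expected_shape(int iterations, bool skip) {
  assert(iterations >= 0);
  balance = 0;
  for (int i = 0; i < iterations; i++) {
    Rec todo = {0};
    if (skip) goto continue_label;
    rec_init(&todo);
    rec_deinit(&todo);
  continue_label:;
  }
  return balance;
}
```

With 10 iterations and the continue always taken, `run_buggy_shape(10, true)` returns -10 (ten deinits with no matching init), while `run_expected_shape(10, true)` returns 0. In the real program the unmatched deinit frees memory that was never allocated, which would explain why the failure shows up as a crash inside the allocator rather than as a clean error.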

cassella commented 7 months ago

This is probably a better test case, since it doesn't rely on the memory error happening to cause trouble down the line:

var count: atomic int;

record R {
  proc init() { count.add(1); }
  proc deinit() { count.add(-1); }
}

config var skip = true;

forall b in 1..10 {

  if skip then continue;

  var todo: R;
}

writeln(count.read());

Currently:

$ ./foo -nl 1 
-10
$ ./foo -nl 1 --skip=false
0

vasslitvinov commented 7 months ago

Thanks Paul!

cassella commented 7 months ago

I think this would be a more complete test, showing the right inits+deinits are skipped:

var ic, dc: atomic int;
var iv, dv: atomic int;

record R {
  var myval: int;
  proc init(x) { myval = x; iv.add(myval); ic.add(1); }
  proc deinit() { dv.add(myval); dc.add(1); }
}

config var skip = true;

forall i in 0..8 {

  var r0 = new R(1);

  if i == 0 then continue;

  var r1 = new R(10);

  if i == 1 then continue;

  var r2 = new R(100);

  if i == 2 then continue;

  var r3 = new R(1000);

  if i == 3 then continue;

  var r4 = new R(10000);
}

writeln(ic);
writeln(dc);

writeln(iv);
writeln(dv);

Should output

35
35
56789
56789

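but with the bug the second line is 45 (all five deinits run in each of the 9 iterations) and the last line is nondeterministic, presumably because the extra deinits read myval from records that were never initialized.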
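
The expected numbers can be checked by hand with a short C sketch. It assumes the loop structure of the test above (iteration i takes its continue right after constructing r_i for i in 0..3, and iterations 4..8 construct all five records); the `Totals` struct and `expected_totals` function are hypothetical names introduced only for this check.

```c
#include <assert.h>

/* Illustrative totals for the five-record test; field names are hypothetical. */
typedef struct {
  long init_count;          /* inits that should run (first output line)  */
  long init_sum;            /* sum of myval over inits (third output line) */
  long deinits_if_all_run;  /* deinit count if no deinit were skipped     */
} Totals;

Totals expected_totals(void) {
  const long values[5] = {1, 10, 100, 1000, 10000};  /* r0..r4 */
  Totals t = {0, 0, 0};
  for (int i = 0; i <= 8; i++) {
    /* Iterations 0..3 stop after r_i; iterations 4..8 build r0..r4. */
    int constructed = (i < 4) ? i + 1 : 5;
    assert(constructed >= 1 && constructed <= 5);
    for (int k = 0; k < constructed; k++) {
      t.init_count += 1;
      t.init_sum += values[k];
    }
    /* With the bug, every iteration runs all five deinits. */
    t.deinits_if_all_run += 5;
  }
  return t;
}
```

This gives 1 + 2 + 3 + 4 + 5*5 = 35 inits, a value sum of 1 + 11 + 111 + 1111 + 5*11111 = 56789, and 9*5 = 45 deinits when none are skipped, which matches the figures quoted above.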

bradcray commented 7 months ago

@vasslitvinov : Should/could we be using a defer for those deinits to help make sure they're executed whether a continue is taken or not? Or is it the case that we are using a defer and its implementation is the thing that's broken?

vasslitvinov commented 7 months ago

@bradcray this is what I am investigating.

@cassella - thanks for trimming down the original example! At this point it is easier for me to look at the internal representation directly and simplify the source program to the point where it makes no sense. For example:

record R {
  inline proc init()   { writeln(12345678); }
  inline proc deinit() { writeln(87654321); }
}

config const skip = true;

proc main {
  forall b in these2()
  {
    if skip then continue;
    var todo: R;
  }
}

iter these2() do yield 5;

// this iterator is invoked by the above 'forall'
iter these2(param tag: iterKind) do
  for chunk in these3() do
    yield 222222;

iter these3() do yield 333333;

produces something like

static void chpl_user_main() {
  int these3_idx = 333333;
  int these2_idx = 222222;
  if (skip) {
    goto _continueLabel;
  }
  writeln(12345678); // initializer
  _continueLabel:;
  writeln(87654321); // deinitializer
  return;
}

cassella commented 7 months ago

> At this point it is easier for me to look at the internal representation directly and simplify the source program to the point where it makes no sense.

Sure. I just meant that last case with the 5 R's seems like a good test to add when the bug is fixed.

vasslitvinov commented 7 months ago

Snapshot: I am looking at the code in https://github.com/chapel-lang/chapel/issues/21292#issuecomment-1368194922. It is a different issue: a heap use-after-free. Here is a simple reproducer.

// reproducer updated 2/10

proc IndexType() type {
  var AA: [1..8] real;
  return AA.type;
  // this returns AA's runtime type, which contains AA.domain
  // however, AA and AA.domain get deallocated upon return
}

var BB: IndexType();

previous reproducers

```chpl
var A : [1..8, 1..8] real;

proc main {
  const r = + reduce (for IDX in IT() do A[IDX, ..]);
}

iter IT() { yield 3; }
```

The implementation invokes proc iteratorIndexType() so the above reduces to:

```chpl
var A : [1..8, 1..8] real;

iter IT() do yield 3;

type itype = iteratorIndexType( for IDX in IT() do A[IDX, ..] );
// at this point the runtime component of `itype`,
// which is an array slice, has been deinitialized

var B: itype;

proc iteratorIndexType(x) type {
  for i in x do return i.type;
  halt("the iterator yields no elements, cannot determine its index type");
}
```

lydia-duncan commented 1 month ago

@vasslitvinov - it looks like you may have fixed this in #24340 and removed the .future file in #24394. Should this issue be closed?

vasslitvinov commented 2 weeks ago

@lydia-duncan I have resolved the issue in the OP. The separate issue brought up by Damian is still not addressed; see my most recent comment from Feb 6. Therefore I suggest leaving this issue open.