spawn+VLA corrupts stack

VoxSciurorum commented 4 years ago

This crashes with various forms of stack corruption when not run serially.

#include <cilk/cilk.h>
#include <stdlib.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

void fill(unsigned char *data, unsigned length, unsigned char seed)
{
  for (unsigned i = 0; i < length; ++i)
    data[i] = seed + i;
}

unsigned check(unsigned char *data, unsigned length, unsigned char seed)
{
  fprintf(stdout, "check %p seed %d\n", data, seed);
  for (unsigned i = 0; i < length; ++i)
    if (data[i] != (unsigned char)(seed + i))
    {
      fprintf(stdout, "check %p seed %d [%u] %d != %d\n", data, seed, i,
          data[i], (unsigned char)(seed + i));
      return 1;
    }
  return 0;
}

unsigned work(unsigned char *data, unsigned length, unsigned char seed)
{
  fprintf(stdout, "work %p seed %d\n", data, seed);
  fflush(stdout);
  fill(data, length, seed);
  usleep(100);
  return check(data, length, seed);
}

unsigned loop(unsigned length)
{
  unsigned errors[length];
  memset(errors, 0, sizeof errors);
  for (unsigned int i = 0; i < 100; ++i)
  {
    unsigned char vla[length];
    cilk_spawn errors[i] = work(vla, length, i);
  }
  cilk_sync;
  unsigned sum = 0;
  for (unsigned int i = 0; i < 100; ++i)
    sum += errors[i];
  return sum;
}

int main(int argc, char *argv[])
{
  unsigned length = 0;
  if (argc > 1)
    length = strtoul(argv[1], 0, 0);
  if (length == 0)
    length = 399;
  unsigned errors = loop(length);
  return errors != 0;
}

neboat commented 4 years ago

This code has a race in the loop() function between the spawned execution of work(), which uses the VLA vla, and the implicit free of vla at the end of the loop body. Those races could certainly lead to stack corruption during parallel execution.

VoxSciurorum commented 4 years ago

Yes, the code has a bug but I want to keep this open to figure out why I didn't get a report out of the race detector.

neboat commented 4 years ago

Yes, let's keep this issue open to investigate that issue.

I suspect we're missing some necessary instrumentation for Cilksan to find this race. Since Cilksan uses compiler instrumentation, we should be able to spot such an issue at the LLVM IR level. That's good news, since LLVM IR is generally easier to read than assembly, especially when using debug builds of the compiler.

For reference, you can ask the compiler to generate LLVM IR for the instrumented code by adding the compiler flags -S -emit-llvm -ftapir=none to the normal clang command for compiling the test case. Broadly speaking, in the LLVM IR, the Cilksan instrumentation within a function shows up as calls to functions that start with __csan or __csi.

By the way, in case you need it, there are Emacs and Vim modes for LLVM IR in the opencilk-project repo, in the subdirectory llvm/utils.

neboat commented 3 years ago

This issue has been resolved in version 1.0.

OpenCilk / opencilk-project

spawn+VLA corrupts stack #12