StanfordLegion / legion

The Legion Parallel Programming System
https://legion.stanford.edu
Apache License 2.0
680 stars 145 forks source link

Regent: bug in returning structs on PowerPC #1276

Closed mariodirenzo closed 2 years ago

mariodirenzo commented 2 years ago

The following script runs successfully on any system where I have tested it aside from PowerPC (Lassen)

import "regent"

local TYPES = terralib.includecstring([[
   #include <assert.h>
   #include <stdio.h>

   struct input {
      int a;
      int b;
      int c;
   };

   struct output {
      double a;
      double b;
      double c;
      //int dummy; // This struct member fixes the issue
   };

   void printInp(const struct input* in) {
      printf("printInp %d\n", in->a);
      printf("printInp %d\n", in->b);
      printf("printInp %d\n", in->c);
      assert(in->a == 1);
      assert(in->b == 2);
      assert(in->c == 3);
   };

   struct output initOut(const struct input* in) {
      printf("initOut %d\n", in->a);
      printf("initOut %d\n", in->b);
      printf("initOut %d\n", in->c);
      assert(in->a == 1);
      assert(in->b == 2);
      assert(in->c == 3);

      struct output o;
      return o;
   };
]])

__demand(__inline)
task InitOut(inp : TYPES.input)
   TYPES.printInp(&inp)
   var out = TYPES.initOut(&inp)
end

task main()
   var inp : TYPES.input
   inp.a = 1
   inp.b = 2
   inp.c = 3

   InitOut(inp)

end
regentlib.start(main)

In particular, it fails with an output that looks like

printInp 1
printInp 2
printInp 3
initOut 1344285536
initOut 8192
initOut 10
terra: <buffer>:30: struct output initOut(const struct input *): Assertion `in->a == 1' failed.

It is interesting that, if the member dummy is included in the structure output, the script runs successfully on Lassen as well.

The script should be executed with $LEGION_DIR/language/regent.py test.rg and a successful output looks like

printInp 1
printInp 2
printInp 3
initOut 1
initOut 2
initOut 3

Legion is installed with the command LEGION_DIR/language/scripts/setup_env.py --llvm-version 130 and with the following environment variables

# Module loads
module load StdEnv
module load gcc/8.3.1
module load cuda/11.1.1
module load cmake
module load python/3.8.2
# Build config
export CC=gcc
export CXX=g++
export CONDUIT=ibv
# CUDA config
export CUDA_HOME=/usr/tce/packages/cuda/cuda-11.1.1
export CUDA="$CUDA_HOME"
export GPU_ARCH=volta
# Legion setup
export USE_CUDA=1
export USE_OPENMP=1
export USE_GASNET=1
export USE_HDF=1
export MAX_DIM=3
export REALM_NETWORKS="gasnetex"
elliottslaughter commented 2 years ago

I think this is a known failure mode for Terra on PowerPC. I'll look into it, but in the mean time, can you try the following workaround?

import "regent"

local TYPES = terralib.includecstring([[
   #include <assert.h>
   #include <stdio.h>

   struct input {
      int a;
      int b;
      int c;
   };

   struct output {
      double a;
      double b;
      double c;
      //int dummy; // This struct member fixes the issue
   };

   void printInp(const struct input* in) {
      printf("printInp %d\n", in->a);
      printf("printInp %d\n", in->b);
      printf("printInp %d\n", in->c);
      assert(in->a == 1);
      assert(in->b == 2);
      assert(in->c == 3);
   };

   void initOut(const struct input* in, struct output *out) {
      printf("initOut %d\n", in->a);
      printf("initOut %d\n", in->b);
      printf("initOut %d\n", in->c);
      assert(in->a == 1);
      assert(in->b == 2);
      assert(in->c == 3);
   };
]])

__demand(__inline)
task InitOut(inp : TYPES.input)
   TYPES.printInp(&inp)
   var out : TYPES.output
   TYPES.initOut(&inp, &out)
end

task main()
   var inp : TYPES.input
   inp.a = 1
   inp.b = 2
   inp.c = 3

   InitOut(inp)

end
regentlib.start(main)

Basically, on PowerPC you cannot rely on returning structs by value from functions. We worked around this in Regent by always passing output structs by pointer.

mariodirenzo commented 2 years ago

I see... Potentially I would be more inclined to keep the dummy variable in the structure and maybe get rid of it in the future. Is it a stable workaround? Does any particular size/number of struct members avoid the issue? Otherwise, I'll try to implement that workaround in the code, though keeping those struct members as const is quite essential to prevent mistakes in the future.

elliottslaughter commented 2 years ago

I do not think the use of dummy is a stable workaround. I believe there are fundamental issues in Terra's implementation of the PPC64le ABI which are just broken, and I cannot guarantee (without putting in the effort to actually fix this) that any such use of structs will be stable.

I think const is fine. Terra will ignore it, but you can use it on the C/C++ side code to communicate and enforce which arguments are input and which are output.

In the mean time, I can try to take a look at the ABI in more detail, but I'm not sure right now how easy a fix we're looking at. However, I fixed similar issues for AMD GPU recently, so it may possible.

elliottslaughter commented 2 years ago

Just for posterity, here's a version that replicates in pure Terra:

local TYPES = terralib.includecstring([[
   #include <assert.h>
   #include <stdio.h>

   struct input {
      int a;
      int b;
      int c;
   };

   struct output {
      double a;
      double b;
      double c;
      //int dummy; // This struct member fixes the issue
   };

   void printInp(const struct input* in) {
      printf("printInp %d\n", in->a);
      printf("printInp %d\n", in->b);
      printf("printInp %d\n", in->c);
      assert(in->a == 1);
      assert(in->b == 2);
      assert(in->c == 3);
   };

   struct output initOut(const struct input* in) {
      printf("initOut %d\n", in->a);
      printf("initOut %d\n", in->b);
      printf("initOut %d\n", in->c);
      assert(in->a == 1);
      assert(in->b == 2);
      assert(in->c == 3);

      struct output o;
      return o;
   };
]])

terra InitOut(inp : TYPES.input)
   TYPES.printInp(&inp)
   var out = TYPES.initOut(&inp)
end

terra main()
   var inp : TYPES.input
   inp.a = 1
   inp.b = 2
   inp.c = 3

   InitOut(inp)

end
main()

Output:

printInp 1
printInp 2
printInp 3
initOut 0
initOut 0
initOut 0
terra: <buffer>:30: struct output initOut(const struct input *): Assertion `in->a == 1' failed.
elliottslaughter commented 2 years ago

Please try the latest Terra master and let me know if it fixes this.

You can see the variety of structs we're testing against at the top of this file. All of these pass on PPC64le:

https://github.com/terralang/terra/blob/master/tests/cconv_more.t

elliottslaughter commented 2 years ago

@mariodirenzo has told me this is fixed. Closing.

elliottslaughter commented 2 years ago

Reopening. The issue is fixed but there are some more advanced tests I've had fail in Terra and I want to clean that up before doing a release.

elliottslaughter commented 2 years ago

Terra 1.0.2 includes substantially improved support for structs on PPC64le, and I wrote a conformance test to confirm that this works on all structs up to 23 fields for all primitive types (uint8, int16, int32, int64, float, double, and arrays of those types). The new Terra passes this test (as well as Terra's older calling convention test) so I'm highly confident we've addressed all the issues.

elliottslaughter commented 2 years ago

Oh, and setup_env.py has bumped Terra to 1.0.2.