Split initialization across cobegins

stonea commented 2 years ago

Summary of Problem

Many programs start by initializing and preprocessing some data-structures before moving on to some later step that makes use of them. Depending on the specifics it may be advantageous to try and use task parallelism like the following:

const a,b;
cobegin {
  a = initializeA();
  b = initializeB();
}

Unfortunately split initialization doesn't seem to work across cobegin statements. Compiling this will result in an error like:

cannot assign to const variable

If I modify the example so that a and b are vars rather than consts I get an even more confusing error:

internal error: Unhandled type in blankIntentForType() [passes/resolveIntents.cpp:168]

Steps to Reproduce

Source Code:

var a,b;
proc foo() { return 42; }
cobegin {
  a = foo();
  b = foo();
}

Compile command: chpl foo.chpl

Associated Future Test(s):

(to be created)

Configuration Information

$ chpl --version
chpl version 1.27.0 pre-release (a4373b47a9)
  built with LLVM version 11.1.0
Copyright 2020-2022 Hewlett Packard Enterprise Development LP
Copyright 2004-2019 Cray Inc.
(See LICENSE file for more details)

$ $CHPL_HOME/util/printchplenv --anonymize
CHPL_TARGET_PLATFORM: darwin
CHPL_TARGET_COMPILER: llvm
CHPL_TARGET_ARCH: x86_64
CHPL_TARGET_CPU: native
CHPL_LOCALE_MODEL: flat
CHPL_COMM: none
CHPL_TASKS: qthreads
CHPL_LAUNCHER: none
CHPL_TIMERS: generic
CHPL_UNWIND: none
CHPL_MEM: jemalloc
CHPL_ATOMICS: cstdlib
CHPL_GMP: bundled
CHPL_HWLOC: bundled
CHPL_RE2: bundled
CHPL_LLVM: system
CHPL_AUX_FILESYS: none

$ clang --version
Apple clang version 12.0.0 (clang-1200.0.32.29)
Target: x86_64-apple-darwin20.6.0
Thread model: posix
InstalledDir: /Library/Developer/CommandLineTools/usr/bin

bradcray commented 2 years ago

I like this proposal, and it's something that I, and users, have wanted when using cobegin in the past.

One slightly tricky thing I think we'll need to be wary of in implementing it is making sure there aren't multiple statements that initialize or modify the variables in question. For example:

var a, b;
cobegin {
  a = foo();
  a = bar();
}

In a sequential block of code, the first statement would be an initialization of a and the third an assignment to it. Since this is a cobegin, I think we'd have to say that this wasn't permitted since we don't know which will execute first. This also applies to out and ref arguments:

cobegin {
  a = foo();
  bar(myRefArg=a);  // not legal; a may not be initialized yet
  baz(myOutArg=a);  // not legal since we don't know whether foo() or baz() will execute first

mppf commented 2 years ago

@bradcray - we could consider either making the code examples in the previous comment a) an outright compilation error or b) simply disable split init. IMO it should be an outright error (because it is likely to be a race condition even if split init is not used).

Another related issue is what happens for on statements (certainly on can be combined with cobegin). I think it might be reasonable to allow split-init across on blocks — that is discussed in #15808. Certainly that is another issue but I think we should think about it while making improvements in this area of the language.

bradcray commented 2 years ago

because it is likely to be a race condition even if split init is not used

The mention of the race condition reminds me that the approach for implementing this feature probably depends heavily on the types of a and b and their default task intents. For example:

var a, b: int;

cobegin {
  a = 1;
  b = 2;
}

writeln((a,b));

doesn't compile today, even without split init, because the default intent for int is const in, so they can't be reassigned. That suggests that to support split init for this case, we'd have to use one of:

cobegin with (out a, b) {

or maybe:

cobegin with (ref a, b) {   // is it OK to pass a ref to an uninitialized variable, or will this force default init?

Of course if a and b were types that used ref intent, like atomic, sync, or array, then this issue doesn't come up.

mppf commented 2 years ago

Yeah that's really interesting and points to a situation where the features aren't combining seamlessly. As a result, it'd be my preference to require cobegin with (out a,b) to enable the split init. And, at that point, it's not a breaking language change at all. (Which is a good thing).

I wonder if on with (out a) would be a way to solve the split init part of #15808 without making a breaking language change as well. I'll post something there.

mppf commented 2 years ago

I think we could also have an in intent for cobegin to enable copy elision:

proc computeSomeRecord(): SomeRecord { ... }
proc consumeSomeRecord(in arg:SomeRecord) { ... }

{
  var a = computeSomeRecord();
  var b = computeSomeRecord();
  cobegin with (in a, b) {
    consumeSomeRecord(a);
    consumeSomeRecord(b);
  }
}

In the above, if we want, we could arrange for the language to avoid any copies (since a can flow from the declaration -> cobegin in intent -> in function argument).

Anyway, for both the in and out intent proposals here with cobegin, we need a property that split-init/copy-elision is only possible if the variable mentioned in the intent is mentioned in exactly one branch of the cobegin.

chapel-lang / chapel