chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.76k stars 413 forks source link

Should forall loops over default arrays of locales result in coforall + on implementation by default? #25147

Open bradcray opened 1 month ago

bradcray commented 1 month ago

For some time, we've wished with some frequency for forall loc in Locales or forall loc in MyTargetLocalesArray to result in SPMD execution equivalent to coforall loc in [Locales|MyTargetLocalesArray] do on loc. This issue asks whether that's something we'd just want to do for default arrays whose element type happens to be locale.

When we talked about this several years ago, the main approach we discussed was a special domain map that would be used for the built-in Locales array and presumably any slices/reshapings/etc. of it. The idea was that this new domain map's default parallel iterators would simply use the coforall + on idiom to implement any forall loops over the array, and that as long as a user was careful to preserve this domain map in such cases, they'd get the SPMD style of execution. We took a stab at implementing this, but it was unsuccessful, potentially simply due to the language not being as fully baked at the time.

While thinking about the question again recently, I realized that an alternate, and much simpler, approach might be to simply have the default array's parallel iterator use the coforall+on algorithm whenever its eltType is locale. This turns out to be quite simple to implement for the standalone parallel iterator, whose body becomes:

      if eltType == locale {
        coforall i in dom {
          on dsiAccess(i) {
            yield dsiAccess(i);
          }
        }
      } else {
        foreach i in dom.these(tag, tasksPerLocale,
                               ignoreRunning, minIndicesPerTask)
         with (ref this) {
          yield dsiAccess(i);
        }
      }

I ran this through testing to see what the impact on existing codes was, imagining that we'd have very few tests that run forall loops over arrays of locales. What I failed to anticipate is that lines like:

myTargetLocales = targetLocales;

within the implementations of distributions therefore become coforall + on loops rather than the simpler array copies we're accustomed to. While this doesn't make the transformation incorrect, it may make it heavier weight than we intend. So then the question becomes "Would we want to specialize default array copies for arrays with locale eltType to avoid forall loops?" and "How much work would that be?"

Note that this observation is not particularly specific to the implementation approach taken here—if we had successfully created a special domain map for the Locales array and and arrays formed from it, we'd presumably, at least potentially, run into similar issues in these more implicit form of invoking forall loops.

bradcray commented 4 weeks ago

More for my own future self's sake than anyone else's, here's the branch that I started experimenting with this idea on: https://github.com/bradcray/chapel/tree/forall-locale-array-parallel

And here's one where I added a bunch of debugging output to determine that an implicit forall over targetLocales is happening within initCopyAfterTransfer(): https://github.com/bradcray/chapel/tree/forall-locale-array-parallel-with-debug-prints

mppf commented 3 weeks ago

Wouldn't this be a breaking language change?

bradcray commented 3 weeks ago

I meant to address that question in the OP: I think it could be viewed as more of a performance/implementation question than a breaking change. But it obviously has pretty dramatic impact on how such loops behave. And if one were to use a local block today to assert that all of their foralls over arrays of locales were executing on locale 0, or were to print out here.id within such a loop, the behavior would obviously change. I think it's reasonable to ask whether that's behavior we've guaranteed, though. Specifically, I don't think that we've ever very precisely specified the definition of how forall loops over arrays declared with no explicit distribution behave. And I can imagine other optimizations we might do in the future (e.g., more aggressive localizing of data) that might be considered similarly breaking.

Anyway, in writing that, I don't mean to suggest that it's obviously a non-issue—it's not at all obvious to me. But it's also not obviously something I feel like we've committed to. Even if we were to decide "yes, it's definitely a breaking change," it may be the sort of change we'd want to run past a language review board as being a step in a better direction, and seems like something that people may be unlikely to be relying on the current behavior for in any deep way (at least that I've been able to come up with so far).

[edit: Noting that if we were to take the original approach of giving the Locales array its own custom distribution, that seems like it would be a similarly "is this really breaking or not?" type of change].

mppf commented 3 weeks ago

I'd be inclined to view this as a breaking change but one that's unlikely to cause much trouble for people in practice. If we end up following the lead of Rust Editions, where you can opt into newer language versions somehow, I think this would be a reasonable thing to attach to it.

(That's all assuming that we conclude that we do want this change, which I haven't considered much myself yet. I lean towards doing something along these lines although I'm worried that specializing on the locale element type will cause strange behavior changes for generic code that happens to be working with an array of locale).