chapel-lang / chapel

a Productive Parallel Programming Language
https://chapel-lang.org
Other
1.77k stars 416 forks source link

`ChapelLocale` module: Clarify `locale.id` behavior/documentation w.r.t. sub-locales #20217

Closed jeremiah-corrado closed 1 year ago

jeremiah-corrado commented 2 years ago

The documentation for locale.id does not clearly describe how the method behaves on sub-locales (such as GPUs).

It is also not totally decided what the behavior should be; however, in a recent module review discussion, most of the team were leaning towards the id method always deferring to the the parent locale when applicable.

i.e. the following should print "true":

writeln(here.gpus[0].id == here.id)

An alternative suggestion was to support two methods:

The primary objections being:

e-kayrakli commented 2 years ago

I am for keeping id behave the way it does today, because it feels too hard to change in terms of muscle memory.

Related issue: https://github.com/Cray/chapel-private/issues/2755

stonea commented 2 years ago

I'd advocate for changing the name of .id to something more specific like .localeId.

jeremiah-corrado commented 2 years ago

I agree that it would be nice if the name of the method more clearly indicated that it queries the top-level locale's ID; however, I'm not sure about .localeId specifically.

To me, it seems a bit redundant to include the type name in the name of the method. I.e. the header/docs will look like this:

proc locale.localeId: int

which I think might be somewhat confusing to users.

Not sure what a better alternative is though...

mppf commented 2 years ago

topLevelId ? indexIntoLocales ? index ? Something like those?

jeremiah-corrado commented 2 years ago

This naming discussion is making me reconsider the idea of having two id-related methods on the locale type. One to query the index in the top-level Locales array. And another to query the index in the "nearest" locales array.

For example, the behavior might look something like this (I'm not attached to these names specifically):

on Locales[1] {
    writeln(here.topLevelId); // 1
    writeln(here.id); // 1

    on here.gpus[2] {
        writeln(here.topLevelId); // 1
        writeln(here.id); // 2
    }
}

This is sort of analogous to the behavior of locale.name and locale.hostname (where hostname always gives the name of the hardware associated with the top-level locale, and name gives more specific sub-locale information when called on a sub-locale).

This approach would allow us to keep id as the name of one of the two methods to reduce churn for users (especially for those that aren't using sub-locales and would thus be unaware of the potential change in behavior for here.gpus[0].id).


One of the initial objections to this approach from the module-review meeting was that the index of the sub-locale is not very useful information; however, I'm not sure this is true for multi-gpu systems. For example the following code uses the gpu's id to compute the relevant sub-section of an array:

var A: [0..<N] int;
const n = N / here.gpus.size;

coforall gpu in here.gpus do on gpu {
    const my_section = n*here.id..#n;

    var my_A = A[my_section];
    my_A += 1;
    A[my_section] = my_A;
}

There was also the objection that this could be confusing/problematic if we ever end up with a triply nested locale structure (i.e. something like: base-locale -> numa-regions -> numa-specific-gpus). However, I think the two method approach would still work pretty intuitively as follows:

on Locales[1] {
    writeln(here.topLevelId); // 1
    writeln(here.id); // 1

    on here.numaRegions[2] {
        writeln(here.topLevelId); // 1
        writeln(here.id); // 2

        on here.gpus[3] {
            writeln(here.topLevelId); // 1
            writeln(here.id); // 3
        }
    }
}

Where the numa index is somewhat difficult to query from the gpu "layer" (but it could be brought in from above as a separate variable if needed).

bradcray commented 2 years ago

I like keeping the current behavior and definition of locale.id for top-level locales for the reasons Engin and Jeremiah give in comments above. I'm not certain what we should do about gpu sub-locales at present.

One idea that has long appealed to me, which I think relates somewhat to Jeremiah's most recent comment, has been to have some way of establishing a "current Locales view" of sorts such that a given piece of code could be written to refer to the current locales whether that's the global set of top-level locales or some other view of locales that the user has set up. Without bringing GPUs into it, imagine writing an ocean simulation model that does things like distribute arrays across locales. Now imagine wanting to run that model not in a standalone mode that owns all the locales, but a coupled climate model in which maybe its given a fraction of the system's locales and other models (air, ice, etc.) are given other disjoint fractions. If the original code was written in terms of the built-in Locales array, it would need to be changed to be used in this coupled mode, for example, by parameterizing everything by a user-supplied Locales array. But is there something the language could or should be doing to make this pattern easier to write? And if we had it, would it shed any light on how we might "zoom in" on sub-locale structures like the GPU array or a numa array (say?).

(Note that I think we should try to avoid precluding the ability to write 3-deep locale models, but I also don't believe that they'll show up in the near-term because I think it's far more likely that we'll start running multiple processes per node to deal with NUMA due to the trend of putting multiple NICs per node).

stonea commented 2 years ago

I'd caution against using a name as generic as id to refer to sublocale index. If we had numa domains and GPUs and you did x = Locales[0].gpus[2]; y = Locales[1].numas[2] would x.id == y.id? If so I'd find that behavior confusing (my default assumption is any two objects with the same id should be the same object).

If it were up to me I'd rename id to something like localeId or topLevelId and defer making any decisions about how to get a GPU or NUMA index or the like until we have a compelling use case.

Edit: I'd also find it confusing if x = Locales[0].gpus[2]; y = Locales[1].gpus[2] had x.id == y.id

bradcray commented 2 years ago

FWIW, I think that checking whether two locales are the same should be done using locale1 == locale2 rather than locale1.id == locale2.id. That said, I also understand and agree that it'd be confusing for those two expressions to give different results.

If it were up to me I'd rename id to something like localeId or topLevelId and defer making any decisions about how to get a GPU or NUMA index or the like until we have a compelling use case.

To me, here.localeID feels a bit like requiring people to write list.listSize rather than list.size.

Meanwhile, requiring here.topLevelID feels like it unduly penalizes current code that's longstanding and working well in favor of a feature we haven't defined and don't really understand very well yet (or at least that I don't understand well. I.e., "what should ".id" mean for GPUs?" and "How do we anticipate using .id on GPUs constructively?"). Essentially, I'd like to understand the plan for GPU (or other) sublocale IDs better before throwing current code and programmers under the bus.

It seems like we wouldn't get into trouble w.r.t. 2.0 and stability (assuming GPUs are not part of 2.0) if we were to:

Alternatively, we could:

Since I've said I'm not clear how I'd use .id for a GPU, I'll mention how it's most useful to me for locales:

I'll also note that in various discussions about distributions, we've sometimes winced at disparities between here.id and what we really want to know which is "What is my index in this distribution's targetLocales array?" I believe that @vasslitvinov , @e-kayrakli , and @ronawho have done most of the recent wrestling with that topic and question. This also relates slightly to my previous comment about notions of scoped/active Locales views.

jeremiah-corrado commented 2 years ago

I.e., "what should ".id" mean for GPUs?" and "How do we anticipate using .id on GPUs constructively?")

I'm curious what people on the GPU team think of this example? Is this a reasonable use of id (where here.id is returning the index of the gpu in the gpus array):

var A: [0..<N] int;
const n = N / here.gpus.size;

coforall gpu in here.gpus do on gpu {
    const my_section = n*here.id..#n;

    var my_A = A[my_section];
    my_A += 1;
    A[my_section] = my_A;
}

Or is it more typical to iterate over a numerical range like: for gpu_idx in 0..#here.gpus.size when spreading work across multiple gpus?

vasslitvinov commented 2 years ago

we've sometimes winced at disparities between here.id and what we really want to know which is "What is my index in this distribution's targetLocales array?"

We started discussing this on #17915. See also other issues referenced from there, like #14405 and #16405.

jeremiah-corrado commented 2 years ago

Based on the discussion in this thread (and more recently this one), it sounds like most people are opposed to changing the name or behavior of the current id method, at least within the context of Chapel 2.0 and the flat locale model.

To summarize some important points from above:

Barring any concerns or opposing opinions, I am inclined to close this issue and consider the id method stabilized for 2.0.

bradcray commented 2 years ago

it sounds like most people are opposed to changing the name or behavior of the current id method, at least within the context of Chapel 2.0 and the flat locale model.

This definitely characterizes where I started out and have been throughout most of this discussion. However, Andy's question on https://github.com/Cray/chapel-private/issues/2755#issuecomment-1212235411 gave me pause because while I've been asserting that it'd be ideal if id continued to return an int matching the top-level locale's ID in the Locales array, I'm increasingly uncertain how often we use this as an integer value in code (vs. printing it to see an integer value). To that end, I suggested doing an experiment to change id to a record (simply containing an int field for starters) and seeing what the level of effort would be to get the hello programs working; and then to see how much of the rest of the test suite broke on a paratest run. I think this would be a valuable experiment before making this decision (since, if returning a record type didn't break much existing code and patterns, we could name the record type that's returned, but add additional fields to it over time, or for specific locale models). That's where I was Friday, and the weekend hasn't changed my mind on it.

jeremiah-corrado commented 2 years ago

to that end, I suggested doing an experiment to change id to a record (simply containing an int field for starters) and seeing what the level of effort would be to get the hello programs working; and then to see how much of the rest of the test suite broke on a paratest run.

Ok, I must have missed/misread that last message somehow. I'd be happy to run that experiment before the end of this sprint and report the results back here!

jeremiah-corrado commented 2 years ago

I've created an experimental branch to test Brad's idea: https://github.com/chapel-lang/chapel/compare/main...jeremiah-corrado:chapel:locale-id-type

Summary of changes:

Summary of results:

On the first round of paratesting, many tests with comparisons between two localeIds failed when one of the locales was queried from a variable. For example:

if arraySection.locale.id == here.id { }

The error message indicated that the id on the left was an int(32) rather than a localeId, and I am not sure why. I haven't been able to solve this problem directly thus far, so as a temporary solution, I added casts to int to here.id in most of these cases.


Many of the remaining failures originated in the tests themselves, where ids were used to index into arrays, or were involved in integer arithmetic.

For example: https://github.com/chapel-lang/chapel/blob/8bf656e82e34d27f34f9ccf398ccc2173ba6d0fe/test/studies/ssca2/rachels/SSCA2_kernels.chpl#L257-L259

or

https://github.com/chapel-lang/chapel/blob/8bf656e82e34d27f34f9ccf398ccc2173ba6d0fe/test/distributions/bharshbarg/blockCyclic/indexing.chpl#L13-L15

Here is a complete list of the remaining 152 paratest failures:

``` [Test Summary - 220817.124251] [Error matching compiler output for arrays/deitz/many/test_many_ranks_of_arithmetic_domains] [Error compilation failed for arrays/ferguson/array-initialization-patterns/private-record] [Error matching compiler output for arrays/slices/sliceUsesDom-locality] [Error matching compiler output for chplvis/tree] [Error matching compiler output for classes/ferguson/forwarding/isx-bug] [Error matching compiler output for distributions/bharshbarg/blockCyclic/indexing (compopts: 1)] [Error matching compiler output for distributions/bharshbarg/blockCyclic/indexing (compopts: 2)] [Error matching compiler output for distributions/bharshbarg/blockCyclic/indexing (compopts: 3)] [Error matching compiler output for distributions/bharshbarg/subQuery] [Error matching compiler output for distributions/block/emptyBbox] [Error matching compiler output for distributions/bradc/assoc/userAssoc-domain-stress] [Error matching compiler output for distributions/bradc/spsBlk/blockSparse1] [Error matching compiler output for distributions/bradc/spsBlk/blockSparse2] [Error matching compiler output for distributions/bradc/spsBlk/blockSparse2CSR] [Error matching compiler output for distributions/diten/testBlockUtilityFns] [Error matching compiler output for distributions/diten/testCyclicUtilityFns] [Error matching compiler output for distributions/private/PrivateDistNonNilable] [Error matching compiler output for distributions/replicated/scanReplicated] [Error matching compiler output for distributions/replicated/testReplWrites] [Error matching compiler output for distributions/replicated/testReplicand] [Error matching compiler output for distributions/vass/testReplicatedDist1] [Error matching compiler output for distributions/vass/testReplicatedVar1] [Error matching compiler output for distributions/vass/testReplicatedVar2] [Error matching compiler output for exercises/Mandelbrot/solutions/mandelbrot5-taskpar-dist] [Error matching compiler output for exercises/MonteCarloPi/deitz/MonteCarloPi/multiLocaleTaskParallelMonteCarloPi] [Error matching compiler output for exercises/MonteCarloPi/multiLocaleTaskParallel] [Error matching compiler output for expressions/bradc/condExprInClassInit] [Error matching compiler output for functions/deitz/iterators/leader_follower/test_recursive_leader3] [Error matching compiler output for functions/iterators/multilocale/multilocRecIter] [Error matching compiler output for functions/operatorOverloads/operatorMethods/genericsInstantiationBad] [Error matching compiler output for io/bradc/multilocale/testdistwrite] [Error matching compiler output for io/ferguson/buffer_test] [Error matching compiler output for library/packages/Buffers/copy_strings] [Error matching compiler output for library/packages/Collection/CollectionBulk (compopts: 2)] [Error matching compiler output for library/packages/Collection/CollectionCounter (compopts: 2)] [Error matching compiler output for library/packages/Collection/CollectionNQueens (compopts: 2)] [Error matching compiler output for library/packages/Collection/DistBagWriteThis] [Error compilation failed for library/packages/DistributedIters/checkDistributedIters] [Error matching compiler output for library/packages/EpochManager/atomicObjects/atomicObjectsTest] [Error matching compiler output for library/packages/LinearAlgebra/correctness/no-dependencies/correctness] [Error matching compiler output for library/packages/Sort/RadixSort/distributedRadixSort] [Error matching compiler output for library/packages/Sort/correctness/two-array-radix-sort-bug] [Error matching compiler output for library/packages/Sort/performance/dist-performance] [Error matching compiler output for library/packages/canCompileNoLink/MPItest] [Error matching compiler output for localeModels/nodeIDbug] [Error matching compiler output for modules/vass/SSCA2-with-Private-1] [Error matching compiler output for modules/vass/SSCA2-with-Private-2] [Error matching compiler output for multilocale/bradc/globalConstRepl] [Error matching compiler output for multilocale/deitz/linked/test_linked3b] [Error matching compiler output for multilocale/deitz/linked/test_linked3c] [Error matching compiler output for multilocale/diten/oneOrMoreLocales/1DBlockDistributedJacobi] [Error matching compiler output for multilocale/diten/oneOrMoreLocales/assocArrayOfRemoteArithArray] [Error matching compiler output for multilocale/vass/recursive-iterator-twice-multiloc] [Error matching compiler output for npb/ft/npadmana/ft_transposed_comp_only] [Error matching compiler output for npb/is/mcahir/intsort.mtml] [Error matching compiler output for optimizations/bulkcomm/alberto/Block/3dAgTestStride] [Error matching compiler output for optimizations/bulkcomm/alberto/Block/3dStrideBlock] [Error matching compiler output for optimizations/bulkcomm/alberto/Cyclic/1dStrideCyclic] [Error matching compiler output for optimizations/bulkcomm/alberto/Cyclic/2dStrideCyclic] [Error matching compiler output for optimizations/bulkcomm/alberto/Cyclic/3dAgTestStride] [Error matching compiler output for optimizations/bulkcomm/alberto/Cyclic/3dStrideCyclic] [Error matching compiler output for optimizations/parallelAssign/basicDR] [Error matching compiler output for optimizations/rafa/2dAgTest] [Error matching compiler output for optimizations/remoteValueForwarding/bradc/testconstlocality] [Error matching compiler output for optimizations/sungeun/RADOpt/access1d] [Error matching compiler output for optimizations/sungeun/RADOpt/access2d] [Error matching compiler output for optimizations/sungeun/RADOpt/access3d] [Error matching compiler output for parallel/coforall/bradc/nested/nestedForall] [Error matching compiler output for parallel/coforall/bradc/nested/nestedForall1a] [Error matching compiler output for parallel/coforall/bradc/nested/nestedForall3] [Error matching compiler output for parallel/forall/task-private-vars/block] [Error matching compiler output for parallel/forall/task-private-vars/local-slice] [Error matching compiler output for parallel/forall/vass/par-loops/2-enclosing-par-loops-error-1] [Error matching compiler output for parallel/forall/vass/par-loops/2-enclosing-par-loops-error-2] [Error matching compiler output for parallel/forall/vass/par-loops/2-enclosing-par-loops-error-3] [Error matching compiler output for parallel/forall/vass/par-loops/2-enclosing-par-loops-error-4] [Error matching compiler output for parallel/taskCount/runningTaskCount/innerCoforall] [Error matching compiler output for parallel/taskCount/runningTaskCount/innerOuterCoforall] [Error matching compiler output for parallel/taskCount/runningTaskCount/outerCoforall] [Error matching compiler output for parallel/taskCount/runningTaskCount/serial] [Error matching compiler output for parallel/taskPar/elliot/barrier/all-locales] [Error matching compiler output for parallel/taskPar/sungeun/barrier/commDiags] [Error matching compiler output for parallel/taskPar/sungeun/private] [Error matching compiler output for parallel/taskPar/taskIntents/ri-coforall+on] [Error matching compiler output for patterns/eureka/eureka] [Error matching compiler output for performance/comm/barrier/empty-chpl-barrier (compopts: 1)] [Error matching compiler output for performance/comm/barrier/empty-chpl-barrier (compopts: 2)] [Error matching compiler output for performance/comm/barrier/empty-chpl-barrier (compopts: 3)] [Error matching compiler output for performance/sungeun/assign_across_locales] [Error matching compiler output for performance/sungeun/copy_to_local] [Error matching compiler output for release/examples/benchmarks/hpcc/stream-ep] [Error compilation failed for release/examples/benchmarks/isx/isx] [Error matching compiler output for release/examples/primers/chplvis/chplvis2] [Error matching compiler output for release/examples/primers/chplvis/chplvis4] [Error matching compiler output for release/examples/primers/distributions] [Error matching compiler output for release/examples/primers/replicated] [Error matching compiler output for scan/scanPerf] [Error matching compiler output for scan/testScanOps] [Error matching compiler output for sparse/slices/sparseSliceDenseBlock] [Error: Timed out executing program studies/adventOfCode/2021/bradc/day23a] [Error matching compiler output for studies/bale/histogram/histo-atomics] [Error matching compiler output for studies/bale/toposort/toposort] [Error matching compiler output for studies/comd/elegant/arrayOfStructs/CoMD] [Error matching compiler output for studies/dedup/dedup-spawn-sort] [Error matching compiler output for studies/graph500/kristyn/Graph500_1D_onV/main] [Error matching compiler output for studies/graph500/v3/main] [Error matching compiler output for studies/hpcc/RA/diten/ra] [Error matching compiler output for studies/hpcc/STREAMS/bradc/stream-fragmented-timecoforall] [Error matching compiler output for studies/hpcc/STREAMS/bradc/stream-fragmented] [Error matching compiler output for studies/hpcc/STREAMS/diten/stream-fragmented-local] [Error compilation failed for studies/isx/isx-bucket-spmd (compopts: 1)] [Error compilation failed for studies/isx/isx-hand-optimized (compopts: 1)] [Error compilation failed for studies/isx/isx-no-return (compopts: 1)] [Error compilation failed for studies/isx/isx-per-task] [Error matching compiler output for studies/lammps/shemmy/m-lammps] [Error matching compiler output for studies/lsms/shemmy/m-lsms] [Error matching compiler output for studies/madness/aniruddha/madchap/test_diff] [Error matching compiler output for studies/madness/aniruddha/madchap/test_gaxpy] [Error matching compiler output for studies/madness/aniruddha/madchap/test_likepy] [Error matching compiler output for studies/madness/aniruddha/madchap/test_mul] [Error matching compiler output for studies/madness/aniruddha/madchap/test_showboxes] [Error matching compiler output for studies/prk/DGEMM/SUMMA/dgemm (compopts: 1)] [Error matching compiler output for studies/prk/DGEMM/SUMMA/dgemm (compopts: 2)] [Error matching compiler output for studies/prk/DGEMM/dgemm] [Error matching compiler output for studies/ssca2/main/SSCA2_main (compopts: 1)] [Error matching compiler output for studies/ssca2/main/SSCA2_main (compopts: 2)] [Error matching compiler output for studies/ssca2/main/SSCA2_main (compopts: 3)] [Error matching compiler output for studies/ssca2/main/SSCA2_main (compopts: 4)] [Error matching compiler output for studies/ssca2/main/SSCA2_main (compopts: 5)] [Error matching compiler output for studies/ssca2/rachels/SSCA2_test (compopts: 1)] [Error matching compiler output for studies/ssca2/rachels/SSCA2_test (compopts: 2)] [Error matching compiler output for studies/ssca2/rachels/SSCA2_test (compopts: 3)] [Error matching compiler output for studies/ssca2/rachels/SSCA2_test (compopts: 4)] [Error matching compiler output for studies/ssca2/rachels/SSCA2_test (compopts: 5)] [Error matching compiler output for studies/ssca2/test-rmatalt/nondet] [Error matching compiler output for studies/stencil9/stencil9-explicit] [Error matching compiler output for types/chplhashtable/recordOfNonHashRecord] [Error matching compiler output for types/collections/arrIndicesAreLoc] [Error matching compiler output for types/locale/bradc/changeLocaleID] [Error matching compiler output for types/tuple/bradc/commTuple] [Error matching compiler output for types/typedefs/bradc/arrayTypedef] [Error matching compiler output for users/engin/sparse_bulk_dist/sparseAdd1D] [Error matching compiler output for users/engin/sparse_bulk_dist/sparseAdd2D (compopts: 1)] [Error matching compiler output for users/engin/sparse_bulk_dist/sparseAdd2D (compopts: 2)] [Error matching compiler output for users/engin/sparse_bulk_dist/sparseBulk1D] [Error matching compiler output for users/engin/sparse_bulk_dist/sparseBulk2D (compopts: 1)] [Error matching compiler output for users/engin/sparse_bulk_dist/sparseBulk2D (compopts: 2)] [Error matching compiler output for users/ferguson/issue-19128-1] [Error matching compiler output for visibility/except/operatorsExceptions/exceptGreaterThan] [Error matching compiler output for visibility/except/operatorsExceptions/exceptGreaterThanOrEqual] [Error matching compiler output for visibility/except/operatorsExceptions/exceptLessThan] [Error matching compiler output for visibility/except/operatorsExceptions/exceptLessThanOrEqual] [Summary: #Successes = 13417 | #Failures = 152 | #Futures = 0] [END] ```

The total number of failures is encouraging; however, it is slightly worrying how many of them fell under studies, as this indicates to me that users may use locale.id as an integer pretty frequently.

jeremiah-corrado commented 2 years ago

Latest results:

I made a few more fixes in the experimental branch. It now has 143 failing test.

Full list of failures:

``` [Test Summary - 220823.093406] [Error matching compiler output for arrays/deitz/many/test_many_ranks_of_arithmetic_domains] [Error compilation failed for arrays/ferguson/array-initialization-patterns/private-record] [Error matching compiler output for arrays/slices/sliceUsesDom-locality] [Error matching compiler output for chplvis/tree] [Error matching compiler output for classes/ferguson/forwarding/isx-bug] [Error matching compiler output for distributions/bharshbarg/blockCyclic/indexing (compopts: 1)] [Error matching compiler output for distributions/bharshbarg/blockCyclic/indexing (compopts: 2)] [Error matching compiler output for distributions/bharshbarg/blockCyclic/indexing (compopts: 3)] [Error matching compiler output for distributions/bharshbarg/subQuery] [Error matching compiler output for distributions/block/emptyBbox] [Error matching compiler output for distributions/bradc/assoc/userAssoc-domain-stress] [Error matching compiler output for distributions/bradc/spsBlk/blockSparse1] [Error matching compiler output for distributions/bradc/spsBlk/blockSparse2] [Error matching compiler output for distributions/bradc/spsBlk/blockSparse2CSR] [Error matching compiler output for distributions/diten/testBlockUtilityFns] [Error matching compiler output for distributions/diten/testCyclicUtilityFns] [Error matching compiler output for distributions/replicated/scanReplicated] [Error matching compiler output for distributions/replicated/testReplWrites] [Error matching compiler output for distributions/replicated/testReplicand] [Error matching compiler output for distributions/vass/testReplicatedDist1] [Error matching compiler output for distributions/vass/testReplicatedVar1] [Error matching compiler output for distributions/vass/testReplicatedVar2] [Error matching compiler output for exercises/Mandelbrot/solutions/mandelbrot5-taskpar-dist] [Error matching compiler output for exercises/MonteCarloPi/deitz/MonteCarloPi/multiLocaleTaskParallelMonteCarloPi] [Error matching compiler output for exercises/MonteCarloPi/multiLocaleTaskParallel] [Error matching compiler output for expressions/bradc/condExprInClassInit] [Error matching compiler output for functions/deitz/iterators/leader_follower/test_recursive_leader3] [Error matching compiler output for functions/iterators/multilocale/multilocRecIter] [Error matching compiler output for functions/operatorOverloads/operatorMethods/genericsInstantiationBad] [Error matching compiler output for io/bradc/multilocale/testdistwrite] [Error matching compiler output for io/ferguson/buffer_test] [Error matching compiler output for library/packages/Buffers/copy_strings] [Error matching compiler output for library/packages/Collection/CollectionBulk (compopts: 2)] [Error matching compiler output for library/packages/Collection/CollectionCounter (compopts: 2)] [Error matching compiler output for library/packages/Collection/CollectionNQueens (compopts: 2)] [Error matching compiler output for library/packages/Collection/DistBagWriteThis] [Error compilation failed for library/packages/DistributedIters/checkDistributedIters] [Error matching compiler output for library/packages/Sort/RadixSort/distributedRadixSort] [Error matching compiler output for library/packages/Sort/correctness/two-array-radix-sort-bug] [Error matching compiler output for library/packages/Sort/performance/dist-performance] [Error matching compiler output for localeModels/nodeIDbug] [Error matching compiler output for multilocale/bradc/globalConstRepl] [Error matching compiler output for multilocale/deitz/linked/test_linked3b] [Error matching compiler output for multilocale/deitz/linked/test_linked3c] [Error matching compiler output for multilocale/diten/oneOrMoreLocales/1DBlockDistributedJacobi] [Error matching compiler output for multilocale/diten/oneOrMoreLocales/assocArrayOfRemoteArithArray] [Error matching compiler output for multilocale/vass/recursive-iterator-twice-multiloc] [Error matching compiler output for npb/ft/npadmana/ft_transposed_comp_only] [Error matching compiler output for npb/is/mcahir/intsort.mtml] [Error matching compiler output for optimizations/bulkcomm/alberto/Block/3dAgTestStride] [Error matching compiler output for optimizations/bulkcomm/alberto/Block/3dStrideBlock] [Error matching compiler output for optimizations/bulkcomm/alberto/Cyclic/1dStrideCyclic] [Error matching compiler output for optimizations/bulkcomm/alberto/Cyclic/2dStrideCyclic] [Error matching compiler output for optimizations/bulkcomm/alberto/Cyclic/3dAgTestStride] [Error matching compiler output for optimizations/bulkcomm/alberto/Cyclic/3dStrideCyclic] [Error matching compiler output for optimizations/parallelAssign/basicDR] [Error matching compiler output for optimizations/rafa/2dAgTest] [Error matching compiler output for optimizations/remoteValueForwarding/bradc/testconstlocality] [Error matching compiler output for optimizations/sungeun/RADOpt/access1d] [Error matching compiler output for optimizations/sungeun/RADOpt/access2d] [Error matching compiler output for optimizations/sungeun/RADOpt/access3d] [Error matching compiler output for parallel/coforall/bradc/nested/nestedForall] [Error matching compiler output for parallel/coforall/bradc/nested/nestedForall1a] [Error matching compiler output for parallel/coforall/bradc/nested/nestedForall3] [Error matching compiler output for parallel/forall/task-private-vars/block] [Error matching compiler output for parallel/forall/task-private-vars/local-slice] [Error matching compiler output for parallel/forall/vass/par-loops/2-enclosing-par-loops-error-1] [Error matching compiler output for parallel/forall/vass/par-loops/2-enclosing-par-loops-error-2] [Error matching compiler output for parallel/forall/vass/par-loops/2-enclosing-par-loops-error-3] [Error matching compiler output for parallel/forall/vass/par-loops/2-enclosing-par-loops-error-4] [Error matching compiler output for parallel/taskCount/runningTaskCount/innerCoforall] [Error matching compiler output for parallel/taskCount/runningTaskCount/innerOuterCoforall] [Error matching compiler output for parallel/taskCount/runningTaskCount/outerCoforall] [Error matching compiler output for parallel/taskCount/runningTaskCount/serial] [Error matching compiler output for parallel/taskPar/sungeun/private] [Error matching compiler output for parallel/taskPar/taskIntents/ri-coforall+on] [Error matching compiler output for patterns/eureka/eureka] [Error matching compiler output for performance/sungeun/assign_across_locales] [Error matching compiler output for performance/sungeun/copy_to_local] [Error matching compiler output for release/examples/benchmarks/hpcc/stream-ep] [Error compilation failed for release/examples/benchmarks/isx/isx] [Error matching compiler output for release/examples/primers/chplvis/chplvis2] [Error matching compiler output for release/examples/primers/chplvis/chplvis4] [Error matching compiler output for release/examples/primers/distributions] [Error matching compiler output for release/examples/primers/replicated] [Error matching compiler output for scan/scanPerf] [Error matching compiler output for scan/testScanOps] [Error matching compiler output for sparse/slices/sparseSliceDenseBlock] [Error matching compiler output for studies/bale/histogram/histo-atomics] [Error matching compiler output for studies/bale/toposort/toposort] [Error matching compiler output for studies/comd/elegant/arrayOfStructs/CoMD] [Error matching compiler output for studies/dedup/dedup-spawn-sort] [Error matching compiler output for studies/graph500/kristyn/Graph500_1D_onV/main] [Error matching compiler output for studies/graph500/v3/main] [Error matching compiler output for studies/hpcc/RA/diten/ra] [Error matching compiler output for studies/hpcc/STREAMS/bradc/stream-fragmented-timecoforall] [Error matching compiler output for studies/hpcc/STREAMS/bradc/stream-fragmented] [Error matching compiler output for studies/hpcc/STREAMS/diten/stream-fragmented-local] [Error compilation failed for studies/isx/isx-bucket-spmd (compopts: 1)] [Error compilation failed for studies/isx/isx-hand-optimized (compopts: 1)] [Error compilation failed for studies/isx/isx-no-return (compopts: 1)] [Error compilation failed for studies/isx/isx-per-task] [Error matching compiler output for studies/lammps/shemmy/m-lammps] [Error matching compiler output for studies/lsms/shemmy/m-lsms] [Error matching compiler output for studies/madness/aniruddha/madchap/mytests/par-compress/test_compress] [Error matching compiler output for studies/madness/aniruddha/madchap/mytests/par-reconstruct/test_reconstruct] [Error matching compiler output for studies/madness/aniruddha/madchap/mytests/par-refine/test_refine] [Error matching compiler output for studies/madness/aniruddha/madchap/test_diff] [Error matching compiler output for studies/madness/aniruddha/madchap/test_gaxpy] [Error matching compiler output for studies/madness/aniruddha/madchap/test_likepy] [Error matching compiler output for studies/madness/aniruddha/madchap/test_mul] [Error matching compiler output for studies/madness/aniruddha/madchap/test_showboxes] [Error matching compiler output for studies/prk/DGEMM/SUMMA/dgemm (compopts: 1)] [Error matching compiler output for studies/prk/DGEMM/SUMMA/dgemm (compopts: 2)] [Error matching compiler output for studies/prk/DGEMM/dgemm] [Error matching compiler output for studies/ssca2/main/SSCA2_main (compopts: 1)] [Error matching compiler output for studies/ssca2/main/SSCA2_main (compopts: 2)] [Error matching compiler output for studies/ssca2/main/SSCA2_main (compopts: 3)] [Error matching compiler output for studies/ssca2/main/SSCA2_main (compopts: 4)] [Error matching compiler output for studies/ssca2/main/SSCA2_main (compopts: 5)] [Error matching compiler output for studies/ssca2/rachels/SSCA2_test (compopts: 1)] [Error matching compiler output for studies/ssca2/rachels/SSCA2_test (compopts: 2)] [Error matching compiler output for studies/ssca2/rachels/SSCA2_test (compopts: 3)] [Error matching compiler output for studies/ssca2/rachels/SSCA2_test (compopts: 4)] [Error matching compiler output for studies/ssca2/rachels/SSCA2_test (compopts: 5)] [Error matching compiler output for studies/ssca2/test-rmatalt/nondet] [Error matching compiler output for studies/stencil9/stencil9-explicit] [Error matching compiler output for types/chplhashtable/recordOfNonHashRecord] [Error matching compiler output for types/collections/arrIndicesAreLoc] [Error matching compiler output for types/locale/bradc/changeLocaleID] [Error matching compiler output for types/tuple/bradc/commTuple] [Error matching compiler output for types/typedefs/bradc/arrayTypedef] [Error matching compiler output for users/engin/sparse_bulk_dist/sparseAdd1D] [Error matching compiler output for users/engin/sparse_bulk_dist/sparseAdd2D (compopts: 1)] [Error matching compiler output for users/engin/sparse_bulk_dist/sparseAdd2D (compopts: 2)] [Error matching compiler output for users/engin/sparse_bulk_dist/sparseBulk1D] [Error matching compiler output for users/engin/sparse_bulk_dist/sparseBulk2D (compopts: 1)] [Error matching compiler output for users/engin/sparse_bulk_dist/sparseBulk2D (compopts: 2)] [Error matching compiler output for users/ferguson/issue-19128-1] [Error matching compiler output for visibility/except/operatorsExceptions/exceptGreaterThan] [Error matching compiler output for visibility/except/operatorsExceptions/exceptGreaterThanOrEqual] [Error matching compiler output for visibility/except/operatorsExceptions/exceptLessThan] [Error matching compiler output for visibility/except/operatorsExceptions/exceptLessThanOrEqual] [Summary: #Successes = 13729 | #Failures = 143 | #Futures = 0] [END] ```

Here is a more detailed summary of the changes made to the locale type:

The following was added to ChapelLocale.chpl:

record localeId {
  var top_level_idx: int;
  // var gpu_idx: int; (would be added eventually)
}

inline operator :(lid: localeId, type t:int) {
  return lid.top_level_idx;
}

inline operator ==(lhs: localeId, rhs: localeId): bool {
  return lhs.top_level_idx == rhs.top_level_idx;
}

inline operator !=(lhs: localeId, rhs: localeId): bool {
  return !(lhs == rhs);
}

The return type of id was changed:

inline proc locale.id: localeId {
  // return this._value._id;
  return new localeId(this._value._id);
}

The following was added to LocaleModelHelpSetup.chpl:

proc localeId.writeThis(ch: channel) throws {
  ch.write(this.top_level_idx);
}

proc localeId.hash() {
  return this.top_level_idx.hash();
}
jeremiah-corrado commented 2 years ago

Summary of yesterday's design re-review discussion:

Overview of potential issues with locale.id as it is currently implemented:

With more gpu features coming online, and other potential sub-locale types that might be added in the future, a lot of consideration has gone into how id's current design might need to change.

Specifically there is worry about various bugs that could be created when users don't realize that (a) id equality no longer implies locale equality, and (b) locale ids might not map strictly to top-level indices in the Locales array (depending on the sub-locale implementation).

Here are some examples of this sort of confusion:

on Locales[2] {
    var otherLocale = if someParam then here else here.gpus[2];
    ...
    if here.id == otherLocale.id {
         // programmer might not expect to end up here when someParam=false, but they will
    }
}
proc isMainLocale(loc: locale): bool {
    return loc.id == 0;
}

On Locales[2] {
  if isMainLocale(here.gpus[0]) {
      // will I end up here? (I'm on Locale 2's main gpu, but Locale 2 isn't the main locale...)
  }
}

Possible solutions:

There was a lot of discussion centered around the proposal summarized in the previous two messages, i.e. changing locale.id's return type from an int to a record. This did not gain a lot of traction, as it is quite common to use a locale's id as an array index, so this proposal would require casting the record type to an integer in all of these cases. This would be a burden on current users who would need to update their code, and in general, as it is less succinct than here.id.

We also discussed the option of renaming locale.id to something like locale.topLevelId. This makes it very clear that the user will get back the index of the locale in the Locales array, even when calling from a sub-locale. It would likely remove any chance of confusion in the examples given above.

Lastly, we talked about the possibility of simply leaving the id method alone. This has the obvious benefit of avoiding churn for users, and allowing experienced chapel programmers to keep their muscle memory associated with this method. Along with this proposal, it was noted that other methods could be added to the locales type in the future, which might implicitly remove some ambiguity. I.e. if there were also a gpuId method, then it would be implied (or at least make more sense) that the id method returns the topLevelId.

Here are the votes for each proposal:

bradcray commented 2 years ago

I think after the record exploration and yesterday's discussion, I'm more strongly in favor of keeping .localeID as-is than I was before.

That said, here were some thoughts from yesterday's meeting that stuck with me:

jeremiah-corrado commented 2 years ago

I'm also in favor of leaving id as is at this point.

The argument for changing it to topLevelId does make sense; however, with good documentation, I think we can help users avoid the sort of confusion described in my previous message. This could be as simple as having an example of locale comparison somewhere in the docs (either the language spec page, or in the primer), along the following lines:


Locales can be checked for equality using a direct comparison:

proc sameLocale(locA: locale, locB: locale) {
    return locA == locB;
}

Using a locale's id for comparison is not recommended.


And then link to that section of the documentation in the docs for id, and change the wording there from "unique integer identifier" to "index in the top-level Locales array"

Additionally, with Brad's suggestion:

I might be inclined to support a different query like .uid or .uniqueID to get a richer/more complete/record-based(?) description of the locale's ID that is unique per locale/sublocale once we need that.

It would become even more clear, that id is more of a convenience function (for indexing into arrays, printing out the locales index, etc.), and that locale comparisons should be carried out either by direct comparison or by using the uniqueId method. (Not suggesting that such a method needs to be implemented right away, just agreeing that it would be a big improvement for the clarity/completeness of the api).

bradcray commented 2 years ago

Using a locale's id for comparison is not recommended.

I don't know if I'd go quite that far, because I think it's still a valid check and thing to write, it's just important to understand that it will only check that two locales are either the same, or part of the same, top-level locale—not that they are the identical locale/sublocale object. Of course, we also only really need to make that distinction once the GPU sublocales are officially supported.

So, rather than steering people away from this idiom, I'd just steer them towards direct locale comparison and make sure it's clear what locale.id comparisons mean and what the implications are.

jeremiah-corrado commented 2 years ago

Ok that makes sense. I can see how that would be a confusing caveat, especially without the gpu-locale stuff being publicly documented yet.

I think you and I are essentially in agreement here @bradcray. I'd like to hear from some of the people that voted for the other proposals before moving any further.

stonea commented 2 years ago

The argument for changing it to topLevelId does make sense; however, with good documentation, I think we can help users avoid the sort of confusion described in my previous message

Having good documentation is, of course, a good thing for its own sake. I'd shy away from relying too heavily on it though. Developers often don't look at docs until they encounter a problem. A more diligent developer might go look into the doc if something "tips them off" that maybe they should. With a name like topLevelId and I think, to a lesser extent, with idx that might be enough for me to take a pause and ask "what does top level mean" or "index into what?" and maybe, maybe, I'd look into the doc after that.

With id though I'm pretty sure I'd go headstrong into thinking "oh yeah, I've got this" without taking a step back to ask what, precisely, is being identified --- it's totally reasonable to assume something named id is going to uniquely identify an object.

I suppose to add one little bit of nuance you could say that assumption is only really reasonable for things of the same type or class. After all, if you handed me some persons student id and some different person's social security card (after I questioned what the heck you're doing with these cards) it wouldn't bother me if the numbers printed on each were the same (despite referring to different people).

But this isn't what's happening with top level locales and gpu locales, right? Locales are locales. They're the same datatype in the language, (the vision) is that they'll be used interchangeably in places like on statements and targetLocales=, shouldn't their ids uniquely identify them?

bradcray commented 2 years ago

With id though I'm pretty sure I'd go headstrong into thinking "oh yeah, I've got this" without taking a step back to ask what, precisely, is being identified --- it's totally reasonable to assume something named id is going to uniquely identify an object.

If the name were .uid, I'd agree with you more. For .id, I'm not sure I do, particularly given your "different type or class" caveat, which leads to:

Moreover, it's not as though, by virtue of being a sublocale, a GPU sublocale is completely divorced from its parent locale—it's still a part of it, so having a query that reflects something about the parent / larger locale context seems reasonable to me. E.g., I wouldn't expect here.hostname to print something different for a GPU sublocale than its parent locale if the GPU doesn't have its own NIC. So they're different objects, but share a returned value due to their relationship. .id feels like it could be similar to that to me—granted it's different if your assumption is that each component will have its own unique ID, but I'm not sure .id is such a universal concept that that's a safe assumption without reading the docs to see what it does.

Locales are locales. They're the same datatype in the language, (the vision) is that they'll be used interchangeably in places like on statements and targetLocales=, shouldn't their ids uniquely identify them?

Well, yes and no. There are contexts in which they're interchangeable (like on x should permit any locale to be specified for x) and contexts where they're not (like assertOnGPU() or any other GPU-specific capabilities can't be applied to a top-level locale; behaviors are different for GPU vs. top-level locales, etc.). If/when we add other locale types for FAM or a Cerebras AI chip or an FPGA or ... they're likely to be distinguishable in some ways or others as well.

w.r.t. targetLocales, I continue to remain skeptical that we'll be able to mix GPU locales and top-level locales in a single Block distribution implementation arbitrarily without hurting performance s.t. we'll need to fork off a GPUBlock distribution for targeting GPU sublocales, though I'll be happy to be proven wrong on that count.

But to help me understand better. Let's say you dove into things headfirst without reading the docs, assumed it would return a unique integer per locale or sublocale, and wrote code under those assumptions. In what situations would this blow up in your face, and how would that manifest itself? I'm having trouble coming up with a case that worries me or reflects things I've done in practice with locale IDs, myself.

And similarly, if we were to complement .id with a .sublocID query and/or .uid query to ask questions that are more about sublocale-specific IDs or globally unique IDs (immediately, say), would that relax your concerns about .id?

stonea commented 2 years ago

But to help me understand better. Let's say you dove into things headfirst without reading the docs, assumed it would return a unique integer per locale or sublocale, and wrote code under those assumptions. In what situations would this blow up in your face, and how would that manifest itself? I'm having trouble coming up with a case that worries me or reflects things I've done in practice with locale IDs, myself

It's hard to come up with something off the my head that isn't too contrived. But how about this ---

Say you have some proc that you know will be called a lot and will perform some big, hairy, long running computation. Say by the nature of what you're writing you happen to know that function will always return the same result when run on the same locale. So what do you do? You add a cache, of course. You'll use a map to store your cache but you need some kind of key into it. Using the locale object as the key seems too heavweight, what you want is just some identifying int.. but hey, wait. When you type here. in your IDE it shows there's an id field on the locale datatype. Perfect! You'll just use that. At the time your program only operates on CPUs so you're not thinking of GPUs. Why would you?

Your code ends up looking like this:

if !cachedResults.hasKey(here.id) { 
  cachedResults[here.id] = performLongRunningComputation();
}
return cachedResults[here.id];

This works great. Say a couple years later you decide you want to have your program to start using GPUs; you'll spread your computation across a mix of both CPUs and GPUs. You now have a bug. A subtle one too. Maybe even a sporadic one depending on if the order things invoke the function in is deterministic or not.

stonea commented 2 years ago

And similarly, if we were to complement .id with a .sublocID query and/or .uid query to ask questions that are more about sublocale-specific IDs or globally unique IDs (immediately, say), would that relax your concerns about .id?

That's an improvement, but I'd say having .topLevelid, .sublocId, and .uid would be even better.

Again think of the example I gave above of using .id as an index into a map. A user unfamiliar with the interface of locales who's not even thinking of future proofing their code to work with GPUs is in a situation where they need a unique id.

They'll pull up a list of functions in the locale object, maybe this is from the doc, maybe it's from the IDE. Either way their eyes glaze over. If they see all three: id sublocID and uid maybe they'll stop and think. But say they only notice id, they'll grab that without a second thought and go off on their merry way and use it.

And the thing is it will work. At least at first. It won't be until one day later when they do start using GPUs that things will break.

jeremiah-corrado commented 2 years ago

I find your example very compelling @stonea.

You'll use a map to store your cache but you need some kind of key into it. Using the locale object as the key seems too heavweight

As a sort of side-node, in this particular example, maybe the user should be using the whole locale as a key. And maybe we should have the .hash() method publicly documented for locales, making note of the fact that it computes something not super heavy weight. (Or just make a note of the fact that myMap[here] is legal, since the special method naming isn't set in stone yet)

Either way, I think your broader point that users could get themselves into trouble by using id as a unique integer, still stands. And it feels like a bigger issue in light of the example you gave (in contrast to the if localeA.id == localeB.id examples I gave before)

Personally, I am still leaning against the name topLevelId replacing id. I think it is mostly a taste issue, but I really like the shortness of id in the contexts where it is often used. For example, printing out which parts of an array are owned by each locale (myArray[i, j, k].locale.id) looks pretty nice with a short name.

What if we instead replace id with idx, and also include a uniqueId method as a part of the stable interface?

In the context of the IDE-suggestion scenario you gave, if the user has access to here.idx and here.uniqueId, I think they would go with the uniqueId method, or at least be curious enough about the difference to go check out the docs.

(I'd also be ok with keeping id as is, but either way, I'd prefer to add a uniqueId method to the stable interface, and update the docs that use someArray[here.id] to use someArray[here.uniqueId] instead)

bradcray commented 2 years ago

You now have a bug. A subtle one too. ... It won't be until one day later when they do start using GPUs that things will break.

It's not clear to me what the bug is or why things will break. That the GPU is using a result that was computed on, and stored on, the CPU rather than computing the result itself and storing it in its own memory? So it seems like more of a potential performance bug than a correctness one? (though maybe it's not even a performance bug if the computation in question isn't able to execute on the GPU because it's not GPUizable?)

(I realize you indicated it wasn't completely uncontrived... just trying to understand why it will be a break / will break)

But say they only notice id, they'll grab that without a second thought and go off on their merry way and use it.

I have to say... I'm still not all that sympathetic to protecting users who stop reading (and/or don't read the docs for the thing they're calling) from themselves at the cost of adding pain to developers and users who don't (and/or do), and current users.

stonea commented 2 years ago

It's not clear to me what the bug is or why things will break. That the GPU is using a result that was computed on, and stored on, the CPU rather than computing the result itself and storing it in its own memory?

No, that's not how I'm imagining this example. Bringing it back:

proc foo() {
  if !cachedResults.hasKey(here.id) { 
    cachedResults[here.id] = performLongRunningComputation();
  }
  return cachedResults[here.id];
}

Say performLongRunningComputation() performs a long running foreach or forall loop. If will run on a GPU locale if here is a GPU and a CPU locale it here is a CPU locale. Again, assume by the nature of the problem, if we call performLongRunningComputation multiple times on the same (sub)locale it will produce the same result (so we can cache it). Maybe this function is working on some data that's been distributed across all locales (including GPU sublocales when available). I'm thinking cachedResults itself would be stored on the CPU but I'm not sure that's relevant.

The bug is this: imagine some loop iterates over all locales and calls foo(), the user is expecting performLongRunningComputation() to be run on each locale and cache the result. Say they did this on a system without any GPUs and it works fine. They then move to a system with GPUs (and their problem is now distributed across both CPUs and GPUs). They expect performLongRunningComputation() to be run on every CPU and every GPU on their system and for each to store its result in the cache. Now there's a race condition involving the locale for Node 0's CPU and GPU 0 on any node (and Node 1's CPU and GPU 1 on any node, and so on).

I have to say... I'm still not all that sympathetic to protecting users who stop reading (and/or don't read the docs for the thing they're calling) from themselves at the cost of adding pain to developers and users who don't (and/or do), and current users.

To some extent I agree but I also think there's value in naming functions so they're less likely to mislead (otherwise why worry about naming at all?)

Sure, removing all ambiguity or potential for misunderstandings would also be impossible (we can't save everyone from themselves), but of course none of this is black-and-white and ultimately it comes down to a judgement call weighed by (among other things):

Given a name you think could be misleading ---

I don't think using .topLevelId or .idx in place of .id is that painful. It's not that much harder to type and modern text editors (vim and emacs included) have autocomplete features anyway.

If .id is showing up in hundreds of places in user code I might reassess this from a "it's going to be painful for users to update their existing code" point of view, but from a "what would we do if we were designing this from scratch" point of view I don't see what's so painful about .topLevelId and I especially don't see what's so painful about .idx (c'mon it's only a one character difference)

jeremiah-corrado commented 1 year ago

Proposals for locale.id:

The following table summarizes three proposals for modifying the locale interface w.r.t to the id method, and how they each differ from the current design. This table assumes that we want to provide two kinds of ids/indices (both of which are currently covered by id for the flat locale model, but not for the gpu locale model).

In some cases, there is a bit more discussion to be had about the exact names and implementations of the methods, but this table hopefully gives a pretty clear overview of what is being proposed:

ID behaviors we want to support: Top-Level-ID:
Index of the locale in the Locales array
Unique-ID:
A globally unique ID (integer or record)
Currently implemented as: id id[^1]
Proposal 1:
  • no changes to id
  • Add a uid method
id uniqueId / uid
Proposal 2:
  • Renameid->idx
  • Add a uid method
idx / topLevelId uid / uniqueId
Proposal 3:
  • Modifyidto return a UID
  • Add a topLevelId method
topLevelId / idx id[^2]

[^1]: locale.id provides a unique ID for CHPL_LOCAL_MODEL=flat because the Locales array contains all instances of the locale type; however, this is not true for other locale models with sub-locales. For CHPL_LOCALE_MODEL=gpu, gpu sub-locales have the same locale.id as their parents, and so here.id does not return a unique value (here.id == here.gpus[i].id, for all i)

[^2]: In this case, id would have to return an integer UID (as opposed to a record type) to avoid major churn for users—unless we really wanted to change the return type of id. For proposals 1 and 2 uid could either return an integer or a record type UID.

Rationale behind changing the interface:

Generally speaking the motivation for the above proposals is that id has historically provided a unique integer, but does not do so for gpu-sub-locales in its current implementation[^1]. It has been pointed out that this could lead to confusion for users and/or introduce bugs when they migrate code from flat->gpu (see @stonea's comment above: https://github.com/chapel-lang/chapel/issues/20217#issuecomment-1229000388). The quick summary is that storing per-locale results in a map, for example, using here.id as the key would be problematic when collecting results from base-locales and gpu-locales simultaneously (because parent and child locales currently have the same id).

Pros and Cons of each Proposal:

Proposal 1:

Pros:

Cons:

Proposal 2:

Pros:

Cons:

Proposal 3:

Pros:

Cons:


Please feel free to add to the list of pros/cons above if I missed something!

mppf commented 1 year ago

This table assumes that we want to provide two kinds of ids/indices (both of which are currently covered by id for the flat locale model, but not for the gpu locale model).

I have not followed all the discussion on this issue, so forgive me if this is already covered, but I don't understand why there should be a "unique ID" available from a locale at all. My view is that a locale instance is already a unique thing that can be compared with == and != and used as a key in a map, etc. (I do not know if we can sort locale elements by default today, but it would seem to be a reasonable thing to want as well).

If we want to have a bijective / invertible mapping from locale to int, we could make such a thing, but I don't know that I would call it "ID" or "Unique ID". But mostly I am questioning whether we need it at all, in the near term. Do we have use cases for it that are not served either by the top-level ID available from locale.id or from a map or associative array with locale keys?

Thanks.

jeremiah-corrado commented 1 year ago

Thanks for the feedback/questions!

My view is that a locale instance is already a unique thing that can be compared with == and != and used as a key in a map, etc

I agree, and I brought up a similar point here: https://github.com/chapel-lang/chapel/issues/20217#issuecomment-1230391901 in response to Andy's example of a non-unique-id bug. I think regardless of what we do with id, it could be helpful to provide an example of using a locale as a key to a map somewhere in the language-spec docs (in addition to some if localeA == localeB { } examples).

... mostly I am questioning whether we need it at all, in the near term. Do we have use cases for it that are not served either by the top-level ID available from locale.id or from a map or associative array with locale keys?

I think @stonea's comment here does a good job of explaining why a user might might tend to misuse the id method as a unique ID if it is available to them. And this is the basis of the argument for renaming it to something like topLevelId or for creating an explicit uid method.

On a more theoretical level, I think it serves a somewhat similar purpose to an MPI processes rank in the sense that it would allow you to index into arrays, compute subsections of arrays to work on, etc. So if we hope to support programs with heterogeneous groups of locales doing work together, then I think it could be really useful to have globally unique ids associated with locales s.t. each base-locale and gpu-locale could – for example – compute a unique sub-section of a program-wide array that it owns.

And as a sort of side note: it could also be useful for debugging a heterogenous program, in the same way that id is currently used for debugging flat programs.

mppf commented 1 year ago

I think @stonea's comment here does a good job of explaining why a user might might tend to misuse the id method as a unique ID if it is available to them. And this is the basis of the argument for renaming it to something like topLevelId or for creating an explicit uid method.

Thanks for the link to the comment. I read that as saying "A user will misuse it because it is called id" rather than saying anything about the need for some kind of mapping between all locales and ints.

On a more theoretical level, I think it serves a somewhat similar purpose to an MPI processes rank in the sense that it would allow you to index into arrays, compute subsections of arrays to work on, etc. So if we hope to support programs with heterogeneous groups of locales doing work together, then I think it could be really useful to have globally unique ids associated with locales s.t. each base-locale and gpu-locale could – for example – compute a unique sub-section of a program-wide array that it owns.

It's not obvious to me that just having the ability to turn a locale into an int is useful for this. My mind goes to things like issue #16405, where, in the context of a forall loop or a group of forall loops in sequence, you can compute things like "the current task's number" for use as index into an array of per-task state. The part I do not understand about this is why you would want it to be associated with locales (as in here.localeUniquifiedIntoInt(), which is what I am interpreting .uid() or .uniqueId() to mean). Instead the use cases that I am aware of would be within forall loops anyway. The use cases I am aware of here have to do with things like implementing scan or doing a count exchange in a radix sort. Then, the question is not just "What array index should I use for this local?" but rather "What array index should I use for the task processing this subset of the data?". (This 2nd question is unknowable just from the locale since it depends on how the iteration space is divided into tasks. The reason that I think that providing a mechanism to compute it within a forall is sufficient is that if the user were only using their own coforall calls, they could create such a mapping and compute it themselves, and this does not seem unreasonable because they are working at a lower level).

Anyway, I don't think we should add something like .uniqueId() without a use case at hand.

And as a sort of side note: it could also be useful for debugging a heterogenous program, in the same way that id is currently used for debugging flat programs.

Here I am assuming you mean "printf debugging". What does printing locale.uniqueId() give that printing locale does not? Seems like writeln(here) / writeln(myLocale) is just fine and it should print out the relevant details.

jeremiah-corrado commented 1 year ago

It's not obvious to me that just having the ability to turn a locale into an int is useful for this. ... The reason that I think that providing a mechanism to compute it within a forall is sufficient is that if the user were only using their own coforall calls, they could create such a mapping and compute it themselves, and this does not seem unreasonable because they are working at a lower level). ... Anyway, I don't think we should add something like .uniqueId() without a use case at hand.

Ok I think this is a good point. And I had been considering the coforall case in my reasoning above, but yeah, I don't think it's too unreasonable to ask the user to keep track of "integer IDs" themselves for those sorts of cases.

So I suppose we need a Proposal 1.b and proposal 2.b where we wouldn't implement a unique-id method for the time being. (So Prop 1.b would mean leaving things unchanged).

Seems like writeln(here) / writeln(myLocale) is just fine and it should print out the relevant details.

Also a good point. I suppose here.name (which prints something slightly different) is a reasonable debugging tool for sub-locales as well.

bradcray commented 1 year ago

I'm (still) on team proposal 1b (can always add new things later if/when we have motivation for them).

stonea commented 1 year ago

If we decide to keep the name as .id on the locale datatype then that puts me in a bit of a sticky situation with regards to what the behavior of .id should be for GPU locales:

Seems like my options would be:

If it would be a huge hassle to update all our existing code and having our users update theirs then maybe violating this "substitution" principle would be a reasonable thing to do but it's not clear to me whether this is the case or not.

bradcray commented 1 year ago

Expounding on my "team 1b" comment, my current thinking for GPUs is to have it return the top-level ID. E.g., I'm imagining the documentation to say:

My main reason for increasingly feeling this way as this discussion goes on is that I think the current query is short, sweet, intuitive, useful, and used; and I fear that in changing it we'd be prioritizing potential future users who are willing to start using a method without bothering to read its documentation over current existing codes and users. To me that feels like optimizing for the wrong profile. In addition, Michael's points about best practices being to compare or hash on locale values, rather than locale IDs, completely resonates with me.

There's arguably a similar question around locale.hostname: GPUs probably don't have a hostname of their own on most systems, so should they return a new bogus hostname, generate an error, or return their parent's hostname. In that case, I'd also probably choose the "parent's hostname" option because it's the closest they have to one. I'd also be open to it generating an error, but am sensitive to that being an annoyance in various ways.

jeremiah-corrado commented 1 year ago

@stonea, I agree that both of the options in your first bullet point are potentially problematic.

Are you at all opposed to something like Proposal 3 where gpu-locales would return a unique integer ID instead of the parent's .id or their index in the local gpus array? (here we would need to come up with some numbering scheme s.t. all gpus would return a globally unique integer (starting with Locales.size + 1)).

This way the base-locale .id behavior wouldn't need to change at all, but we'd get something hopefully less problematic for sub-locale ids.

I know this proposal goes against the points that @mppf made above, where he questioned why a globally-unique-id behavior would be necessary/useful at all, but I think this might kind of save us from the type of bug that you brought up (even if it isn't the most useful behavior).

(as a side note there could also be a Proposal 3.b where we wouldn't add the topLevelId/idx method for the time being, but only change .ids behavior for gpu sub-locales).

jeremiah-corrado commented 1 year ago

My main reason for increasingly feeling this way as this discussion goes on is that I think the current query is short, sweet, intuitive, useful, and used; and I fear that in changing it we'd be prioritizing potential future users who are willing to start using a method without bothering to read its documentation over current existing codes and users.

Similar to my response to Andy's comment above, I think it's notable that Proposal 3 also doesn't change the behavior of id for base locales, but it does address the potential misuse of the method as a unique-id.

For this proposal, the analogous documentation entry might look something like:

Which is arguably somewhat less straightforward than Proposal 1 (and doesn't align with the way hostname would probably work for gpus as you pointed out). But I do think it would make the kind of bug we've been discussing easier to avoid.

stonea commented 1 year ago

Responding to Brad:

My main reason for increasingly feeling this way as this discussion goes on is that I think the current query is short, sweet, intuitive, useful, and used; and I fear that in changing it we'd be prioritizing potential future users who are willing to start using a method without bothering to read its documentation over current existing codes and users. To me that feels like optimizing for the wrong profile

It’s not just future users though, better naming helps our current users too.

To some extent I'd be happy to defer to your judgement on the matter (you have a better sense of who is using Chapel and how they're using it).

But I’d also say these matters aren’t simply choosing whether we "prioritize group A or group B"; the math is more complicated than that -- we need to consider how much of a cost it is to group A and how much it would benefit group B.

If changing the name was a trivial cost for "current existing codes and users" but a huge help to "potential future users who are willing to start using a method without bothering to read its documentation" then isn't it still worthwhile?

Now, for this issue, is it really a "trivial cost" for group A and a "huge help" for group B. Well, no, probably not. Maybe it's a slight cost for group A to give a moderate amount of help to group B. So, how do we prioritize in this scenario? I think that's more debatable.

I'd also caution against being too disparaging of "users who don't read documentation". I make a point of reading documentation; I'd tell anyone I was mentoring to "read the docs"; I mutter "RTFM" from time-to-time; and I'd love to live in a world where we could rely on everyone to dutifully read the instructions before they start using something, but that's not reality. People jump to conclusions, they copy-and-paste code from StackOverflow, they copy-and-paste code from their own codebase, they pull up lists of functions in their IDE and pick from there, and even when they do read the documentation: they'll skim and miss important details or forget them later.

So, it's important to come up with names that won't be misunderstood. Should that be balanced against trying to keep names (especially for commonly used functions) short and to-the-point. Sure. Absolutely. But is id really that commonly used? Is typing “idx” really that hard?

In addition, Michael's points about best practices being to compare or hash on locale values, rather than locale IDs, completely resonates with me.

Absolutely. I agree. That's exactly what users should do. They'll be more likely to do the right thing if we don't have a field named .id on the object tempting them to do the wrong thing.

There's arguably a similar question around locale.hostname: GPUs probably don't have a hostname of their own on most systems, so should they return a new bogus hostname, generate an error, or return their parent's hostname

Then it sounds like having hostname on locale isn't the right place for it to be. We had an idea of having locale return something like a hwSpec object that could be downcasted to a more specific type (say CPUHwSpec or GPUHwSpec), where a function like hostname that's only CPU specific could be only implemented in the CPUHwSpec class.

Responding to Jeremiah:

For this proposal, the analogous documentation entry might look something like:

The .id query on a locale returns the locale's ID as defined by its index within the Locales array. In the case of a sub-locale, it returns a globally unique integer computed as some function of its parent's id and the number of sub-locales on each node.

So, make .id really be a unique id. I almost want to agree to this just we can move on. But I don't think I can honestly say that's the right thing either.

As alluded to earlier it would be better practice to add hash and compare functions to locales so they could directly be used in a map than to come up with and use a "return a unique ID" function. If we don't have an id function, people won't misuse it for this purpose.

When faced with a function that has a "misleading name" it seems like the more sensible thing to do is to just update the name instead of adding functionality to make it conform with that misunderstanding.

If we do go with proposal 1b, I think I'd rather have it do a runtime error when used on a non-CPU locale. That would still be less than ideal in my mind because "compile time errors are preferable to runtime errors" and seems like a violation of the “substitution principle” for OO design but I think it's preferable.

lydia-duncan commented 1 year ago

I think it is important to point out that users are more likely to jump to the wrong conclusion and skip steps if they think they know the answer. If something looks simple and straight-forward but actually isn't, we're helping their understanding by making it a little more obvious it isn't simple. A change to the id name will mean users that already exist have to update their code, but maybe doing so will mean they understand a little bit more about the interaction between locales and GPUs. Then the calculus becomes more complicated - they aren't just hurt by having to update their code, they're learning something new from it that will make them a better programmer in a heterogeneous hardware world

bradcray commented 1 year ago

If changing the name was a trivial cost for "current existing codes and users" but a huge help to "potential future users who are willing to start using a method without bothering to read its documentation" then isn't it still worthwhile?

I think there's another hidden cost here that you're not accounting for, potentially because I haven't articulated it (or articulated it well). Every time we go to users and say "We've changed this features, and you need to update your code to adapt to it," we get a reaction that lands somewhere between "Oh thank goodness, at last you've fixed a long-term source of pain and frustration for me" to "You've got to be kidding... Do you really have nothing better to improve? Or think I have nothing better to do than to make these kinds of adjustments?" And each time we get a response that's more on the latter side of the spectrum (and we have), we're giving people reasons to be frustrated with Chapel and to give up on it rather than to stick with it. For me, I'm not convinced that .id is so broken, bad, or wrong relative to other changes we've been discussing and making that it's worth risking any political capital on. And I say that from the perspective of having been very open to changing it initially, and only becoming less and less convinced that the change is necessary or valuable as the conversation has progressed (to the point that even if I could time machine back to Chapel 0.0 with all the information in this thread, I'm not convinced that I'd actually take a different approach than what I've proposed in the past few comments).

If we wanted to get a read on whether I'm wrong about how users would feel about it, then we should summarize this conversation on Discourse and/or meet with key users to get a read from them. But that should be done in a way that each position felt like their case was stated fairly, so would take time and effort. And, personally, I think it would take more time than I think this specific topic is worth.

In any case, I think the next step should be for Jeremiah to open the internal straw poll he proposed.

stonea commented 1 year ago

I think there's another hidden cost here that you're not accounting for, potentially because I haven't articulated it (or articulated it well). Every time we go to users and say "We've changed this features, and you need to update your code to adapt to it," we get a reaction that lands somewhere between "Oh thank goodness, at last you've fixed a long-term source of pain and frustration for me" to "You've got to be kidding... Do you really have nothing better to improve? Or think I have nothing better to do than to make these kinds of adjustments?" And each time we get a response that's more on the latter side of the spectrum (and we have), we're giving people reasons to be frustrated with Chapel and to give up on it rather than to stick with it. For me, I'm not convinced that .id is so broken, bad, or wrong relative to other changes we've been discussing and making that it's worth risking any political capital on. And I say that from the perspective of having been very open to changing it initially, and only becoming less and less convinced that the change is necessary or valuable as the conversation has progressed (to the point that even if I could time machine back to Chapel 0.0 with all the information in this thread, I'm not convinced that I'd actually take a different approach than what I've proposed in the past few comments).

Sure. So what you're saying is that in this case the "cost" is not trivial (or at-least, not trivial when taking into consideration the fact that our users already feel frustrated from past, similar, naming updates). Also you think (or at-least highly suspect) there's no benefit (you wouldn't change your mind even in a Chapel 0.0 world).

I'm perfectly happy to defer to your judgment on the "cost": I haven't been in the room to hear this feedback and from my perspective what I've seen is other, seemingly innocuous, naming changes make it through. If you're suggesting our team take a more conservative approach to naming updates, then that's a direction I'm happy to take.

I disagree with your design judgment but I won't argue it to death and if what you're saying is "the cost is not trivial" then rehashing the design argument seems like going into "how many angels can dance on a pinhead" territory. I'm curious what others think (though mostly because I want to know if my design sensibilities are really that "off")

If we wanted to get a read on whether I'm wrong about how users would feel about it, then we should summarize this conversation on Discourse and/or meet with key users to get a read from them. But that should be done in a way that each position felt like their case was stated fairly, so would take time and effort. And, personally, I think it would take more time than I think this specific topic is worth.

I agree. We should avoid offloading our decision making process onto our end-users except in truly exceptional cases.

jeremiah-corrado commented 1 year ago

Current Status of Locale.id:

This program:

writeln(Locales[3].id);
writeln(Locales[3].gpus[1].id);

Prints:

3
3

Proposals for Stabilizing:

Please vote for one or more of the following:

(I know the numbering isn't super logical, but I didn't want to diverge too much from the original table here: https://github.com/chapel-lang/chapel/issues/20217#issuecomment-1249455507)

Proposal 1 - 👍

Proposal 1.b - 😄

Proposal 2 - 👀

Proposal 3 - 🎉

Proposal 3.b - ❤️

Something else - 🚀

[^1]: For both Proposal 1s, we can also discuss whether id should throw a runtime exception when called on a GPU sub-locale. I think this can happen after landing on one of those proposals though.

stonea commented 1 year ago

I wouldn't say there hasn't been much interest in Proposal 2. After hearing Brad's comment about user concerns I'll forgo voting for it out of pragmatic concerns and to keep us from being deadlocked. But this thread is now over 40 comments long I'm not sure if everyone coming to the vote has considered the design implications of having an id function that's not unique. I still feel proposal 2 (rename id is the "right" thing to do from a more design-oriented perspective).

If the decision is that we're not renaming 'id' or changing its behavior for CPU locales then I don't think there's anything left to discuss from a Chapel 2.0 perspective.

I realize I'm contradicting my statement about "rehashing the design argument" (I guess I can't help myself). But let me give one last example that will introduce a bug if the user mistakingly believes id to be unique when it's not (and maybe this will help folks decide if they prefer choice 1 or choice 3):

proc main() {
  coforall loc in Locales do on loc {
    doAndReportComputation();
  }
}

proc doAndReportComputation();
  const result = doComputation();
  saveResultToFile(result, "results_" + here.id : string + ".dat");
}

This is a totally reasonable thing to do if you think id means "unique id". And it works fine until one day the user comes to update their main function to also operate on GPUs and suddenly now they have a bug.

proc main() {
  coforall loc in Locales do on loc {
    doAndReportComputation();
    coforall gpu in here.gpus() do on gpu {
      doAndReportComputation();
    }
  }
}

If functionality works for the basetype of some object then it should also work for its derived types.

stonea commented 1 year ago

If we're throwing out proposal 2. My preference would be to keep id as is and have it do a runtime error if invoked from a GPU locale. (hence the rocket ship)

mstrout commented 1 year ago

I voted for Proposal 3.b, "Modify id's behavior for GPU (i.e. any in my mind) sub-locales to return a unique integer". I did that because the term identifier has in my experience implied a unique identifier and the behavior of both these expressions evaluating to the same thing goes against that connotation.

writeln(Locales[3].gpus[1].id);

More importantly, GPUs are sublocales that have separate memory. It seems like providing a unique locale.id for each GPU will be needed to implement and use distributed arrays.

However, I do have some questions about Proposal 3.b:

jeremiah-corrado commented 1 year ago

@stonea I just updated the voting to include proposal 2 explicitly

bradcray commented 1 year ago

Rationalizing why I voted for 1.b rather than 1:

jeremiah-corrado commented 1 year ago

What guarantees, if any, would we want to make about the ids for all the GPU sublocales associated with one locale? For example, if we say they are contiguous, then that might make implementing array indexing easier.

I'm thinking this should be implemented s.t. the base-locale id's are still [0-numLocales-1]. And then starting with locale0, we would number off all the gpu's on each locale.

So if locale0 had four GPUs, their .ids would be [numLocales-numLocales+3]. And then we'd number off Locale1's GPUs starting at numLocales+4 and so on.

There are other ways to do this as well, but I'm thinking this topic warrants a separate design discussion.

And it might be good not to confine ourselves to a particular numbering scheme (i.e. don't put the above description in the documentation), so that we could freely change it in the future for more complex locale hierarchies with multiple numa domains sharing/not-sharing GPUs, etc.)

stonea commented 1 year ago

Doing 1.b gives us time to continue to gain experience with GPU programming to determine what more we need for it (and what we want to call it)

Yeah, I think this is right.

My rocketship comment has more do with "what I think should happen with GPU support" rather than what's right from a strictly Chapel 2.0 perspective (although I think the anticipated direction we'd take for GPU support might influence some folks vote).

bradcray commented 1 year ago

It seems like providing a unique locale.id for each GPU will be needed to implement and use distributed arrays.

I don't think this is true. Our distributions rarely use GPU IDs to govern distributions since most of them accept a targetLocales array that can have any subset of the Locales array. As a result, we usually compute the distribution in terms of logical indices within the Locales array or the locale value itself.

What is the locale and sublocale relationship going to be when we have one locale per NUMA domain or per NIC? Is there going to be a clear idea of which GPUs should be sublocales of which locales?

Generally speaking, I believe we're imagining that when there are n locales on a node, numGPUs/n will belong to each of them. I think we're currently hoping, optimistically, that that division will always come out evenly (e.g., if there are two sockets and two NICs, we'd hope there wouldn't be an odd number of GPUs).

Similarly, I think it'd be a reasonable starting assumption that all nodes have the same number of GPUs, but I think we should try to avoid building this assumption into the implementation as much as possible, just in case someone has a more heterogeneous system.