AerialMantis closed this issue 4 years ago.
I've created a pull request with this updated wording, altered based on the discussion in the last heterogeneous C++ telecon - https://github.com/codeplaysoftware/standards-proposals/pull/112
PR was merged so closing this issue.
One comment that was made in Belfast was that the naming of the properties reflects an older revision of OpenMP, so one of the first things I propose for this paper is to update the naming to that of OpenMP 5.0.
This would mean:
- bulk_execution_affinity.none (remains the same)
- bulk_execution_affinity.scatter -> bulk_execution_affinity.spread
- bulk_execution_affinity.compact -> bulk_execution_affinity.close
- bulk_execution_affinity.balanced (remains the same)
I also wanted to clarify the meaning of the concurrency property in P1436r2, particularly as I think it is relevant to the wording of the bulk_execution_affinity properties. The intention is that it represents the maximum number of execution agents available to an executor for concurrent execution in a single invocation of execution::bulk_execute. This does not guarantee that these execution agents will always be created with concurrent forward progress, and it assumes that the execution resources are uncontested by other executors or third-party libraries. One concern with this definition is that it does not allow any control over the domain or level of the hierarchy it is applied to, so you cannot use this property for nested calls to execution::bulk_execute with different affinity bindings, as you would in, say, OpenMP. This is perhaps something we want to address.
For the wording of the bulk_execution_affinity properties, I have drafted initial wording based on the discussions in Belfast (I hope I accurately captured the direction we were going in). The basis of this wording is the assumption that an invocation of execution::bulk_execute(e, f, s) creates a consecutive sequence of work-items from 0 to s-1, mapped to the available concurrency, that is, some number of execution resources, which are subdivided in some implementation-defined way.
Note: the subdivision of the available concurrency is implementation-defined.
Note: when the number of work-items is greater than the available concurrency, the binding should wrap following the same subdivision.
We may want to reconsider the terms "concurrency" and "locality distance" in the above wording; another suggestion during the SG1 session was to incorporate the idea of "interference", as used in the existing hardware_[constructive|destructive]_interference queries.
Additionally, the current wording specifies that when the number of work-items is greater than the available concurrency, the binding should wrap; however, we may wish to define further properties for alternative chunking patterns.
This proposed wording was also sent to the SG1 mailing list to start a discussion there.