codeplaysoftware / standards-proposals

Repository for publicly sharing proposals in various standards groups
Apache License 2.0

CP013: Updated wording for bulk_execution_affinity properties (P1436) #111

Closed AerialMantis closed 4 years ago

AerialMantis commented 4 years ago

One comment that was made in Belfast was that the naming of the properties reflects an older revision of OpenMP, so one of the first things I propose for this paper is to update the naming to that of OpenMP 5.0.

This would mean:

- bulk_execution_affinity.none (remains the same)
- bulk_execution_affinity.scatter -> bulk_execution_affinity.spread
- bulk_execution_affinity.compact -> bulk_execution_affinity.close
- bulk_execution_affinity.balanced (remains the same)

I also wanted to clarify the meaning of the concurrency property in P1436r2, particularly as I think this could be relevant to the wording of the bulk_execution_affinity properties. The intention is that it represents the maximum number of concurrent execution agents available to an executor in a single invocation of execution::bulk_execute. This does not guarantee that these execution agents will always be created with concurrent forward progress, and it also assumes that the execution resources are uncontested by other executors or third-party libraries. One concern with this definition is that it does not allow any control over the domain or level of the hierarchy it is applied to, so you cannot use this property for nested calls to execution::bulk_execute with different affinity bindings, as you would in, say, OpenMP. This is perhaps something we want to address.

For the wording of the bulk_execution_affinity properties, I have drafted initial wording based on the discussions in Belfast (I hope I accurately captured the direction we were going in). The basis of this wording is the assumption that an invocation of execution::bulk_execute(e, f, s) creates a consecutive sequence of work-items from 0 to s-1, mapped to the available concurrency, that is, some number of execution resources, which are subdivided in some implementation-defined way.

| Property | Wording |
| --- | --- |
| bulk_execution_affinity.none | A call to execution::bulk_execute(e, f, s) is not required to bind the created execution agents for the work-items of the iteration space specified by s to execution resources. |
| bulk_execution_affinity.close | A call to execution::bulk_execute(e, f, s) should aim to bind the created execution agents for the work-items of the iteration space specified by s to execution resources such that the average locality distance between adjacent work-items is minimized. Subsequent execution agents should only be bound to a resource if no other resource would otherwise have fewer execution agents bound to it. |
| bulk_execution_affinity.spread | A call to execution::bulk_execute(e, f, s) should aim to bind the created execution agents for the work-items of the iteration space specified by s to execution resources such that the average locality distance of adjacent work-items in the same subdivision of the available concurrency is maximized and the average locality distance of adjacent work-items in different subdivisions of the available concurrency is maximized. Subsequent execution agents should only be bound to a resource if no other resource would otherwise have fewer execution agents bound to it. |
| bulk_execution_affinity.balanced | A call to execution::bulk_execute(e, f, s) should aim to bind the created execution agents for the work-items of the iteration space specified by s to execution resources such that the average locality distance of adjacent work-items in the same subdivision of the available concurrency is minimized and the average locality distance of adjacent work-items in different subdivisions of the available concurrency is maximized. Subsequent execution agents should only be bound to a resource if no other resource would otherwise have fewer execution agents bound to it. |

Note: the subdivision of the available concurrency is implementation-defined.

Note: when the number of work-items is greater than the available concurrency, the binding should wrap following the same subdivision.

We may want to reconsider the terms "concurrency" and "locality distance" in the above wording. Another suggestion during the SG1 session was to incorporate the idea of "interference", as used in the existing hardware_[constructive|destructive]_interference_size queries.

Additionally, the current behaviour is that the binding wraps when the number of work-items is greater than the available concurrency; however, we may wish to define further properties for alternative chunking patterns.

This proposed wording was also sent to the SG1 mailing list to start a discussion there.

AerialMantis commented 4 years ago

I've created a pull request with this updated wording, altered based on the discussion in the last heterogeneous C++ telecon - https://github.com/codeplaysoftware/standards-proposals/pull/112

AerialMantis commented 4 years ago

PR was merged so closing this issue.