KhronosGroup / SYCL-Docs

SYCL Open Source Specification
Other
116 stars 68 forks source link

Can "item" and "nd_item" be shared across work-items? #532

Open gmlueck opened 10 months ago

gmlueck commented 10 months ago

This question recently came up from our implementation team. What is the precise semantics of the member functions on item and nd_item? Do they return the index of the calling work-item or do they return the index of the work-item that created the object?

The question is relevant only in the weird case where an item or nd_item object from one work-item is copied into SLM and then referenced by another work-item. This could be done, for example, by using placement-new to copy the item or nd_item object.

Note that id may have a different semantic because it is user-constructible. The semantics of id seem clear. The member functions return the information that was used to construct the id object. Therefore, if work-item A copies its id object to SLM and then another work-item B calls its member functions, I presume work-item B would get the index of work-item A.

AlexeySachkov commented 10 months ago

Linking #451 here. sub_group is probably different enough to be handled differently compared to item and nd_item, but we had the same question about it as well.

illuhad commented 10 months ago

I don't think these classes are meaningful outside of their own work item, because they generally carry the semantics of "give me the position of this item in the parallel iteration space". Using objects from a different work item should be illegal and/or UB.

If one did this, even just within one implementation the behavior might not be consistent. For example, whether AdaptiveCpp stores the position in the iteration space inside nd_item/item or whether it queries this on demand using builtins changes depending on compilation flow and configuration.

gmlueck commented 10 months ago

I think this makes sense also. I was wondering why we didn't define all the member functions of nd_item and item as static. That would make it clear that they only return information about the calling work-item, since static member functions don't have access to any state in the object. Did we decide not to define them as static to give implementations the freedom to store information about the calling work-item in the object?

In any case, I think we could add text to the introduction for each of these objects something like:

The implementation constructs an item object for each work-item when a kernel is submitted with a range iteration space. That object (and copies of that object) may be used only by the work-item that "owns" it. Although it is possible to copy the object into memory that is accessible to other work-items or to the host, calls to member functions of the object from other work-items or from the host produce undefined behavior.

I think we could use similar language for sub_group and group also.

Pennycook commented 10 months ago

I think this makes sense also. I was wondering why we didn't define all the member functions of nd_item and item as static. That would make it clear that they only return information about the calling work-item, since static member functions don't have access to any state in the object. Did we decide not to define them as static to give implementations the freedom to store information about the calling work-item in the object?

I agree that freedom of implementation is important. But I think it may also be important to note that if the functions were static, a developer wouldn't need an instance of an nd_item to call them. That would mean that developers could call nd_item member functions outside of kernels, call an nd_item member function using a class with the wrong number of dimensions, etc.

tomdeakin commented 10 months ago

Discussed in WG meeting 01/25/24. Next steps: introduce a new section similar to https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:reference-semantics to group these restrictions together. Or clarify these directly. Clarification to SYCL 2020.

bader commented 9 months ago

FYI. I committed stateless implementation of nd_item to DPC++ compiler. This commit exposed SYCL-CTS tests relying on stateful implementation or wrong uses of nd_item methods on the host.

  1. nd_item_equality test validates one of the rules for "by-value semantic" - https://registry.khronos.org/SYCL/specs/sycl-2020/html/sycl-2020.html#sec:byval-semantics. The test allocates a buffer to store 2 nd_item objects, run a kernel which stores two nd_items from different work-items and finally uses comparison operators to check nd_items to check following rule:

T must be equality comparable on the host application (in the case where T is available on the host) and within SYCL kernel functions. Equality between two instances of T (i.e. a == b) must be true if the value of all members are equal and non-equality between two instances of T (i.e. a != b) must be true if the value of any members are not equal, unless either instance has become invalidated by a move operation. By extension of the requirements above, equality on T must guarantee to be reflexive (i.e. a == a), symmetric (i.e. a == b implies b == a and a != b implies b != a) and transitive (i.e. a == b && b == c implies c == a).

As it was agreed above sharing and comparing stateless nd_item objects doesn't make sense and I'd like to drop this test.

  1. nd_item_constructors compares different nd_item objects using more reliable technique. It saves the "state" of nd_item object by obtaining and saving ranges/ids via calling methods like get_local_id(). Unfortunately, it does it on the host too!

https://github.com/KhronosGroup/SYCL-CTS/pull/869 updates both tests.