KhronosGroup / OpenCL-Docs

OpenCL API, OpenCL C, Extensions, SPIR-V Environment Specs, Ref page, and C++ for OpenCL doc sources.
Other
358 stars 112 forks source link

Update doc for `work_group_broadcast` #169

Open mantognini opened 4 years ago

mantognini commented 4 years ago

The doc for work_group_broadcast is currently this: https://github.com/KhronosGroup/OpenCL-Docs/blob/b86f597faf4829234b68fa72c7f33fa85d8c0566/OpenCL_C.txt#L8315-L8324

There's a small typo: it should Broadcast the value of a, not x.

I would suggest renaming local_id to local_id_x for explicitness.

I would also suggest stating that calling {first, second, third} overload from a NDRange with a dimension that is not {1D, 2D, 3D} (respectively) results in undefined behaviour.

It might also be worth, for completeness, mentioning that the local id should refer to an "existing" and "valid" id (i.e. to ensure that at lest one value can be returned by this builtin). Those are probably not the right words, but it should give the idea.

Together, these two last suggestions ensure one and only one value can be returned by this builtin.

I believe this has no impact on existing CTS tests as it only make undefined behaviour explicit and doesn't change existing expectations.


The reasoning is the following:

bashbaug commented 4 years ago

Sorry for the slow reply, I think this issue came in around the US Thanksgiving holidays and I missed it.

I would also suggest stating that calling {first, second, third} overload from a NDRange with a dimension that is not {1D, 2D, 3D} (respectively) results in undefined behaviour.

Does it make sense instead to say that any "missing" local ID coordinates default to zero? Otherwise I think it's very difficult to write a single kernel that works for multiple dimensions. Remember that local IDs are zero-based, so zero should be "valid" for any dimension.

To summarize, the suggested changes are:

  1. Fix the typo either by changing the description to a or the argument to x.
  2. Clarify that broadcasting from a non-existent work item is undefined.
  3. Figure out what should happen when the dimensionality of a work group does not match the number of local ID coordinates passed to the broadcast function, and document this behavior.

Sound good? I think (1) and (2) are straightforward in case it takes time to resolve (3).

mantognini commented 4 years ago

Does it make sense instead to say that any "missing" local ID coordinates default to zero? Otherwise I think it's very difficult to write a single kernel that works for multiple dimensions.

I agree it's difficult to write such kernel at the moment. Maybe the API should be improved to make this easier. Defaulting to zero would clarify the intent, but not necessarily make the user code simpler -- it would hide bugs actually. If this is left undefined, theoretically a debug build/runtime could let the user know about their mistake. I feel we should not commit to that behaviour (i.e. defaulting to zero).

bashbaug commented 3 years ago

Regarding:

Clarify that broadcasting from a non-existent work item is undefined.

The same clarification should also be made for the sub-group broadcast.

AnastasiaStulova commented 3 years ago

Conclusions from teleconference on Feb 11, 2021

Fix the typo either by changing the description to a or the argument to x.

Agreed.

Clarify that broadcasting from a non-existent work item is undefined.

Agreed. The update should also cover subgroup functions.

Figure out what should happen when the dimensionality of a work group does not match the number of local ID coordinates passed to the broadcast function, and document this behavior.

Undefined behavior is the only viable option to avoid breaking existing implementations.

StuartDBrady commented 3 years ago
  1. Fix the typo either by changing the description to a or the argument to x.

I have raised PR #564 for this change.

  1. Clarify that broadcasting from a non-existent work item is undefined.

Did we determine whether this is undefined behavior or returns an undefined value?

  1. Figure out what should happen when the dimensionality of a work group does not match the number of local ID coordinates passed to the broadcast function, and document this behavior.

I would propose that this should be undefined behavior, as we cannot guarantee what existing implementations do in this case. Ideally this would align with the OpGroupBroadcast in SPIR-V, which does mention undefined behavior but just not for this specific case. Do we think this same change can be made for the SPIR-V spec?

AnastasiaStulova commented 3 years ago
  1. Clarify that broadcasting from a non-existent work item is undefined.

Did we determine whether this is undefined behavior or returns an undefined value?

Yes, I assumed it means undefined value, @mantognini and @bashbaug did you think the same?

bashbaug commented 3 years ago

Yes, I assumed it means undefined value

I've gone back and forth on this one. "Undefined value" matches the behavior of the non-uniform subgroup broadcast, but I think it could require a runtime check for some implementations that do the broadcast through memory that would not be required for "undefined behavior".

In the non-uniform subgroup broadcast case the behavior is described in the SPIR-V spec (see OpGroupNonUniformBroadcast). There is no similar behavior described for the (uniform) subgroup broadcast or the work-group broadcast though (see OpGroupBroadcast). This means we should also describe whatever we decide for the out-of-bounds behavior in the OpenCL SPIR-V Environment Spec, and not just in the OpenCL C spec.

mantognini commented 3 years ago

Generally speaking, I would recommend using "undefined behaviour" in the specification whenever possible because:

  1. It's weaker, and therefore can be mapped to SPIR-V while allowing relaxation of SPIR-V specification in the future without conflicting with the OpenCL C specification. (As a side point, it also doesn't bind an implementation to the SPIR-V spec too much, if that's both desired and possible.)
  2. From the implementation point of view, this greatly simplifies things and allow more optimal implementation for the "normal" case (i.e. no need to have code to validate inputs, etc...). It therefore follows the zero-overhead principle. Imagining a device which has the weakest kind of registers (i.e. they do not support concurrent read/write accesses), this can be significant overhead.
  3. From the end-user point of view, code resulting in either undefined behaviour or value is a bug that needs to be addressed. So there's little value of having a stronger guarantee in the specification.

(This is not specific to this issue, but to all use of "undefined value" in the current spec.)

StuartDBrady commented 3 years ago

Generally speaking, I would recommend using "undefined behaviour" in the specification whenever possible because: [...]

Provided we have consistency between the OpenCL C specification and the OpenCL SPIR-V Environment specification, this seems fine to me.

Generally speaking, though, it is useful to be able to do bidirectional translation between SPIR-V and LLVM IR that uses OpenCL C builtin functions. Sure, undefined behavior in OpenCL can be safely translated into the use of undefined values (including poison values) in SPIR-V. However, it's not so great if we can't translate back without generating extra guards.

bashbaug commented 3 years ago

Generally speaking, I would recommend using "undefined behaviour" in the specification whenever possible...

I generally agree - if we don't have a good reason to go with "undefined value" we should break the tie towards "undefined behavior". Coming back to this issue: Does that mean that an out-of-bounds index passed to work_group_broadcast should be "undefined behavior", and not "undefined value" as suggested above?

mantognini commented 3 years ago

Does that mean that an out-of-bounds index passed to work_group_broadcast should be "undefined behavior", and not "undefined value" as suggested above?

In this particular case, thinking a bit more about it, I'm not sure we have a choice and we must use "undefined behaviour". Here is my reasoning. Because the definition was so far

Broadcast the value for work-item identified by local_id to all work-items in the work-group.

and we also have

local_id must be the same value for all work-items in the work-group.

if the index, which must be the same for all work-item, is out-of-bound then the contract that the caller must respect isn't fulfilled given there is no known value to broadcast.

Does that sound correct?

bashbaug commented 3 years ago

Does that sound correct?

Works for me! @mantognini, are you able to create a PR that adds the "undefined behavior" clarification? If not, that's fine, just don't want to duplicate work.

Did we decide what the behavior should be when the dimensionality of a work group does not match the number of local ID coordinates passed to the broadcast function?

mantognini commented 3 years ago

Works for me! @mantognini, are you able to create a PR that adds the "undefined behavior" clarification? If not, that's fine, just don't want to duplicate work.

Sure, no problem. I went ahead and created #577.

Did we decide what the behavior should be when the dimensionality of a work group does not match the number of local ID coordinates passed to the broadcast function?

I don't think there was any clear consensus but there was no objection either as far as I can tell. For this case, I suppose we could choose either "undefined value" or "undefined behaviour". As stated above, my personal preference would be the latter but I'm happy to hear other opinions before drafting a change.

bashbaug commented 3 years ago

Discussion summary from the March 11th tooling call:

We still need to decide what to do if the number of parameters to work_group_broadcast doesn't match the dimensionality of the ND-range. The safe thing to do is "undefined behavior" but this obviously not an ideal resolution.

How we decide this issue could also affect the SPIR-V environment specification, since the "LocalId" source to OpGroupBroadcast can be a scalar or a vector with two or three components.