alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration :llama:
https://alpaka.readthedocs.io
Mozilla Public License 2.0

Implement accelerator traits for single vs multi-threads per block #2263

Closed fwyzard closed 1 month ago

fwyzard commented 2 months ago

alpaka::isSingleThreadAccelerator<T> evaluates to true if the given type is an accelerator that supports only one thread per block, and to false if it supports multiple threads per block.

alpaka::isMultiThreadAccelerator<T> evaluates to true if the given type is an accelerator that supports multiple threads per block, and to false if it supports only one thread per block.

If the given type is not an accelerator, the constants are not defined.

Implement a unit test for alpaka::isSingleThreadAccelerator<T> and alpaka::isMultiThreadAccelerator<T>.
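For readers following along, here is a minimal sketch of how such a pair of traits could look, together with compile-time checks in the spirit of the requested unit test. It is not the code from this PR: the namespace `sketch`, the mock accelerator types and the helper `MockSingleThread` are all hypothetical and only illustrate the "not defined for non-accelerators" behaviour described above.

```cpp
#include <type_traits>

namespace sketch
{
    // Hypothetical stand-ins for accelerator types; these are not alpaka classes.
    struct MockSerialAcc {};  // supports only one thread per block
    struct MockThreadsAcc {}; // supports multiple threads per block

    // Hypothetical helper, specialised only for accelerators. It is left
    // undefined for every other type, so the traits below cannot be
    // evaluated (are "not defined") for non-accelerators.
    template<typename T>
    struct MockSingleThread;

    template<>
    struct MockSingleThread<MockSerialAcc> : std::true_type {};

    template<>
    struct MockSingleThread<MockThreadsAcc> : std::false_type {};

    template<typename T>
    inline constexpr bool isSingleThreadAccelerator = MockSingleThread<T>::value;

    template<typename T>
    inline constexpr bool isMultiThreadAccelerator = !MockSingleThread<T>::value;

    // For an accelerator, exactly one of the two traits is true.
    static_assert(isSingleThreadAccelerator<MockSerialAcc>);
    static_assert(!isMultiThreadAccelerator<MockSerialAcc>);
    static_assert(isMultiThreadAccelerator<MockThreadsAcc>);
    static_assert(!isSingleThreadAccelerator<MockThreadsAcc>);

    // isSingleThreadAccelerator<int> would fail to compile here, because
    // MockSingleThread<int> is an incomplete type.
} // namespace sketch
```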

fwyzard commented 2 months ago

@psychocoderHPC I use these traits inside the grid and block strided loops, to avoid the thread-level loop if the accelerator supports only one thread per block.

Comments and suggestions for better names are welcome :-)
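To illustrate the use case, here is a rough sketch of a block-strided loop that drops the per-thread offset and stride when the accelerator runs a single thread per block. Everything in it is hypothetical (the namespace `sketch`, the mock accelerator types, the stand-in trait and the function `visitedByThread`); it only mimics the idea and is not the loop implementation from alpaka.

```cpp
#include <cstddef>

namespace sketch
{
    // Hypothetical stand-ins for accelerator types; these are not alpaka classes.
    struct MockSerialAcc {};
    struct MockThreadsAcc {};

    // Hypothetical stand-in for the trait discussed in this PR.
    template<typename T>
    inline constexpr bool isSingleThreadAccelerator = false;

    template<>
    inline constexpr bool isSingleThreadAccelerator<MockSerialAcc> = true;

    // Counts how many elements of [first, last) one thread of a block visits.
    template<typename TAcc>
    constexpr std::size_t visitedByThread(
        [[maybe_unused]] std::size_t threadIdx,
        [[maybe_unused]] std::size_t blockThreads,
        std::size_t first,
        std::size_t last)
    {
        std::size_t visited = 0;
        if constexpr(isSingleThreadAccelerator<TAcc>)
        {
            // One thread per block: it owns the whole range, so no
            // thread-level offset or stride is needed.
            for(std::size_t i = first; i < last; ++i)
                ++visited;
        }
        else
        {
            // Several threads per block: each thread starts at its own
            // index and advances by the number of threads in the block.
            for(std::size_t i = first + threadIdx; i < last; i += blockThreads)
                ++visited;
        }
        return visited;
    }
} // namespace sketch
```

With 4 threads and the range [0, 10), threads 0 and 1 each visit 3 elements and threads 2 and 3 each visit 2, while the single-thread variant visits all 10.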

fwyzard commented 2 months ago

The failing test has a few instances of

22: D:\a\alpaka\alpaka\test\unit\acc\src\AccTraitTest.cpp(29): FAILED:
22:   REQUIRE( devProps.m_blockThreadCountMax > 1 )
22: with expansion:
22:   1 > 1

Is there a way to see which back-end is failing the check?

fwyzard commented 2 months ago

Temporarily adding log messages to see what accelerators are failing.

fwyzard commented 1 month ago

AccCpuOmp2Threads reports a single thread per block on the Windows tests.

mehmetyusufoglu commented 1 month ago

> @psychocoderHPC I use these traits inside the grid and block strided loops, to avoid the thread-level loop if the accelerator supports only one thread per block.
>
> Comments and suggestions for better names are welcome :-)

> alpaka::isSingleThreadAccelerator<T> evaluates to true if the given type is an accelerator that supports only one thread per block, and to false if it supports multiple threads per block.
>
> alpaka::isMultiThreadAccelerator<T> evaluates to true if the given type is an accelerator that supports multiple threads per block, and to false if it supports only one thread per block.
>
> If the given type is not an accelerator, the constants are not defined.
>
> Implement a unit test for alpaka::isSingleThreadAccelerator<T> and alpaka::isMultiThreadAccelerator<T>.

Isn't one of the two traits above enough? Is one the complement of the other?

fwyzard commented 1 month ago

> Isn't one of the two traits above enough? Is one the complement of the other?

For an Accelerator, yes.

For an arbitrary type, no: a type might not be an accelerator at all.

psychocoderHPC commented 1 month ago

> > Isn't one of the two traits above enough? Is one the complement of the other?
>
> For an Accelerator, yes.
>
> For an arbitrary type, no: a type might not be an accelerator at all.

Could you please give an example where both traits are required? The trait documentation says that the traits are only defined for accelerators. I am trying to find an example where one is not the complement of the other.

psychocoderHPC commented 1 month ago

> AccCpuOmp2Threads reports a single thread per block on the Windows tests.

https://github.com/alpaka-group/alpaka/blob/1b8146ef300259c1206a3a56341be3e64d96515d/include/alpaka/acc/AccCpuOmp2Threads.hpp#L131-L133 If the environment variable OMP_NUM_THREADS is set to 1, which is sometimes the default for an OS, this back-end limits the number of threads per block.

This is interesting, and something I had not considered.
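A small sketch of why the test can fail even though the back-end is a multi-thread accelerator at compile time. It assumes, as the comment above suggests, that the runtime limit follows the thread count reported by the OpenMP runtime. The function names and the relaxed condition are hypothetical and are not taken from alpaka or from the test in this PR.

```cpp
namespace sketch
{
    // Hypothetical stand-in for the device property queried by the test.
    // Assumption: the limit follows the thread count reported by the OpenMP
    // runtime, which OMP_NUM_THREADS=1 would reduce to 1.
    constexpr int blockThreadCountMax(int ompMaxThreads)
    {
        return ompMaxThreads < 1 ? 1 : ompMaxThreads;
    }

    // The condition the failing test required for a multi-thread accelerator.
    constexpr bool strictCheck(int threadCountMax)
    {
        return threadCountMax > 1;
    }

    // One possible weaker condition that tolerates a runtime limit of one thread.
    constexpr bool relaxedCheck(int threadCountMax)
    {
        return threadCountMax >= 1;
    }

    static_assert(strictCheck(blockThreadCountMax(8)));
    static_assert(!strictCheck(blockThreadCountMax(1))); // the "1 > 1" failure
    static_assert(relaxedCheck(blockThreadCountMax(1)));
} // namespace sketch
```

The point is only that a compile-time trait describes what the back-end can support, while the device properties report what the current environment allows.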

fwyzard commented 1 month ago

> > > Isn't one of the two traits above enough? Is one the complement of the other?
>
> > For an Accelerator, yes. For an arbitrary type, no: a type might not be an accelerator at all.
>
> Could you please give an example where both traits are required? The trait documentation says that the traits are only defined for accelerators. I am trying to find an example where one is not the complement of the other.

You are right - after trying to use them in an example, I realise that they should be false for non-accelerator types, rather than undefined.

psychocoderHPC commented 1 month ago

> You are right - after trying to use them in an example, I realise that they should be false for non-accelerator types, rather than undefined.

I expect that you will update this PR and change the default to false instead of undefined. The question in the room is still whether both traits are useful or a single trait is enough.

fwyzard commented 1 month ago

> I expect that you will update this PR and change the default to false instead of undefined.

Indeed, done.

> The question in the room is still whether both traits are useful or a single trait is enough.

Now they are.

For example, if only isSingleThreadAccelerator is defined, and one wants to check for the other case, one needs to use:

if constexpr (alpaka::isAccelerator<TAcc> and not alpaka::isSingleThreadAccelerator<TAcc>) {
  ...
}

If both are defined, one can directly use

if constexpr (alpaka::isMultiThreadAccelerator<TAcc>) {
  ...
}

That said, either one can be derived from the other plus existing traits, so if there is a preference to have only one, I can remove isMultiThreadAccelerator.
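As a sanity check of that last statement, here is a minimal sketch with the traits defaulting to false for non-accelerator types and the multi-thread trait derived from the other two. The namespace `sketch`, the mock types and the local `isAccelerator` stand-in are hypothetical; this is not the implementation in the PR.

```cpp
namespace sketch
{
    // Hypothetical stand-ins; these are not alpaka classes.
    struct MockSerialAcc {};  // one thread per block
    struct MockThreadsAcc {}; // multiple threads per block
    struct NotAnAcc {};       // not an accelerator at all

    // Local stand-in for an "is this an accelerator" trait: false by default.
    template<typename T>
    inline constexpr bool isAccelerator = false;
    template<>
    inline constexpr bool isAccelerator<MockSerialAcc> = true;
    template<>
    inline constexpr bool isAccelerator<MockThreadsAcc> = true;

    // False by default, so it is well defined for any type.
    template<typename T>
    inline constexpr bool isSingleThreadAccelerator = false;
    template<>
    inline constexpr bool isSingleThreadAccelerator<MockSerialAcc> = true;

    // Derived from the two traits above instead of being specialised.
    template<typename T>
    inline constexpr bool isMultiThreadAccelerator
        = isAccelerator<T> && !isSingleThreadAccelerator<T>;

    static_assert(isMultiThreadAccelerator<MockThreadsAcc>);
    static_assert(!isMultiThreadAccelerator<MockSerialAcc>);

    // For a non-accelerator both traits are false, so neither is the plain
    // negation of the other.
    static_assert(!isSingleThreadAccelerator<NotAnAcc>);
    static_assert(!isMultiThreadAccelerator<NotAnAcc>);
} // namespace sketch
```

Keeping both names mostly buys readability at the call site; the derived definition shows that no extra information is needed to provide the second trait.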

fwyzard commented 1 month ago

Could you let me know?

In any case I will squash the 3rd commit into the 1st.