Closed tdd11235813 closed 6 years ago
I probably like 'acc' a little better than 'device', as it generally feels broader and is already used in alpaka.
However, I like having ALPAKA_FN as a kind of default qualifier for kernels.
Anyway, I think in your proposal it would be more consistent to rename ALPAKA_FN_HOST_ACC to ALPAKA_FN_HOST_DEVICE.
We already use acc/accelerator for the backend namings in alpaka. We could double-use it; I am not sure whether that adds or removes confusion.
"Device" naming would be consistent with how we call "devices" in PIConGPU (as reference to GPUs and CPU-sockets).
Using ALPAKA_FN_DEVICE but ALPAKA_FN_HOST_ACC is inconsistent, though.
Yes, the inconsistency is a mistake.
I now would suggest this version:
#if BOOST_LANG_CUDA /* || <more languages> */
#define ALPAKA_FN_ACC_ONLY __device__
#if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE)
#define ALPAKA_FN __device__
#else
#define ALPAKA_FN __device__ __host__
#endif
#define ALPAKA_FN_HOST_ACC __device__ __host__
#define ALPAKA_FN_HOST_ONLY __host__
#define ALPAKA_FN_ACC_ONLY __device__ does not seem to be correct. Which accelerator? All? Only CPU? Only CUDA? I already thought about removing ALPAKA_FN_ACC_CUDA_ONLY completely because it is only an implementation detail that should not be used by an end-user and has to be guarded by #if BOOST_LANG_CUDA. We could replace all of its usages directly with __device__, but then it feels inconsistent to me again. But maybe better a bit inconsistent than confusing.
The same is true for ALPAKA_FN_ACC_NO_CUDA, which you most probably replaced by ALPAKA_FN_HOST everywhere. This would be very confusing because host means the host CPU and ACC means an accelerator (which could be the CPU itself but need not be). ALPAKA_FN_ACC_NO_CUDA is used for methods that are used by CPU accelerators. ALPAKA_FN_HOST is used for methods that cannot be used by an accelerator but only on the host CPU. Even though they resolve to the same thing (__host__), they are semantically different. We could directly use __host__ everywhere those macros are used because they are again only implementation details.
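The distinction described above can be sketched as follows. This is only an illustration, not alpaka's actual header: the guard uses nvcc's __CUDACC__ instead of BOOST_LANG_CUDA to avoid a Boost.Predef dependency, the empty fallback branch is an assumption, and both function names are hypothetical.

```cpp
// Sketch only: both macros expand to the same token, they differ in intent.
// Assumption: __CUDACC__ (set by nvcc) stands in for BOOST_LANG_CUDA, and the
// empty fallback lets the file compile with a plain host compiler.
#if defined(__CUDACC__)
#define ALPAKA_FN_ACC_NO_CUDA __host__
#define ALPAKA_FN_HOST __host__
#else
#define ALPAKA_FN_ACC_NO_CUDA
#define ALPAKA_FN_HOST
#endif

// Hypothetical internal method of a CPU accelerator: accelerator code that
// happens to run on the host CPU.
ALPAKA_FN_ACC_NO_CUDA constexpr int cpuAccWarpSize() { return 1; }

// Hypothetical host-only helper: never meant to be called from any accelerator.
ALPAKA_FN_HOST constexpr int hostAlignUp(int size, int alignment)
{
    return ((size + alignment - 1) / alignment) * alignment;
}
```

The compiler sees no difference between the two annotations; only a reader of the code does, which is the "semantic hint" argument.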
__device__:
__host__:
__host__ annotation but is a nice hint.
__device__ or __device__ __host__:
__device__ __host__:
__host__:
__host__ annotation but is a nice hint.
Yes, I see that ALPAKA_STATIC_DEV_MEM_GLOBAL and ALPAKA_STATIC_DEV_MEM_CONSTANT are inconsistent. They should most probably be renamed to ALPAKA_STATIC_ACC_MEM_GLOBAL and ALPAKA_STATIC_ACC_MEM_CONSTANT.
I would propose the following:
ALPAKA_STATIC_DEV_MEM_GLOBAL and ALPAKA_STATIC_DEV_MEM_CONSTANT (done)
ALPAKA_FN_ACC_CUDA_ONLY -> ALPAKA_FN_ACC_CUDA, ALPAKA_FN_ACC_NO_CUDA -> ALPAKA_FN_ACC_CPU
ALPAKA_INTERNAL_FN_HOST or ALPAKA_DETAIL_FN_HOST
Thanks for the clarification on your definitions. I will come back to the host-device scheme. However, at the moment I cannot follow why an internal layer for the defines is necessary.
For the definitions: A device/accelerator can also be a CPU, e.g., even when HIP(HCC) is used (so an _ACC_CPU could also be misleading).
The host-* concept is different when we look at the compile stages, which is what these macros are required for in the first place, right? The naming does not really reflect this, though, and mixes the two views. It is not easy to find an unambiguous scheme.
In #457 there already has been a discussion about the namings, where "_STAGE" and "_OFFLOADING_ONLY" have been proposed, but are still confusing and not exact.
My current thoughts are:
(Edit: do not read this, it is mixed up regarding ALPAKA_ACC_GPU_CUDA_ONLY_MODE and ALPAKA_FN_ACC_CUDA)
ALPAKA_FN_ACC_CUDA or even ALPAKA_FN_CUDA
__popc())
ALPAKA_FN_HIP, ALPAKA_FN_SYCL, ALPAKA_FN_TBB?
ALPAKA_FN_ACC_CUDA and the like are somehow misleading
ALPAKA_FN_ACC_CUDA? ALPAKA_FN_DEVICE_ONLY and BOOST_LANG_CUDA ifdef switches for backend specific code?
ALPAKA_FN_ACC_CUDA is relevant to handle ALPAKA_ACC_GPU_CUDA_ONLY_MODE, which is invented to use native CUDA code in kernels, which also can be solved by the aforementioned method, right?
ALPAKA_FN_DEVICE_ONLY_MODE is proposed
ALPAKA_FN, ALPAKA_FN_HOST_ONLY, ALPAKA_FN_DEVICE_ONLY
ALPAKA_FN case
Proposing:
#if BOOST_LANG_CUDA /* || <more languages which use two stage compiling> */
#define ALPAKA_FN_DEVICE_ONLY __device__
#if defined(ALPAKA_FN_DEVICE_ONLY_MODE) /* if it is needed at all */
#define ALPAKA_FN __device__
#else
#define ALPAKA_FN __device__ __host__
#endif
#define ALPAKA_FN_HOST_DEVICE __device__ __host__
#define ALPAKA_FN_HOST_ONLY __host__
ALPAKA_FN_ACC_CUDA is relevant to handle ALPAKA_ACC_GPU_CUDA_ONLY_MODE
ALPAKA_FN_ACC_CUDA has nothing to do with ALPAKA_ACC_GPU_CUDA_ONLY_MODE. ALPAKA_FN_ACC_CUDA is only used for internal methods of the CUDA accelerator.
When ALPAKA_ACC_GPU_CUDA_ONLY_MODE is set, the definition of ALPAKA_FN_ACC is changed so that user-defined accelerator methods are __device__ only.
which is invented to use native CUDA code in kernels, which also can be solved by the aforementioned method, right?
Not only. It is also used to make alpaka accelerator methods be callable from native CUDA code.
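A small sketch of that switch, for illustration only. It assumes a plain host compiler, where __device__ and __host__ are ordinary identifiers, so the expansion of ALPAKA_FN_ACC can be turned into text and inspected. SKETCH_STR, SKETCH_XSTR, accQualifiers and sameText are hypothetical helpers, not part of alpaka.

```cpp
// Sketch only: how ALPAKA_ACC_GPU_CUDA_ONLY_MODE changes what ALPAKA_FN_ACC
// expands to. Assumes a plain host compiler, where __device__ and __host__
// are not macros and therefore survive stringification unchanged.
#if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE)
#define ALPAKA_FN_ACC __device__
#else
#define ALPAKA_FN_ACC __device__ __host__
#endif

// Hypothetical helpers (not part of alpaka) to turn the expansion into text.
#define SKETCH_STR(x) #x
#define SKETCH_XSTR(x) SKETCH_STR(x)

// Text of the qualifiers that ALPAKA_FN_ACC currently stands for.
constexpr char const* accQualifiers = SKETCH_XSTR(ALPAKA_FN_ACC);

// Compile-time string comparison, written recursively so C++11 suffices.
constexpr bool sameText(char const* a, char const* b)
{
    return (*a == *b) && (*a == '\0' || sameText(a + 1, b + 1));
}
```

Defining ALPAKA_ACC_GPU_CUDA_ONLY_MODE before this snippet (for example on the compiler command line) would leave only __device__ in accQualifiers. Under nvcc the qualifiers are themselves macros, so the stringified text may look different there.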
k, thanks, but I am still not happy with this ;)
ALPAKA_FN_ACC_NO_CUDA -> ALPAKA_FN_ACC_CPU is misleading to me, as actually it means: ALPAKA_FN_ACC_NO_CUDA (no device code) -> ALPAKA_FN_HOST_ONLY
Think of HIP instead of CUDA, where devices can be CPUs. The naming mixes platforms, back-ends and compiler stages.
I still think the proposed scheme above would work.
#if BOOST_LANG_CUDA /* || <more languages which use two stage compiling> */
#define ALPAKA_FN_DEVICE_ONLY __device__
#if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE) \
|| defined(ALPAKA_ACC_HIP_ONLY_MODE)
#define ALPAKA_FN __device__
#else
#define ALPAKA_FN __device__ __host__
#endif
#define ALPAKA_FN_HOST_DEVICE __device__ __host__
#define ALPAKA_FN_HOST_ONLY __host__
Would require:
ALPAKA_FN_ACC_NO_CUDA -> ALPAKA_FN_HOST_ONLY
ALPAKA_FN_ACC_CUDA_ONLY -> ALPAKA_FN_DEVICE_ONLY
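To make the proposed names more concrete, here is a sketch of how they might be used. It is not taken from alpaka: __CUDACC__ replaces BOOST_LANG_CUDA, the #else branch with empty macros is an assumption added so the file also builds with a plain host compiler, and all function names are hypothetical.

```cpp
// Sketch of how the proposed names would be used. Assumptions: __CUDACC__
// replaces BOOST_LANG_CUDA, and the #else branch (all macros empty) is added
// here so the file also builds without a two-stage compiler.
#if defined(__CUDACC__)
#define ALPAKA_FN_DEVICE_ONLY __device__
#if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE)
#define ALPAKA_FN __device__
#else
#define ALPAKA_FN __device__ __host__
#endif
#define ALPAKA_FN_HOST_DEVICE __device__ __host__
#define ALPAKA_FN_HOST_ONLY __host__
#else
#define ALPAKA_FN_DEVICE_ONLY
#define ALPAKA_FN
#define ALPAKA_FN_HOST_DEVICE
#define ALPAKA_FN_HOST_ONLY
#endif

// Hypothetical functions, one per qualifier.

// Default qualifier for code that kernels call.
ALPAKA_FN constexpr int square(int x) { return x * x; }

// Explicitly callable from both compile stages.
ALPAKA_FN_HOST_DEVICE constexpr int clampToZero(int x) { return x < 0 ? 0 : x; }

// Host stage only.
ALPAKA_FN_HOST_ONLY constexpr int hostBufferBytes(int elems) { return elems * 4; }

// Under nvcc this one is callable from device code only.
ALPAKA_FN_DEVICE_ONLY constexpr int laneOffset(int lane) { return lane * 2; }
```

None of the names mentions a language or backend, so a further two-stage compiler would only extend the #if condition.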
I am still not convinced. All 3 internal macros, ALPAKA_FN_ACC_CUDA_ONLY, ALPAKA_FN_ACC_NO_CUDA and ALPAKA_FN_HOST, should not be used outside of alpaka. They are only semantic hints. Mapping ALPAKA_FN_ACC_NO_CUDA -> ALPAKA_FN_HOST_ONLY does not bring us anything. This would only require more knowledge about CUDA and remove the semantic hint.
Those macros are meant to differentiate between a GPU accelerator implementation (ALPAKA_FN_ACC_CUDA_ONLY), a CPU accelerator implementation (ALPAKA_FN_ACC_NO_CUDA) and code that is only usable on the host and should not be called by any accelerator (ALPAKA_FN_HOST).
If you really want fewer such macros, we could simply do the following replacements, because there is nothing dynamic about those 3 macros:
ALPAKA_FN_ACC_CUDA_ONLY -> __device__
ALPAKA_FN_ACC_NO_CUDA -> (nothing)
ALPAKA_FN_HOST -> (nothing)
However, then we would lose the semantic hint.
Replacing ALPAKA_FN_ACC_CUDA_ONLY -> ALPAKA_FN_DEVICE_ONLY would again require more knowledge about CUDA from the reader, which is not necessary. Then we could simply replace it by __device__.
Those macros are meant to differentiate between a GPU accelerator implementation (ALPAKA_FN_ACC_CUDA_ONLY), a CPU accelerator implementation (ALPAKA_FN_ACC_NO_CUDA) and code that is only usable on the host and should not be called by any accelerator (ALPAKA_FN_HOST).
ok, this defines the ACC in the ALPAKA_FN_ACC*.
I see that ALPAKA_FN_ACC_CUDA_ONLY -> ALPAKA_FN_DEVICE_ONLY can directly be replaced by __device__ if this is used in places where CUDA is enabled anyway. I (as a CUDA programmer) actually would prefer __device__ then, so I directly see what's happening and don't have to think about another alpaka macro, and someone might wonder why there is no such thing for other languages. But I see the use of a semantic hint by ALPAKA_FN_ACC_CUDA_ONLY for those who are not familiar with __device__.
For the following internal macro:
Mapping ALPAKA_FN_ACC_NO_CUDA -> ALPAKA_FN_HOST_ONLY does not bring us anything
One of the reasons why I started this issue was that:
ALPAKA_FN_ACC_NO_CUDA
void function() {}
invited me to write:
ALPAKA_FN_ACC_NO_CUDA
ALPAKA_FN_ACC_NO_HIP
void function() {}
It would work if you only define one of the macros, but it does not look nice. Thus, I wanted to get rid of the language-specific macro naming and focus on the two-stage compile process by only using ALPAKA_FN_HOST, although not everyone knows what that could mean.
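A minimal sketch of the language-agnostic alternative, under stated assumptions: __CUDACC__ and __HIPCC__ (the macros set by nvcc and hipcc) are used to detect a two-stage compiler, and the macro is empty everywhere else. The return value of function() is added here only so the example can be checked.

```cpp
// Sketch only: one language-agnostic host macro instead of stacking
// ALPAKA_FN_ACC_NO_CUDA and ALPAKA_FN_ACC_NO_HIP on the same function.
// Assumption: __CUDACC__ and __HIPCC__ (set by nvcc and hipcc) identify the
// two-stage compilers; elsewhere the macro is empty.
#if defined(__CUDACC__) || defined(__HIPCC__)
#define ALPAKA_FN_HOST __host__
#else
#define ALPAKA_FN_HOST
#endif

// The function from the example above, with a return value added so that it
// can be checked.
ALPAKA_FN_HOST constexpr int function() { return 42; }
```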
However, ALPAKA_FN_ACC_NO_CUDA -> ALPAKA_FN_ACC_CPU is not right, because NO_CUDA does not imply a CPU accelerator.
It looks like we have to bite the bullet in each case.
What would be the issue if we dropped this macro for __host__, given that in a two-stage compile process non-attributed functions are supposed to be __host__ anyway?
#if BOOST_LANG_CUDA
#define ALPAKA_FN_ACC_CUDA_ONLY __device__
#endif
#if BOOST_LANG_HIP
#define ALPAKA_FN_ACC_HIP_ONLY __device__
#endif
#if BOOST_LANG_CUDA || BOOST_LANG_HIP
#if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE) \
|| defined(ALPAKA_ACC_GPU_HIP_ONLY_MODE)
#define ALPAKA_FN_ACC __device__
#else
#define ALPAKA_FN_ACC __device__ __host__
#endif
#define ALPAKA_FN_HOST_ACC __device__ __host__
After the latest changes we are down to:
#if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE)
#define ALPAKA_FN_ACC __device__
#else
#define ALPAKA_FN_ACC __device__ __host__
#endif
#define ALPAKA_FN_HOST_ACC __device__ __host__
#define ALPAKA_FN_HOST __host__
So we have an equivalence mapping between CUDA and alpaka. We could go one step further and remove ALPAKA_FN_HOST completely because it is the default. Any more ideas?
Normally __host__ is not necessary. Looking at kokkos, they have used it in TaskExec when the object is not in device code, while one of the constructors is attributed with __device__.
Maybe it is possible that a non-attributed class can mistakenly derive a __device__ attribute, where a __host__ for that class would have been necessary to disallow that.
For now I would keep ALPAKA_FN_HOST because it makes porting CUDA to alpaka easier (simple replacement).
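The "simple replacement" could look roughly like this. It is a sketch, not code from alpaka: the function names axpy and blocksFor are hypothetical, the ALPAKA_ACC_GPU_CUDA_ONLY_MODE branch is left out, and the empty fallbacks are an assumption so the example also builds without nvcc.

```cpp
// Sketch only. Assumption: empty fallbacks so this builds without nvcc; under
// nvcc the macros follow the reduced mapping listed above.
#if defined(__CUDACC__)
#define ALPAKA_FN_HOST_ACC __device__ __host__
#define ALPAKA_FN_HOST __host__
#else
#define ALPAKA_FN_HOST_ACC
#define ALPAKA_FN_HOST
#endif

// CUDA original: __host__ __device__ float axpy(float a, float x, float y)
ALPAKA_FN_HOST_ACC constexpr float axpy(float a, float x, float y)
{
    return a * x + y;
}

// CUDA original: __host__ int blocksFor(int n, int blockSize)
ALPAKA_FN_HOST constexpr int blocksFor(int n, int blockSize)
{
    return (n + blockSize - 1) / blockSize;
}
```

Each CUDA qualifier maps to exactly one alpaka macro, so the port is a textual search and replace; dropping ALPAKA_FN_HOST would turn the second case into a deletion instead.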
Do you see any more work necessary or can this ticket be closed for now?
ok, all fine with me, so closing this. Thanks again.
Current ALPAKA_FN_* macros:
New version could be:
This is minimal and also allows supporting other languages without involving new macros like ALPAKA_FN_ACC_NO_HIP. What do you think?