Less and not language specific ALPAKA_FN_* macros

tdd11235813 commented 6 years ago

Current ALPAKA_FN_* macros:

#if BOOST_LANG_CUDA                                
    #define ALPAKA_FN_ACC_CUDA_ONLY __device__     
    #define ALPAKA_FN_ACC_NO_CUDA __host__         
    #if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE)     
        #define ALPAKA_FN_ACC __device__           
    #else                                          
        #define ALPAKA_FN_ACC __device__ __host__  
    #endif                                         
    #define ALPAKA_FN_HOST_ACC __device__ __host__ 
    #define ALPAKA_FN_HOST __host__

New version could be:

#if BOOST_LANG_CUDA /* || <more languages> */
    #define ALPAKA_FN_DEVICE __device__
    #if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE)
        #define ALPAKA_FN __device__
    #else
        #define ALPAKA_FN __device__ __host__ 
    #endif
    #define ALPAKA_FN_HOST_ACC __device__ __host__
    #define ALPAKA_FN_HOST __host__

This is minimal and also allows supporting other languages without involving new macros like ALPAKA_FN_ACC_NO_HIP. What do you think?

sbastrakov commented 6 years ago

I probably like 'acc' a little better than 'device', as generally it feels broader and used already in alpaka. However, I like having ALPAKA_FN as a kinda default qualifier for kernels.

Anyways, I think in your proposal it would be more consistent to rename ALPAKA_FN_HOST_ACC to ALPAKA_FN_HOST_DEVICE.

ax3l commented 6 years ago

We use acc/accelerator already for the backend namings in Alpaka. We could double-use it, not sure if it adds or removes confusion.

"Device" naming would be consistent with how we call "devices" in PIConGPU (as reference to GPUs and CPU-sockets).

Using ALPAKA_FN_DEVICE but ALPAKA_FN_HOST_ACC is inconsistent, though.

tdd11235813 commented 6 years ago

yes, the inconsistency is a mistake.

TL;DR: Here are the two options for the new layout, one with `DEVICE`: ```C #if BOOST_LANG_CUDA /* || */ #define ALPAKA_FN_DEVICE __device__ #if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE) #define ALPAKA_FN __device__ #else #define ALPAKA_FN __device__ __host__ #endif #define ALPAKA_FN_HOST_DEVICE __device__ __host__ #define ALPAKA_FN_HOST __host__ ``` ... and the other one with `ACC`: ```C #if BOOST_LANG_CUDA /* || */ #define ALPAKA_FN_ACC __device__ /* <- is confusing to previous version */ #if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE) #define ALPAKA_FN __device__ #else #define ALPAKA_FN __device__ __host__ #endif #define ALPAKA_FN_HOST_ACC __device__ __host__ #define ALPAKA_FN_HOST __host__ ``` As a CUDA programmer I am more familiar to the `host`-`device` naming scheme, instead of `host`-`acc`. On the other hand `acc` is the most used abbreviation in alpaka (exceptions are `ALPAKA_STATIC_DEV_MEM_GLOBAL` and `ALPAKA_STATIC_DEV_MEM_CONSTANT`). So, I would agree to the `host`-`acc` scheme, but take care of the change to the current version: - ALPAKA_FN_ACC_CUDA_ONLY -> ALPAKA_FN_ACC - ALPAKA_FN_ACC -> ALPAKA_FN This could be misleading and error-prone, when you still think in the previous naming scheme. So a third option could be to add `_ONLY`s.

I now would suggest this version:

#if BOOST_LANG_CUDA /* || <more languages> */
    #define ALPAKA_FN_ACC_ONLY __device__ 
    #if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE)
        #define ALPAKA_FN __device__
    #else
        #define ALPAKA_FN __device__ __host__ 
    #endif
    #define ALPAKA_FN_HOST_ACC __device__ __host__
    #define ALPAKA_FN_HOST_ONLY __host__

BenjaminW3 commented 6 years ago

#define ALPAKA_FN_ACC_ONLY __device__ does not seem to be correct. Which accelerator? All? Only CPU? Only CUDA? I already thought about removing ALPAKA_FN_ACC_CUDA_ONLY completely because it is only an implementation detail that should not be used by an end-user and has to be guarded by #if BOOST_LANG_CUDA. We could replace all of it's usages directly with __device__ but then it feels inconsistent to me again. But maybe better a bit-inconsistent than confusing.

BenjaminW3 commented 6 years ago

The same is true for ALPAKA_FN_ACC_NO_CUDA which you most probably replaced by ALPAKA_FN_HOST everywhere. This would be very confusing because host means the host CPU and ACC means an accelerator (which could be the CPU itself but must not be). ALPAKA_FN_ACC_NO_CUDA is used for methods that are used by CPU accelerators. ALPAKA_FN_HOST is used for methods that can not be used by an accelerator but only on the host CPU. Even though they resolve to the same (__host__) they are semantically different. We could directly use __host__ everywhere where those macros are used because they are again only implementation details.

BenjaminW3 commented 6 years ago

ALPAKA_FN_ACC_CUDA_ONLY __device__:
- Used for methods of the CUDA accelerator(s).
- Implementation detail 🌔
ALPAKA_FN_ACC_NO_CUDA __host__:
- Used for methods of the CPU accelerator(s)
- Implementation detail 🌔
- This is not really necessary due to the default __host__ annotation but is a nice hint. 🍬
ALPAKA_FN_ACC __device__ or __device__ __host__:
- Used for kernel methods
- This should be used by the end-user 🌞
ALPAKA_FN_HOST_ACC __device__ __host__:
- Used for methods which can be called from kernels and the host CPU
- This should be used by the end-user 🌞
ALPAKA_FN_HOST __host__:
- Used for methods which can only be called on the host CPU.
- Implementation detail 🌔
- This is not really necessary due to the default __host__ annotation but is a nice hint. 🍬

BenjaminW3 commented 6 years ago

Yes, I see that ALPAKA_STATIC_DEV_MEM_GLOBAL and ALPAKA_STATIC_DEV_MEM_CONSTANT are inconsistent. They should most probably be renamed to ALPAKA_STATIC_ACC_MEM_GLOBAL and ALPAKA_STATIC_ACC_MEM_CONSTANT

BenjaminW3 commented 6 years ago

I would propose the following:

Seperate the definition of the user visible from the implementation detail defines
Add the documentation from above
~Rename ALPAKA_STATIC_DEV_MEM_GLOBAL and ALPAKA_STATIC_DEV_MEM_CONSTANT~ (done)
Rename the implementation detail definitions to ALPAKA_FN_ACC_CUDA_ONLY -> ALPAKA_FN_ACC_CUDA, ALPAKA_FN_ACC_NO_CUDA -> ALPAKA_FN_ACC_CPU
Add a prefix to mark the internal implementation detail defines e.g. ALPAKA_INTERNAL_FN_HOST or ALPAKA_DETAIL_FN_HOST

tdd11235813 commented 6 years ago

Thanks for the clarification on your definitions. I will come back to the host-device scheme. However, I cannot follow at the moment, why an internal layer for the defines is necessary. For the definitions: A device/accelerator also can be a CPU, e.g., even when HIP(HCC) is used (so an _ACC_CPU could also be misleading). This concept host-* is different when we look at the compile stages, for which these macros are required at all, right? But the naming does not really reflect this, but mixes. Not easy to find an unambiguous scheme though. In #457 there already has been a discussion about the namings, where "_STAGE" and "_OFFLOADING_ONLY" have been proposed, but are still confusing and not exact.

My current thoughts are:

(Edit: do not read this, it is mixed up regarding ALPAKA_ACC_GPU_CUDA_ONLY_MODE and ALPAKA_FN_ACC_CUDA)

For backend specific kernels: ALPAKA_FN_ACC_CUDA or even ALPAKA_FN_CUDA
- uses case: when intrinsics or backend-specific API are used (e.g. __popc())
- would also imply existence of ALPAKA_FN_HIP, ALPAKA_FN_SYCL, ALPAKA_FN_TBB ?
- semantic: only compile when this backend is used, but we cannot disable functions by such kind of annotation, so ALPAKA_FN_ACC_CUDA and the like are somehow misleading
- can code be rewritten to waive macros like ALPAKA_FN_ACC_CUDA ?
- use ALPAKA_FN_DEVICE_ONLY and BOOST_LANG_CUDA ifdef switches for backend specific code?
- ALPAKA_FN_ACC_CUDA is relevant to handle ALPAKA_ACC_GPU_CUDA_ONLY_MODE, which is invented to use native CUDA code in kernels, which also can be solved by the aforementioned method, right?
- if not, then ALPAKA_FN_DEVICE_ONLY_MODE is proposed
For compile stage specific kernels: ALPAKA_FN, ALPAKA_FN_HOST_ONLY, ALPAKA_FN_DEVICE_ONLY
- I know it is still not obvious to someone, who is not familiar with CUDA compiler stages, but he probably only needs just the ALPAKA_FN case

Proposing:

#if BOOST_LANG_CUDA /* || <more languages which use two stage compiling> */
    #define ALPAKA_FN_DEVICE_ONLY __device__ 
    #if defined(ALPAKA_FN_DEVICE_ONLY_MODE) /* if it is needed at all */
        #define ALPAKA_FN __device__
    #else
        #define ALPAKA_FN __device__ __host__ 
    #endif
    #define ALPAKA_FN_HOST_DEVICE __device__ __host__
    #define ALPAKA_FN_HOST_ONLY __host__

BenjaminW3 commented 6 years ago

ALPAKA_FN_ACC_CUDA is relevant to handle ALPAKA_ACC_GPU_CUDA_ONLY_MODE

ALPAKA_FN_ACC_CUDA has nothing to do with ALPAKA_ACC_GPU_CUDA_ONLY_MODE. ALPAKA_FN_ACC_CUDA is only used for internal methods of the CUDA accelerator. When ALPAKA_ACC_GPU_CUDA_ONLY_MODE is set, the definition of ALPAKA_FN_ACC is changed so that user defined accelerator methods are __device__ only.

which is invented to use native CUDA code in kernels, which also can be solved by the aforementioned method, right?

Not only. It is also used to make alpaka accelerator methods be callable from native CUDA code.

tdd11235813 commented 6 years ago

k, thanks, but I am still not happy with this ;)

ALPAKA_FN_ACC_NO_CUDA -> ALPAKA_FN_ACC_CPU

is misleading to me, as actually it means: ALPAKA_FN_ACC_NO_CUDA (no device code) -> ALPAKA_FN_HOST_ONLY Think of HIP instead of CUDA, and where devices can be CPU. The naming mixes platforms, back-ends and compiler stages.

I still think, the proposed scheme above would work.

#if BOOST_LANG_CUDA /* || <more languages which use two stage compiling> */
    #define ALPAKA_FN_DEVICE_ONLY __device__
    #if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE) \
       || defined(ALPAKA_ACC_HIP_ONLY_MODE)
        #define ALPAKA_FN __device__
    #else
        #define ALPAKA_FN __device__ __host__ 
    #endif
    #define ALPAKA_FN_HOST_DEVICE __device__ __host__
    #define ALPAKA_FN_HOST_ONLY __host__

Would require:

ALPAKA_FN_ACC_NO_CUDA -> ALPAKA_FN_HOST_ONLY
ALPAKA_FN_ACC_CUDA_ONLY -> ALPAKA_FN_DEVICE_ONLY

BenjaminW3 commented 6 years ago

I am still not convinced. All 3 internal macros, ALPAKA_FN_ACC_CUDA_ONLY, ALPAKA_FN_ACC_NO_CUDA and ALPAKA_FN_HOST should not be used outside of alpaka. They are only semantic hints. Mapping ALPAKA_FN_ACC_NO_CUDA -> ALPAKA_FN_HOST_ONLY does not bring us anything. This would only require more knowledge about CUDA and remove the semantic hint. Those macros are meant to differentiate between an GPU accelerator implementation (ALPAKA_FN_ACC_CUDA_ONLY), a CPU accelerator implementation (ALPAKA_FN_ACC_NO_CUDA) and code that is only usable on the host and should not be called by any accelerator (ALPAKA_FN_HOST).

If you really want less such macros, we could simply do the following replacement, because there is nothing dynamic about those 3 macros:

ALPAKA_FN_ACC_CUDA_ONLY -> __device__
ALPAKA_FN_ACC_CUDA_ONLY -> ``
ALPAKA_FN_HOST -> ``

However, then we would lose the semantic hint.

Replacing ALPAKA_FN_ACC_CUDA_ONLY -> ALPAKA_FN_DEVICE_ONLY would again require more knowledge about CUDA from the reader which is not necessary. Then we could simply replace it by __device__.

tdd11235813 commented 6 years ago

Those macros are meant to differentiate between an GPU accelerator implementation (ALPAKA_FN_ACC_CUDA_ONLY), a CPU accelerator implementation (ALPAKA_FN_ACC_NO_CUDA) and code that is only usable on the host and should not be called by any accelerator (ALPAKA_FN_HOST).

ok, this defines the ACC in the ALPAKA_FN_ACC*.

I see that ALPAKA_FN_ACC_CUDA_ONLY -> ALPAKA_FN_DEVICE_ONLY can directly be replaced by __device__, if this is used in places, where CUDA is enabled anyways. I (as a CUDA programmer) actually would prefer __device__ then, so I directly see what's happening and I don't have to think about another alpaka macro, and someone might wonder, why there is no such thing for other languages. But I see the use of a semantic hint by ALPAKA_FN_ACC_CUDA_ONLY for those who are not familiar with __device__.

For the following internal macro:

Mapping ALPAKA_FN_ACC_NO_CUDA -> ALPAKA_FN_HOST_ONLY does not bring us anything

One of the reasons why I started this issue was that:

ALPAKA_FN_ACC_NO_CUDA 
  void function() {}

invited me to write:

ALPAKA_FN_ACC_NO_CUDA 
ALPAKA_FN_ACC_NO_HIP 
  void function() {}

It would work, if you only define one of the macros, but does not look nice. Thus, I wanted to get rid of the language-specific macro naming and focus on the two-staged compile process by only using ALPAKA_FN_HOST, although not everyone knows what that could mean.

However, ALPAKA_FN_ACC_NO_CUDA -> ALPAKA_FN_ACC_CPU is not right, because NO_CUDA does not imply CPU accelerator. It looks like we have to bite a bullet in each case. What's the issue if we would waive this macro for __host__, as in a two-stage compile process non-attributed functions are supposed to be __host__ anyway?

#if BOOST_LANG_CUDA
    #define ALPAKA_FN_ACC_CUDA_ONLY __device__
#endif
#if BOOST_LANG_HIP
    #define ALPAKA_FN_ACC_HIP_ONLY __device__
#endif
#if BOOST_LANG_CUDA || BOOST_LANG_HIP
    #if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE) \
     || defined(ALPAKA_ACC_GPU_HIP_ONLY_MODE)
        #define ALPAKA_FN_ACC __device__
    #else
        #define ALPAKA_FN_ACC __device__ __host__
    #endif
    #define ALPAKA_FN_HOST_ACC __device__ __host__

BenjaminW3 commented 6 years ago

After the latest changes we are down to:

    #if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE)
        #define ALPAKA_FN_ACC __device__
    #else
        #define ALPAKA_FN_ACC __device__ __host__
    #endif
    #define ALPAKA_FN_HOST_ACC __device__ __host__
    #define ALPAKA_FN_HOST __host__

So we have a equivalence mapping between CUDA and alpaka. We could go one step further and remove ALPAKA_FN_HOST completely because it is the default. Any more ideas?

tdd11235813 commented 6 years ago

Normally __host__ is not necessary. Looking at kokkos they have used it in TaskExec, when the object is not in device code, while one of the constructor is attributed with __device__. Maybe it is possible, that a non-attributed class can mistakenly derive a __device__ attribute, where a __host__ for that class would have been necessary to disallow that.

BenjaminW3 commented 6 years ago

For now I would keep the ALPAKA_FN_HOST because it makes porting CUDA to alpaka easier (simple replacement). Do you see any more work necessary or can this ticket be closed for now?

tdd11235813 commented 6 years ago

ok, all fine with me, so closing this. Thanks again.

alpaka-group / alpaka

Less and not language specific ALPAKA_FN_* macros #585