alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration :llama:
https://alpaka.readthedocs.io
Mozilla Public License 2.0
356 stars 74 forks source link

Less and not language specific ALPAKA_FN_* macros #585

Closed tdd11235813 closed 6 years ago

tdd11235813 commented 6 years ago

Current ALPAKA_FN_* macros:

#if BOOST_LANG_CUDA                                
    #define ALPAKA_FN_ACC_CUDA_ONLY __device__     
    #define ALPAKA_FN_ACC_NO_CUDA __host__         
    #if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE)     
        #define ALPAKA_FN_ACC __device__           
    #else                                          
        #define ALPAKA_FN_ACC __device__ __host__  
    #endif                                         
    #define ALPAKA_FN_HOST_ACC __device__ __host__ 
    #define ALPAKA_FN_HOST __host__                

New version could be:

#if BOOST_LANG_CUDA /* || <more languages> */
    #define ALPAKA_FN_DEVICE __device__
    #if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE)
        #define ALPAKA_FN __device__
    #else
        #define ALPAKA_FN __device__ __host__ 
    #endif
    #define ALPAKA_FN_HOST_ACC __device__ __host__
    #define ALPAKA_FN_HOST __host__

This is minimal and also allows supporting other languages without involving new macros like ALPAKA_FN_ACC_NO_HIP. What do you think?

sbastrakov commented 6 years ago

I probably like 'acc' a little better than 'device', as generally it feels broader and used already in alpaka. However, I like having ALPAKA_FN as a kinda default qualifier for kernels.

Anyways, I think in your proposal it would be more consistent to rename ALPAKA_FN_HOST_ACC to ALPAKA_FN_HOST_DEVICE.

ax3l commented 6 years ago

We use acc/accelerator already for the backend namings in Alpaka. We could double-use it, not sure if it adds or removes confusion.

"Device" naming would be consistent with how we call "devices" in PIConGPU (as reference to GPUs and CPU-sockets).

Using ALPAKA_FN_DEVICE but ALPAKA_FN_HOST_ACC is inconsistent, though.

tdd11235813 commented 6 years ago

yes, the inconsistency is a mistake.

TL;DR: Here are the two options for the new layout, one with `DEVICE`: ```C #if BOOST_LANG_CUDA /* || */ #define ALPAKA_FN_DEVICE __device__ #if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE) #define ALPAKA_FN __device__ #else #define ALPAKA_FN __device__ __host__ #endif #define ALPAKA_FN_HOST_DEVICE __device__ __host__ #define ALPAKA_FN_HOST __host__ ``` ... and the other one with `ACC`: ```C #if BOOST_LANG_CUDA /* || */ #define ALPAKA_FN_ACC __device__ /* <- is confusing to previous version */ #if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE) #define ALPAKA_FN __device__ #else #define ALPAKA_FN __device__ __host__ #endif #define ALPAKA_FN_HOST_ACC __device__ __host__ #define ALPAKA_FN_HOST __host__ ``` As a CUDA programmer I am more familiar to the `host`-`device` naming scheme, instead of `host`-`acc`. On the other hand `acc` is the most used abbreviation in alpaka (exceptions are `ALPAKA_STATIC_DEV_MEM_GLOBAL` and `ALPAKA_STATIC_DEV_MEM_CONSTANT`). So, I would agree to the `host`-`acc` scheme, but take care of the change to the current version: - ALPAKA_FN_ACC_CUDA_ONLY -> ALPAKA_FN_ACC - ALPAKA_FN_ACC -> ALPAKA_FN This could be misleading and error-prone, when you still think in the previous naming scheme. So a third option could be to add `_ONLY`s.


I now would suggest this version:

#if BOOST_LANG_CUDA /* || <more languages> */
    #define ALPAKA_FN_ACC_ONLY __device__ 
    #if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE)
        #define ALPAKA_FN __device__
    #else
        #define ALPAKA_FN __device__ __host__ 
    #endif
    #define ALPAKA_FN_HOST_ACC __device__ __host__
    #define ALPAKA_FN_HOST_ONLY __host__
BenjaminW3 commented 6 years ago

#define ALPAKA_FN_ACC_ONLY __device__ does not seem to be correct. Which accelerator? All? Only CPU? Only CUDA? I already thought about removing ALPAKA_FN_ACC_CUDA_ONLY completely because it is only an implementation detail that should not be used by an end-user and has to be guarded by #if BOOST_LANG_CUDA. We could replace all of it's usages directly with __device__ but then it feels inconsistent to me again. But maybe better a bit-inconsistent than confusing.

BenjaminW3 commented 6 years ago

The same is true for ALPAKA_FN_ACC_NO_CUDA which you most probably replaced by ALPAKA_FN_HOST everywhere. This would be very confusing because host means the host CPU and ACC means an accelerator (which could be the CPU itself but must not be). ALPAKA_FN_ACC_NO_CUDA is used for methods that are used by CPU accelerators. ALPAKA_FN_HOST is used for methods that can not be used by an accelerator but only on the host CPU. Even though they resolve to the same (__host__) they are semantically different. We could directly use __host__ everywhere where those macros are used because they are again only implementation details.

BenjaminW3 commented 6 years ago
BenjaminW3 commented 6 years ago

Yes, I see that ALPAKA_STATIC_DEV_MEM_GLOBAL and ALPAKA_STATIC_DEV_MEM_CONSTANT are inconsistent. They should most probably be renamed to ALPAKA_STATIC_ACC_MEM_GLOBAL and ALPAKA_STATIC_ACC_MEM_CONSTANT

BenjaminW3 commented 6 years ago

I would propose the following:

tdd11235813 commented 6 years ago

Thanks for the clarification on your definitions. I will come back to the host-device scheme. However, I cannot follow at the moment, why an internal layer for the defines is necessary. For the definitions: A device/accelerator also can be a CPU, e.g., even when HIP(HCC) is used (so an _ACC_CPU could also be misleading). This concept host-* is different when we look at the compile stages, for which these macros are required at all, right? But the naming does not really reflect this, but mixes. Not easy to find an unambiguous scheme though. In #457 there already has been a discussion about the namings, where "_STAGE" and "_OFFLOADING_ONLY" have been proposed, but are still confusing and not exact.

My current thoughts are:

(Edit: do not read this, it is mixed up regarding ALPAKA_ACC_GPU_CUDA_ONLY_MODE and ALPAKA_FN_ACC_CUDA)

Proposing:

#if BOOST_LANG_CUDA /* || <more languages which use two stage compiling> */
    #define ALPAKA_FN_DEVICE_ONLY __device__ 
    #if defined(ALPAKA_FN_DEVICE_ONLY_MODE) /* if it is needed at all */
        #define ALPAKA_FN __device__
    #else
        #define ALPAKA_FN __device__ __host__ 
    #endif
    #define ALPAKA_FN_HOST_DEVICE __device__ __host__
    #define ALPAKA_FN_HOST_ONLY __host__
BenjaminW3 commented 6 years ago

ALPAKA_FN_ACC_CUDA is relevant to handle ALPAKA_ACC_GPU_CUDA_ONLY_MODE

ALPAKA_FN_ACC_CUDA has nothing to do with ALPAKA_ACC_GPU_CUDA_ONLY_MODE. ALPAKA_FN_ACC_CUDA is only used for internal methods of the CUDA accelerator. When ALPAKA_ACC_GPU_CUDA_ONLY_MODE is set, the definition of ALPAKA_FN_ACC is changed so that user defined accelerator methods are __device__ only.

which is invented to use native CUDA code in kernels, which also can be solved by the aforementioned method, right?

Not only. It is also used to make alpaka accelerator methods be callable from native CUDA code.

tdd11235813 commented 6 years ago

k, thanks, but I am still not happy with this ;)

ALPAKA_FN_ACC_NO_CUDA -> ALPAKA_FN_ACC_CPU

is misleading to me, as actually it means: ALPAKA_FN_ACC_NO_CUDA (no device code) -> ALPAKA_FN_HOST_ONLY Think of HIP instead of CUDA, and where devices can be CPU. The naming mixes platforms, back-ends and compiler stages.

I still think, the proposed scheme above would work.

#if BOOST_LANG_CUDA /* || <more languages which use two stage compiling> */
    #define ALPAKA_FN_DEVICE_ONLY __device__
    #if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE) \
       || defined(ALPAKA_ACC_HIP_ONLY_MODE)
        #define ALPAKA_FN __device__
    #else
        #define ALPAKA_FN __device__ __host__ 
    #endif
    #define ALPAKA_FN_HOST_DEVICE __device__ __host__
    #define ALPAKA_FN_HOST_ONLY __host__

Would require:

BenjaminW3 commented 6 years ago

I am still not convinced. All 3 internal macros, ALPAKA_FN_ACC_CUDA_ONLY, ALPAKA_FN_ACC_NO_CUDA and ALPAKA_FN_HOST should not be used outside of alpaka. They are only semantic hints. Mapping ALPAKA_FN_ACC_NO_CUDA -> ALPAKA_FN_HOST_ONLY does not bring us anything. This would only require more knowledge about CUDA and remove the semantic hint. Those macros are meant to differentiate between an GPU accelerator implementation (ALPAKA_FN_ACC_CUDA_ONLY), a CPU accelerator implementation (ALPAKA_FN_ACC_NO_CUDA) and code that is only usable on the host and should not be called by any accelerator (ALPAKA_FN_HOST).

If you really want less such macros, we could simply do the following replacement, because there is nothing dynamic about those 3 macros:

However, then we would lose the semantic hint.

Replacing ALPAKA_FN_ACC_CUDA_ONLY -> ALPAKA_FN_DEVICE_ONLY would again require more knowledge about CUDA from the reader which is not necessary. Then we could simply replace it by __device__.

tdd11235813 commented 6 years ago

Those macros are meant to differentiate between an GPU accelerator implementation (ALPAKA_FN_ACC_CUDA_ONLY), a CPU accelerator implementation (ALPAKA_FN_ACC_NO_CUDA) and code that is only usable on the host and should not be called by any accelerator (ALPAKA_FN_HOST).

ok, this defines the ACC in the ALPAKA_FN_ACC*.

I see that ALPAKA_FN_ACC_CUDA_ONLY -> ALPAKA_FN_DEVICE_ONLY can directly be replaced by __device__, if this is used in places, where CUDA is enabled anyways. I (as a CUDA programmer) actually would prefer __device__ then, so I directly see what's happening and I don't have to think about another alpaka macro, and someone might wonder, why there is no such thing for other languages. But I see the use of a semantic hint by ALPAKA_FN_ACC_CUDA_ONLY for those who are not familiar with __device__.

For the following internal macro:

Mapping ALPAKA_FN_ACC_NO_CUDA -> ALPAKA_FN_HOST_ONLY does not bring us anything

One of the reasons why I started this issue was that:

ALPAKA_FN_ACC_NO_CUDA 
  void function() {}

invited me to write:

ALPAKA_FN_ACC_NO_CUDA 
ALPAKA_FN_ACC_NO_HIP 
  void function() {}

It would work, if you only define one of the macros, but does not look nice. Thus, I wanted to get rid of the language-specific macro naming and focus on the two-staged compile process by only using ALPAKA_FN_HOST, although not everyone knows what that could mean.

However, ALPAKA_FN_ACC_NO_CUDA -> ALPAKA_FN_ACC_CPU is not right, because NO_CUDA does not imply CPU accelerator. It looks like we have to bite a bullet in each case. What's the issue if we would waive this macro for __host__, as in a two-stage compile process non-attributed functions are supposed to be __host__ anyway?

#if BOOST_LANG_CUDA
    #define ALPAKA_FN_ACC_CUDA_ONLY __device__
#endif
#if BOOST_LANG_HIP
    #define ALPAKA_FN_ACC_HIP_ONLY __device__
#endif
#if BOOST_LANG_CUDA || BOOST_LANG_HIP
    #if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE) \
     || defined(ALPAKA_ACC_GPU_HIP_ONLY_MODE)
        #define ALPAKA_FN_ACC __device__
    #else
        #define ALPAKA_FN_ACC __device__ __host__
    #endif
    #define ALPAKA_FN_HOST_ACC __device__ __host__ 
BenjaminW3 commented 6 years ago

After the latest changes we are down to:

    #if defined(ALPAKA_ACC_GPU_CUDA_ONLY_MODE)
        #define ALPAKA_FN_ACC __device__
    #else
        #define ALPAKA_FN_ACC __device__ __host__
    #endif
    #define ALPAKA_FN_HOST_ACC __device__ __host__
    #define ALPAKA_FN_HOST __host__

So we have a equivalence mapping between CUDA and alpaka. We could go one step further and remove ALPAKA_FN_HOST completely because it is the default. Any more ideas?

tdd11235813 commented 6 years ago

Normally __host__ is not necessary. Looking at kokkos they have used it in TaskExec, when the object is not in device code, while one of the constructor is attributed with __device__. Maybe it is possible, that a non-attributed class can mistakenly derive a __device__ attribute, where a __host__ for that class would have been necessary to disallow that.

BenjaminW3 commented 6 years ago

For now I would keep the ALPAKA_FN_HOST because it makes porting CUDA to alpaka easier (simple replacement). Do you see any more work necessary or can this ticket be closed for now?

tdd11235813 commented 6 years ago

ok, all fine with me, so closing this. Thanks again.