create shared memory without user provided id

alpaka-group / alpaka

Abstraction Library for Parallel Kernel Acceleration :llama:

https://alpaka.readthedocs.io

Mozilla Public License 2.0


create shared memory without user provided id #2255

Open psychocoderHPC opened 5 months ago

psychocoderHPC commented 5 months ago

Currently we need a unique ID to create static shared memory inside a kernel:

Type& var = declareSharedVar<Type, __COUNTER__>(acc);

With C++20 we could generate this ID automatically:

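#include <cstddef>
#include <iostream>
#include <iterator>
#include <memory>

// Each use of UniqueId<> re-evaluates the default argument,
// so every such use gets a new lambda type.
template<typename = decltype([]() {})>
struct UniqueId
{
    static constexpr auto singleton = [] {};
    static constexpr const decltype(singleton)* address = std::addressof(singleton);
    static constexpr const decltype(singleton)* origin = nullptr;
    // Numeric ID derived from the address of this instantiation's singleton.
    static constexpr std::size_t id = std::distance(origin, address);
};

// The ID defaults to an auto-generated value but can still be given explicitly.
template<std::size_t T_id = UniqueId<>::id>
struct SharedMem
{
    static constexpr std::size_t id = T_id;
};

int main()
{
    // Default argument: two different instantiations.
    std::cout << UniqueId<>::id << std::endl;
    std::cout << UniqueId<>::id << std::endl;
    // Explicit argument: the same instantiation both times.
    std::cout << UniqueId<float>::id << std::endl;
    std::cout << UniqueId<float>::id << std::endl;
}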

test it live: https://godbolt.org/z/MrhbvncET

will be

This is based on the ID generator I saw in this talk: https://www.youtube.com/watch?v=lPfA4SFojao

fwyzard commented 5 months ago

Interesting.

Is it safer than the current approach? How does it work when e.g. two header files, each with a shared memory definition, are included in different orders?

psychocoderHPC commented 5 months ago

This approach is at least compatible with the current interface, so you could still be explicit with your ID.

How does it work when e.g. two header files, each with a shared memory definition, are included in different orders?

This would have to be tested. I do not know which optimizations apply to device-linked code.

psychocoderHPC commented 5 months ago

I created IDs in a shared library within a function foo and in the main CPU code. I linked them together and the IDs are unique. In general they should be, because each lambda [](){} should get a unique type.
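The language rule behind that last sentence can be checked at compile time. This sketch only illustrates closure-type uniqueness within one translation unit; it says nothing about the shared-library experiment described above. The variable names are made up for the example.

```cpp
#include <type_traits>

// Two textually identical lambda expressions still produce
// two different closure types.
auto const first = [] {};
auto const second = [] {};

// A copy of a closure object has the same type as the original.
auto const copyOfFirst = first;

static_assert(
    !std::is_same<decltype(first), decltype(second)>::value,
    "each lambda expression has its own closure type");
static_assert(
    std::is_same<decltype(first), decltype(copyOfFirst)>::value,
    "copying a closure does not create a new type");
```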

psychocoderHPC commented 5 months ago

Is it safer than the current approach?

The user does not need to fiddle around with the counter macro, and the interface becomes nicer because there is no need to set the ID explicitly.

fwyzard commented 5 months ago

The user does not need to fiddle around with the counter macro, and the interface becomes nicer because there is no need to set the ID explicitly.

Yes, I completely agree with this.

My concern is a case where

header file A defines a first device function that declares a shared memory block
header file B defines a second device function that declares a shared memory block
header file C includes A and B and defines a device kernel that uses those functions
file D includes B and C

Now, D includes B before indirectly including A, so the __COUNTER__ values differ between C and D, which can cause ODR violations or other problems with the shared memory declarations.
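The order dependence can be reproduced in a single file. In this sketch `SharedBlock`, `BlockA`, `BlockB` and the `B_FIRST` switch are hypothetical stand-ins for the shared memory declarations in headers A and B and for the include order; `__COUNTER__` is a non-standard preprocessor extension available in GCC, Clang and MSVC.

```cpp
// Hypothetical stand-in for a shared memory declaration keyed by an integer ID.
template<int T_id>
struct SharedBlock
{
    static constexpr int id = T_id;
};

// The two branches model the two include orders. Each expansion of
// __COUNTER__ yields the next integer, so the ID a block receives
// depends only on which declaration the preprocessor sees first.
#ifdef B_FIRST
using BlockB = SharedBlock<__COUNTER__>; // "header B" seen first
using BlockA = SharedBlock<__COUNTER__>;
#else
using BlockA = SharedBlock<__COUNTER__>; // "header A" seen first
using BlockB = SharedBlock<__COUNTER__>;
#endif

#ifdef B_FIRST
static_assert(BlockA::id == BlockB::id + 1, "A gets the later ID");
#else
static_assert(BlockB::id == BlockA::id + 1, "B gets the later ID");
#endif
```

Compiling once with and once without `-DB_FIRST` gives the same names different IDs, which is the situation that arises when two translation units include the headers in different orders.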

Does this approach based on unique lambda addresses make things more robust?

file D includes B and A (in the opposite order), then C, then defines the kernel