CExA-project / ddc

DDC is a discrete domain computation library.
https://ddc.mdls.fr
Other
33 stars 5 forks source link

Runtime error encountered when using `discrete_space` #287

Closed tpadioleau closed 8 months ago

tpadioleau commented 9 months ago

See https://gitlab.maisondelasimulation.fr/gysela-developpers/gyselalibxx/-/merge_requests/379

tpadioleau commented 9 months ago

cc @yasahi-hpc

yasahi-hpc commented 9 months ago

Do you have a minimum reproducer?

tpadioleau commented 9 months ago

Unfortunately no, I was not able to reproduce it in a minimal example. Do you ?

yasahi-hpc commented 9 months ago

Not really. In the end, your reproducer worked without errors?

tpadioleau commented 8 months ago

No sorry i had made a typo in my previous comment.

yasahi-hpc commented 8 months ago

Sure. Obviously, mine is too big. May I have a look at your minimum reproducer?

tpadioleau commented 8 months ago
#include <ddc/ddc.hpp>

#include <gtest/gtest.h>

#include <Kokkos_Core.hpp>

namespace {

class SpeciesInformation
{
public:
    using discrete_dimension_type = SpeciesInformation;

    using discrete_element_type = ddc::DiscreteElement<SpeciesInformation>;

    using discrete_domain_type = ddc::DiscreteDomain<SpeciesInformation>;

    using discrete_vector_type = ddc::DiscreteVector<SpeciesInformation>;

public:
    template <class MemorySpace>
    class Impl
    {
        template <class OMemorySpace>
        friend class Impl;

    private:
        // charge of the particles (kinetic + adiabatic)
        ddc::Chunk<int, discrete_domain_type, ddc::KokkosAllocator<int, MemorySpace>> m_charge;

    public:
        using discrete_dimension_type = SpeciesInformation;

        template <class OMemorySpace>
        explicit Impl(Impl<OMemorySpace> const& impl) : m_charge(impl.m_charge.domain())
        {
            ddc::deepcopy(m_charge, impl.m_charge);
        }

        explicit Impl(ddc::Chunk<int, discrete_domain_type, ddc::KokkosAllocator<int, MemorySpace>>
                              charge)
            : m_charge(std::move(charge))
        {
        }

        auto charges() const
        {
            return m_charge.span_view();
        }
    };
};

} // namespace

void initialization()
{
    ddc::DiscreteDomain<SpeciesInformation> const
            dom(ddc::DiscreteElement<SpeciesInformation>(0),
                ddc::DiscreteVector<SpeciesInformation>(2));
    ddc::Chunk<int, ddc::DiscreteDomain<SpeciesInformation>> charges_host(dom);
    charges_host(ddc::DiscreteElement<SpeciesInformation>(0)) = 1;
    charges_host(ddc::DiscreteElement<SpeciesInformation>(1)) = 2;
    ddc::init_discrete_space<SpeciesInformation>(std::move(charges_host));
}

void test()
{
    ddc::ChunkSpan const charges_host = ddc::discrete_space<SpeciesInformation>().charges();
    auto charges_alloc = ddc::
            create_mirror_view_and_copy(Kokkos::DefaultExecutionSpace(), charges_host.span_view());
    auto const charges = charges_alloc.span_cview().allocation_kokkos_view();
    ddc::Chunk charges_copy_alloc(charges_host.domain(), ddc::DeviceAllocator<int>());
    auto const charges_copy = charges_copy_alloc.span_view().allocation_kokkos_view();
    Kokkos::parallel_for(
            charges_host.domain().size(),
            KOKKOS_LAMBDA(int isp) { charges_copy(isp) = 1 + charges(isp); });
    Kokkos::fence();
}

TEST(SpeciesInformation, Coordinate)
{
    initialization();
    test();
}
yasahi-hpc commented 8 months ago

Thanks. Looks to me a good minimum reproducer. The difference may be

  1. Usage of team policy
  2. charge is defined in a separate file

Maybe we need to start from my version to remove unnecessary parts.

tpadioleau commented 8 months ago

Could reproduce with the following

#include <ddc/ddc.hpp>

#include <gtest/gtest.h>

#include <Kokkos_Core.hpp>

namespace {

class SpeciesInformation
{
public:
    using discrete_dimension_type = SpeciesInformation;

public:
    template <class MemorySpace>
    class Impl
    {
        template <class OMemorySpace>
        friend class Impl;

    private:
        // charge of the particles (kinetic + adiabatic)
        ddc::Chunk<
                int,
                ddc::DiscreteDomain<SpeciesInformation>,
                ddc::KokkosAllocator<int, MemorySpace>>
                m_charge;

    public:
        using discrete_dimension_type = SpeciesInformation;

        template <class OMemorySpace>
        explicit Impl(Impl<OMemorySpace> const& impl) : m_charge(impl.m_charge.domain())
        {
            ddc::deepcopy(m_charge, impl.m_charge);
        }

        explicit Impl(ddc::Chunk<
                      int,
                      ddc::DiscreteDomain<SpeciesInformation>,
                      ddc::KokkosAllocator<int, MemorySpace>> charge)
            : m_charge(std::move(charge))
        {
        }

        auto charges() const
        {
            return m_charge.span_view();
        }
    };
};

} // namespace

// #define USE_FIX

int test()
{
#if defined(USE_FIX)
    auto const charges_host = ddc::host_discrete_space<SpeciesInformation>().charges();
#else
    auto const charges_host = ddc::discrete_space<SpeciesInformation>().charges();
#endif
    auto charges_alloc = ddc::
            create_mirror_view_and_copy(Kokkos::DefaultExecutionSpace(), charges_host.span_view());
    auto charges = charges_alloc.span_view().allocation_kokkos_view();
#if defined(KOKKOS_ENABLE_CUDA)
#if defined(USE_FIX)
    static_assert(std::is_same_v<
                  decltype(charges),
                  Kokkos::View<int*, Kokkos::LayoutRight, Kokkos::CudaSpace>>);
#else
    KOKKOS_IF_ON_HOST(static_assert(std::is_same_v<
                                    decltype(charges),
                                    Kokkos::View<int*, Kokkos::LayoutRight, Kokkos::CudaSpace>>);)
    KOKKOS_IF_ON_DEVICE(
            static_assert(std::is_same_v<
                          decltype(charges),
                          Kokkos::View<const int*, Kokkos::LayoutRight, Kokkos::CudaSpace>>);)
#endif
#endif

    int sum;
    Kokkos::parallel_reduce(
            charges_host.domain().size(),
            KOKKOS_LAMBDA(int isp, int& local_sum) { local_sum += charges(isp); },
            Kokkos::Sum<int>(sum));
    return sum;
}

TEST(SpeciesInformation, BugReproducer)
{
    ddc::DiscreteDomain<SpeciesInformation> const
            dom(ddc::DiscreteElement<SpeciesInformation>(0),
                ddc::DiscreteVector<SpeciesInformation>(2));
    ddc::Chunk<int, ddc::DiscreteDomain<SpeciesInformation>> charges_host(dom);
    charges_host(ddc::DiscreteElement<SpeciesInformation>(0)) = 1;
    charges_host(ddc::DiscreteElement<SpeciesInformation>(1)) = 2;
    ddc::init_discrete_space<SpeciesInformation>(std::move(charges_host));
    EXPECT_EQ(test(), 3);
}

@yasahi-hpc can you tell me if it also shows the error ?

unknown file: Failure
C++ exception with description "(CudaInternal::singleton().cuda_func_get_attributes_wrapper(&attr, func)) error( cudaErrorInvalidDeviceFunction): invalid device function /gpfs/workdir/tpadioleau/ddc/vendor/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp:139" thrown in the test body.

My understanding is the difference of constness in the Kokkos::View captured in the lambda between the CPU and GPU compilation passes.

yasahi-hpc commented 8 months ago

I will give it a try next Monday.

My understanding is the difference of constness in the Kokkos::View captured in the lambda between the CPU and GPU compilation passes.

Do you mean the constness of value type in Kokkos::View? If that is the case, m_charge.span_view() can be const in CPU build and non-const in GPU build?

tpadioleau commented 8 months ago

I will give it a try next Monday.

My understanding is the difference of constness in the Kokkos::View captured in the lambda between the CPU and GPU compilation passes.

Do you mean the constness of value type in Kokkos::View? If that is the case, m_charge.span_view() can be const in CPU build and non-const in GPU build?

From charges_alloc yes, depending if it is a newly allocated Chunk or a ChunkSpan

yasahi-hpc commented 8 months ago

OK. So if we use ddc::create_mirror_view_and_copy on const ChunkSpan, there is always inconsistency between CPU and GPU builds?

If this is the case, the easiest solution may be changing the behavior of allocation_kokkos_view() method? I mean to allocate a View with a non-const data type. I am not sure, this is a desirable behavior or not.

tpadioleau commented 8 months ago

OK. So if we use ddc::create_mirror_view_and_copy on const ChunkSpan, there is always inconsistency between CPU and GPU builds?

No I would rather say the function ddc::create_mirror_view_and_copy is fine, we should just call it with the same arguments both on the host and device compilation phases.

If this is the case, the easiest solution may be changing the behavior of allocation_kokkos_view() method? I mean to allocate a View with a non-const data type. I am not sure, this is a desirable behavior or not.

I think we should advise against calling ddc::discrete_space in a host-only function. We could refine the logic of this rule but I think non-advanced users will not like it. Instead users should just use ddc::host_discrete_space inside a host-only function, as in the example above.

yasahi-hpc commented 8 months ago

OK. So if we use ddc::create_mirror_view_and_copy on const ChunkSpan, there is always inconsistency between CPU and GPU builds?

No I would rather say the function ddc::create_mirror_view_and_copy is fine, we should just call it with the same arguments both on the host and device compilation phases.

OK. That makes sense.

If this is the case, the easiest solution may be changing the behavior of allocation_kokkos_view() method? I mean to allocate a View with a non-const data type. I am not sure, this is a desirable behavior or not.

I think we should advise against calling ddc::discrete_space in a host-only function. We could refine the logic of this rule but I think non-advanced users will not like it. Instead users should just use ddc::host_discrete_space inside a host-only function, as in the example above.

For example, give a warning in case ddc::discrete_space is called in a host-only function for GPU build?

tpadioleau commented 8 months ago

For example, give a warning in case ddc::discrete_space is called in a host-only function for GPU build?

I don't think I can because the ddc::discrete_space function is allowed in a __host__ __device__ function.

An alternative rule of thumb is to forbid users to define getters that return an object of a type that depends on the MemorySpace, such as a ChunkSpan.

yasahi-hpc commented 8 months ago

For example, give a warning in case ddc::discrete_space is called in a host-only function for GPU build?

I don't think I can because the ddc::discrete_space function is allowed in a __host__ __device__ function.

OK.

An alternative rule of thumb is to forbid users to define getters that return an object of a type that depends on the MemorySpace, such as a ChunkSpan.

Forbid means static_assertions?

tpadioleau commented 8 months ago

An alternative rule of thumb is to forbid users to define getters that return an object of a type that depends on the MemorySpace, such as a ChunkSpan.

Forbid means static_assertions?

No just a good practice guideline, I cannot see a way to enforce that from the library.

tpadioleau commented 8 months ago

I suggest we close the issue

yasahi-hpc commented 8 months ago

Sure. I agree with it. Just keep this in mind

tpadioleau commented 8 months ago

Sure. I agree with it. Just keep this in mind

Right, in the doc of discrete_space maybe ?