Closed tpadioleau closed 8 months ago
cc @yasahi-hpc
Do you have a minimum reproducer?
Unfortunately no, I was not able to reproduce it in a minimal example. Do you ?
Not really. In the end, your reproducer worked without errors?
No sorry i had made a typo in my previous comment.
Sure. Obviously, mine is too big. May I have a look at your minimum reproducer?
#include <ddc/ddc.hpp>
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
namespace {
class SpeciesInformation
{
public:
using discrete_dimension_type = SpeciesInformation;
using discrete_element_type = ddc::DiscreteElement<SpeciesInformation>;
using discrete_domain_type = ddc::DiscreteDomain<SpeciesInformation>;
using discrete_vector_type = ddc::DiscreteVector<SpeciesInformation>;
public:
template <class MemorySpace>
class Impl
{
template <class OMemorySpace>
friend class Impl;
private:
// charge of the particles (kinetic + adiabatic)
ddc::Chunk<int, discrete_domain_type, ddc::KokkosAllocator<int, MemorySpace>> m_charge;
public:
using discrete_dimension_type = SpeciesInformation;
template <class OMemorySpace>
explicit Impl(Impl<OMemorySpace> const& impl) : m_charge(impl.m_charge.domain())
{
ddc::deepcopy(m_charge, impl.m_charge);
}
explicit Impl(ddc::Chunk<int, discrete_domain_type, ddc::KokkosAllocator<int, MemorySpace>>
charge)
: m_charge(std::move(charge))
{
}
auto charges() const
{
return m_charge.span_view();
}
};
};
} // namespace
void initialization()
{
ddc::DiscreteDomain<SpeciesInformation> const
dom(ddc::DiscreteElement<SpeciesInformation>(0),
ddc::DiscreteVector<SpeciesInformation>(2));
ddc::Chunk<int, ddc::DiscreteDomain<SpeciesInformation>> charges_host(dom);
charges_host(ddc::DiscreteElement<SpeciesInformation>(0)) = 1;
charges_host(ddc::DiscreteElement<SpeciesInformation>(1)) = 2;
ddc::init_discrete_space<SpeciesInformation>(std::move(charges_host));
}
void test()
{
ddc::ChunkSpan const charges_host = ddc::discrete_space<SpeciesInformation>().charges();
auto charges_alloc = ddc::
create_mirror_view_and_copy(Kokkos::DefaultExecutionSpace(), charges_host.span_view());
auto const charges = charges_alloc.span_cview().allocation_kokkos_view();
ddc::Chunk charges_copy_alloc(charges_host.domain(), ddc::DeviceAllocator<int>());
auto const charges_copy = charges_copy_alloc.span_view().allocation_kokkos_view();
Kokkos::parallel_for(
charges_host.domain().size(),
KOKKOS_LAMBDA(int isp) { charges_copy(isp) = 1 + charges(isp); });
Kokkos::fence();
}
TEST(SpeciesInformation, Coordinate)
{
initialization();
test();
}
Thanks. Looks to me a good minimum reproducer. The difference may be
Maybe we need to start from my version to remove unnecessary parts.
Could reproduce with the following
#include <ddc/ddc.hpp>
#include <gtest/gtest.h>
#include <Kokkos_Core.hpp>
namespace {
class SpeciesInformation
{
public:
using discrete_dimension_type = SpeciesInformation;
public:
template <class MemorySpace>
class Impl
{
template <class OMemorySpace>
friend class Impl;
private:
// charge of the particles (kinetic + adiabatic)
ddc::Chunk<
int,
ddc::DiscreteDomain<SpeciesInformation>,
ddc::KokkosAllocator<int, MemorySpace>>
m_charge;
public:
using discrete_dimension_type = SpeciesInformation;
template <class OMemorySpace>
explicit Impl(Impl<OMemorySpace> const& impl) : m_charge(impl.m_charge.domain())
{
ddc::deepcopy(m_charge, impl.m_charge);
}
explicit Impl(ddc::Chunk<
int,
ddc::DiscreteDomain<SpeciesInformation>,
ddc::KokkosAllocator<int, MemorySpace>> charge)
: m_charge(std::move(charge))
{
}
auto charges() const
{
return m_charge.span_view();
}
};
};
} // namespace
// #define USE_FIX
int test()
{
#if defined(USE_FIX)
auto const charges_host = ddc::host_discrete_space<SpeciesInformation>().charges();
#else
auto const charges_host = ddc::discrete_space<SpeciesInformation>().charges();
#endif
auto charges_alloc = ddc::
create_mirror_view_and_copy(Kokkos::DefaultExecutionSpace(), charges_host.span_view());
auto charges = charges_alloc.span_view().allocation_kokkos_view();
#if defined(KOKKOS_ENABLE_CUDA)
#if defined(USE_FIX)
static_assert(std::is_same_v<
decltype(charges),
Kokkos::View<int*, Kokkos::LayoutRight, Kokkos::CudaSpace>>);
#else
KOKKOS_IF_ON_HOST(static_assert(std::is_same_v<
decltype(charges),
Kokkos::View<int*, Kokkos::LayoutRight, Kokkos::CudaSpace>>);)
KOKKOS_IF_ON_DEVICE(
static_assert(std::is_same_v<
decltype(charges),
Kokkos::View<const int*, Kokkos::LayoutRight, Kokkos::CudaSpace>>);)
#endif
#endif
int sum;
Kokkos::parallel_reduce(
charges_host.domain().size(),
KOKKOS_LAMBDA(int isp, int& local_sum) { local_sum += charges(isp); },
Kokkos::Sum<int>(sum));
return sum;
}
TEST(SpeciesInformation, BugReproducer)
{
ddc::DiscreteDomain<SpeciesInformation> const
dom(ddc::DiscreteElement<SpeciesInformation>(0),
ddc::DiscreteVector<SpeciesInformation>(2));
ddc::Chunk<int, ddc::DiscreteDomain<SpeciesInformation>> charges_host(dom);
charges_host(ddc::DiscreteElement<SpeciesInformation>(0)) = 1;
charges_host(ddc::DiscreteElement<SpeciesInformation>(1)) = 2;
ddc::init_discrete_space<SpeciesInformation>(std::move(charges_host));
EXPECT_EQ(test(), 3);
}
@yasahi-hpc can you tell me if it also shows the error ?
unknown file: Failure
C++ exception with description "(CudaInternal::singleton().cuda_func_get_attributes_wrapper(&attr, func)) error( cudaErrorInvalidDeviceFunction): invalid device function /gpfs/workdir/tpadioleau/ddc/vendor/kokkos/core/src/Cuda/Kokkos_Cuda_KernelLaunch.hpp:139" thrown in the test body.
My understanding is the difference of constness in the Kokkos::View
captured in the lambda between the CPU and GPU compilation passes.
I will give it a try next Monday.
My understanding is the difference of constness in the
Kokkos::View
captured in the lambda between the CPU and GPU compilation passes.
Do you mean the constness of value type in Kokkos::View
?
If that is the case, m_charge.span_view()
can be const in CPU build and non-const in GPU build?
I will give it a try next Monday.
My understanding is the difference of constness in the
Kokkos::View
captured in the lambda between the CPU and GPU compilation passes.Do you mean the constness of value type in
Kokkos::View
? If that is the case,m_charge.span_view()
can be const in CPU build and non-const in GPU build?
From charges_alloc
yes, depending if it is a newly allocated Chunk or a ChunkSpan
OK.
So if we use ddc::create_mirror_view_and_copy
on const ChunkSpan
, there is always inconsistency between CPU and GPU builds?
If this is the case, the easiest solution may be changing the behavior of allocation_kokkos_view()
method?
I mean to allocate a View with a non-const data type.
I am not sure, this is a desirable behavior or not.
OK. So if we use
ddc::create_mirror_view_and_copy
onconst ChunkSpan
, there is always inconsistency between CPU and GPU builds?
No I would rather say the function ddc::create_mirror_view_and_copy
is fine, we should just call it with the same arguments both on the host and device compilation phases.
If this is the case, the easiest solution may be changing the behavior of
allocation_kokkos_view()
method? I mean to allocate a View with a non-const data type. I am not sure, this is a desirable behavior or not.
I think we should advise against calling ddc::discrete_space
in a host-only function. We could refine the logic of this rule but I think non-advanced users will not like it. Instead users should just use ddc::host_discrete_space
inside a host-only function, as in the example above.
OK. So if we use
ddc::create_mirror_view_and_copy
onconst ChunkSpan
, there is always inconsistency between CPU and GPU builds?No I would rather say the function
ddc::create_mirror_view_and_copy
is fine, we should just call it with the same arguments both on the host and device compilation phases.
OK. That makes sense.
If this is the case, the easiest solution may be changing the behavior of
allocation_kokkos_view()
method? I mean to allocate a View with a non-const data type. I am not sure, this is a desirable behavior or not.I think we should advise against calling
ddc::discrete_space
in a host-only function. We could refine the logic of this rule but I think non-advanced users will not like it. Instead users should just useddc::host_discrete_space
inside a host-only function, as in the example above.
For example, give a warning in case ddc::discrete_space
is called in a host-only function for GPU build?
For example, give a warning in case
ddc::discrete_space
is called in a host-only function for GPU build?
I don't think I can because the ddc::discrete_space
function is allowed in a __host__ __device__
function.
An alternative rule of thumb is to forbid users to define getters that return an object of a type that depends on the MemorySpace
, such as a ChunkSpan
.
For example, give a warning in case
ddc::discrete_space
is called in a host-only function for GPU build?I don't think I can because the
ddc::discrete_space
function is allowed in a__host__ __device__
function.
OK.
An alternative rule of thumb is to forbid users to define getters that return an object of a type that depends on the
MemorySpace
, such as aChunkSpan
.
Forbid means static_assertions?
An alternative rule of thumb is to forbid users to define getters that return an object of a type that depends on the
MemorySpace
, such as aChunkSpan
.Forbid means static_assertions?
No just a good practice guideline, I cannot see a way to enforce that from the library.
I suggest we close the issue
Sure. I agree with it. Just keep this in mind
Sure. I agree with it. Just keep this in mind
Right, in the doc of discrete_space
maybe ?
See https://gitlab.maisondelasimulation.fr/gysela-developpers/gyselalibxx/-/merge_requests/379