dannys4 opened this issue 2 years ago
Also check out the relatively new `StridedMatrix` and `StridedVector` aliases in `ArrayConversions.h`. You'd probably want to expose `StridedMatrix<double, Kokkos::HostSpace>` and `StridedMatrix<double, Kokkos::DefaultExecutionSpace::memory_space>`, as well as the `StridedVector` versions of these. Note that `Kokkos::HostSpace` and `Kokkos::DefaultExecutionSpace::memory_space` will be the same if Kokkos was not compiled with device support (e.g., without either CUDA or SYCL).

The `ToDevice` and `ToHost` functions in `ArrayConversions.h` might also be useful in the implementation of this.
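For concreteness, here is a minimal, self-contained sketch of what those strided aliases and the host/device memory-space relationship look like. The alias definitions below are assumptions about what `ArrayConversions.h` provides, not copied from it:

```cpp
#include <Kokkos_Core.hpp>
#include <cstdio>
#include <type_traits>
#include <utility>

// Assumed shape of the aliases in ArrayConversions.h: Kokkos::Views with a
// strided layout, templated on the scalar type and the memory space.
template<typename ScalarType, typename MemorySpace>
using StridedMatrix = Kokkos::View<ScalarType**, Kokkos::LayoutStride, MemorySpace>;

template<typename ScalarType, typename MemorySpace>
using StridedVector = Kokkos::View<ScalarType*, Kokkos::LayoutStride, MemorySpace>;

int main(int argc, char* argv[])
{
    Kokkos::initialize(argc, argv);
    {
        // A subview of a contiguous View is a natural way to end up with a
        // LayoutStride (i.e., StridedMatrix) view on the host.
        Kokkos::View<double**, Kokkos::HostSpace> hostData("hostData", 10, 4);
        StridedMatrix<double, Kokkos::HostSpace> hostMat =
            Kokkos::subview(hostData, Kokkos::ALL(), std::make_pair(0, 2));
        std::printf("hostMat is %zu x %zu\n", hostMat.extent(0), hostMat.extent(1));

        // Without device support (no CUDA/SYCL), the "device" memory space is
        // just HostSpace, so both instantiations name the same type.
        using DeviceSpace = Kokkos::DefaultExecutionSpace::memory_space;
        bool sameSpace = std::is_same<Kokkos::HostSpace, DeviceSpace>::value;
        std::printf("Host and device memory spaces are the same: %s\n",
                    sameSpace ? "yes" : "no");
    }
    Kokkos::finalize();
    return 0;
}
```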
@dannys4 Were you imagining exposing two versions of each class: one that evaluates on Host and one that evaluates on Device? That would let users control where the evaluation occurs, which might be nice since CPU evaluation will likely be faster for small batch sizes because it avoids the host->device->host copies.
> Also check out the relatively new `StridedMatrix` and `StridedVector` aliases in `ArrayConversions.h`

I saw that PR and it makes sense to use those. I'll have to dig deeper once I get the chance.

> The `ToDevice` and `ToHost` functions in `ArrayConversions.h` might also be useful in the implementation of this.

This was my thought as well!
> Were you imagining exposing two versions of each class?

I think we're on the same page here, but "each class" is a little ambiguous -- I was thinking of just wrapping everything that is templated on something like `MemorySpace` in the host space and (if the `Kokkos_ENABLE_CUDA` option is `ON`) in the device space (if not `ON`, then just throw an error when you call `toDevice`). I'm trying to minimize data movement so that we aren't just continually and pointlessly copying between CPU and GPU. However, we should only have to wrap those things once in some templated function like `ConditionalMapBaseWrapper<MemorySpace>()`; then we can just call that method twice, I would hope. Maybe this illuminates your comments/questions?
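To make the "write the bindings once, instantiate per memory space" idea concrete, here is a rough pybind11 sketch. The wrapper name follows the discussion above, but the module name and the class registration inside it are placeholders, not the actual binding code:

```cpp
#include <string>

#include <Kokkos_Core.hpp>
#include <pybind11/pybind11.h>

namespace py = pybind11;

// Sketch: register all bindings for one memory space.  The wrapped class and
// its methods are placeholders standing in for the real ones.
template<typename MemorySpace>
void ConditionalMapBaseWrapper(py::module_& m, const std::string& suffix)
{
    // e.g.
    // py::class_<ConditionalMapBase<MemorySpace>>(m, ("ConditionalMapBase" + suffix).c_str())
    //     .def("Evaluate", ...);
    (void) m;
    (void) suffix;
}

PYBIND11_MODULE(example_bindings, m)  // module name is made up for the sketch
{
    // Always expose the host-space versions.
    ConditionalMapBaseWrapper<Kokkos::HostSpace>(m, "");

#if defined(KOKKOS_ENABLE_CUDA)
    // Only expose device-space versions when Kokkos was built with CUDA;
    // otherwise a toDevice-style call should just throw an error.
    ConditionalMapBaseWrapper<Kokkos::DefaultExecutionSpace::memory_space>(m, "Device");
#endif
}
```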
After some group discussions last week, we are going to push the Julia and Matlab bindings into the post-JOSS milestone.
As discussed in #149, we need to add bindings such that we can do GPU-powered maps in the bound languages. I'm thinking the way we do this is to make a function like `Wrap<MemorySpace>(arr)` or something, for each of the bindings, that takes in a host-language array (e.g. a numpy, matlab, or julia array) and spits out a wrapped `Kokkos::View`. Then, we can throw that into whatever functions we want (e.g. `ConditionalMapBase`, etc.) without needing to change the base library. Then, we could template the binding functions (e.g. `ConditionalMapBaseWrapper()` for python) based on memory space, so we only have to write one set of bindings (hopefully!) for both memory spaces. If someone puts in a raw host-language array, then I suppose we just assume that they mean to use `Kokkos::HostSpace`.
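Here is a rough sketch of what a `Wrap<MemorySpace>(arr)` function could look like on the Python side, assuming pybind11, C++17, and the convention that a raw numpy array means `Kokkos::HostSpace`. The owning-copy behavior and the `if constexpr` dispatch are illustrative choices, not an existing API:

```cpp
#include <type_traits>

#include <Kokkos_Core.hpp>
#include <pybind11/pybind11.h>
#include <pybind11/numpy.h>

namespace py = pybind11;

// Hypothetical Wrap<MemorySpace>: turn a numpy array into a Kokkos::View.
// For HostSpace the data stays on the host; for a device memory space the
// data is copied over (a ToDevice-style operation).
template<typename MemorySpace>
Kokkos::View<double*, MemorySpace>
Wrap(py::array_t<double, py::array::c_style | py::array::forcecast> arr)
{
    // Unmanaged host view aliasing the numpy buffer (no copy yet).
    Kokkos::View<double*, Kokkos::HostSpace, Kokkos::MemoryTraits<Kokkos::Unmanaged>>
        hostView(arr.mutable_data(), arr.size());

    if constexpr (std::is_same_v<MemorySpace, Kokkos::HostSpace>) {
        // Host space: make an owning copy so the caller does not depend on
        // the lifetime of the numpy array.
        Kokkos::View<double*, Kokkos::HostSpace> out("wrapped", arr.size());
        Kokkos::deep_copy(out, hostView);
        return out;
    } else {
        // Device space: allocate in the requested memory space and copy.
        return Kokkos::create_mirror_view_and_copy(MemorySpace(), hostView);
    }
}
```

With something like this, a raw numpy array would simply go through `Wrap<Kokkos::HostSpace>`, matching the default assumed above, while the device instantiation would only be compiled and exposed when Kokkos has device support.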