As discussed in the QED meeting, we need an interface for event generation that produces a target number of accepted events. For this, some form of executor (interface) is necessary to enable the use of different forms of event generation and weight calculation. This interface, `generate_events`, should support at least the following types of kernels (I use "kernel" for the function doing the actual work to be less ambiguous); rough sketches of what each case could look like follow the list:
1. A simple kernel working on a single CPU-allocated `PhaseSpacePoint`, either single-threaded or dynamically multi-threaded. The `generate_events` implementation should itself broadcast it across all available threads (dynamically), maybe with a keyword argument to limit the number of threads used.
2. A single-threaded kernel working on a generically allocated `PhaseSpacePoint` (i.e. it might run on a GPU or any other target Julia can compile for, with the usual stricter constraints such as type stability) that can be broadcast over any vector type (e.g., `Vector{PSP}`, `CuVector{PSP}`, `ROCVector{PSP}`). This type of kernel would be expected to work with the given (GPU) vector type, i.e. I don't think automatic checks are necessary here. If it fails, it fails.
3. A GPU kernel function working on an input `AbstractVector{PSP}` and producing an output mask of accepted events. This is a bit more difficult because the kernel call may need extra arguments (such as `always_inline` or specific `threads`/`groupsize` and `blocks`/`gridsize` values). So it might be better to accept a wrapper function that contains only the kernel call. This wrapper could also take the specific device to execute on, so that multiple devices (GPUs) can run in parallel.
4. A kernel that we make no assumptions about. It may use any and all hardware available on the system and might take a single `PSP` or multiple `PSP`s as a `Tuple`, `Vararg`, `Vector`, or anything else. For example, it might be better for hardware utilization to compute the events of a chunk in parallel inside the kernel itself. I think this ultimately just requires a fully custom overload of the `generate_events` interface.
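
To make case 1 concrete, here is a minimal sketch of what a CPU-threaded executor could look like. None of these names (`PhaseSpacePoint` stub, `accept_kernel`, `CPUThreadExecutor`) are existing QEDjl API; this only illustrates how the interface could broadcast a single-point kernel across a limited number of threads:

```julia
# Sketch only -- none of these names exist in QEDjl. Case 1: a kernel acting on a
# single, CPU-allocated phase space point; the interface does the (threaded) looping.

struct PhaseSpacePoint            # trivial stand-in for the real phase space point type
    x::Float64
end

accept_kernel(psp::PhaseSpacePoint) = psp.x < 0.5   # placeholder accept/reject decision

struct CPUThreadExecutor
    nthreads::Int                 # cap on the number of tasks spawned
end
CPUThreadExecutor(; nthreads=Threads.nthreads()) = CPUThreadExecutor(nthreads)

function generate_events(kernel, nevents::Int, exec::CPUThreadExecutor; chunksize=1024)
    accepted = PhaseSpacePoint[]
    while length(accepted) < nevents
        chunk = [PhaseSpacePoint(rand()) for _ in 1:chunksize]   # dummy proposal step
        mask = Vector{Bool}(undef, chunksize)
        # split the chunk into at most `exec.nthreads` dynamically scheduled tasks
        parts = Iterators.partition(eachindex(chunk), cld(chunksize, exec.nthreads))
        tasks = map(parts) do idxs
            Threads.@spawn for i in idxs
                mask[i] = kernel(chunk[i])
            end
        end
        foreach(wait, tasks)
        append!(accepted, chunk[mask])            # keep only accepted events
    end
    return accepted[1:nevents]
end

events = generate_events(accept_kernel, 10_000, CPUThreadExecutor(nthreads=4))
```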
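
For case 2, the interface itself could stay device-agnostic and just broadcast the scalar kernel over whatever array type the caller requests. Again only a sketch; `BroadcastExecutor` is an invented name, and `PhaseSpacePoint`/`accept_kernel` are the stubs from the previous sketch:

```julia
# Sketch only -- case 2: a scalar kernel that the interface broadcasts over whatever
# array type the caller requests (Vector, CuVector, ROCVector, ...).

struct BroadcastExecutor{A<:AbstractVector}
    array_type::Type{A}           # e.g. Vector{PhaseSpacePoint} or CuVector{PhaseSpacePoint}
end

function generate_events(kernel, nevents::Int, exec::BroadcastExecutor; chunksize=1024)
    accepted = PhaseSpacePoint[]
    while length(accepted) < nevents
        # allocate/convert the proposed chunk on the requested (possibly GPU) array type
        chunk = exec.array_type([PhaseSpacePoint(rand()) for _ in 1:chunksize])
        mask = kernel.(chunk)                   # plain broadcast, no device-specific code here
        append!(accepted, Array(chunk[mask]))   # accepted events back on the host
    end
    return accepted[1:nevents]
end

events = generate_events(accept_kernel, 10_000, BroadcastExecutor(Vector{PhaseSpacePoint}))
# with CUDA.jl loaded, BroadcastExecutor(CuVector{PhaseSpacePoint}) would be used the same
# way -- no automatic checks, "if it fails, it fails"
```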
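
For case 3, a rough sketch of the wrapper idea using CUDA.jl: the caller hands over a function that contains only the kernel launch (so launch parameters like `threads`, `blocks` or `always_inline` stay on the caller's side) plus the device to run on. `KernelWrapperExecutor` and `accept_kernel!` are invented names; the CUDA.jl calls themselves are real, but the overall wiring is just an assumption about how this could look:

```julia
# Sketch only -- case 3: the caller provides a wrapper containing the actual kernel
# launch, plus the device to run on. PhaseSpacePoint is the stub from the first sketch.

using CUDA

struct KernelWrapperExecutor{F}
    launch::F                     # launch(mask, chunk) performs the kernel call and synchronizes
    device::CuDevice              # target GPU, so several executors can drive several devices
end

function generate_events(exec::KernelWrapperExecutor, nevents::Int; chunksize=2^16)
    CUDA.device!(exec.device)
    accepted = PhaseSpacePoint[]
    while length(accepted) < nevents
        chunk = CuArray([PhaseSpacePoint(rand()) for _ in 1:chunksize])  # dummy proposal step
        mask = CUDA.zeros(Bool, chunksize)
        exec.launch(mask, chunk)                # caller-supplied wrapper around the kernel launch
        append!(accepted, Array(chunk[mask]))   # copy accepted events back to the host
    end
    return accepted[1:nevents]
end

# caller side: the raw kernel and a wrapper that fixes the launch parameters
function accept_kernel!(mask, chunk)
    i = (blockIdx().x - 1) * blockDim().x + threadIdx().x
    if i <= length(chunk)
        @inbounds mask[i] = chunk[i].x < 0.5
    end
    return nothing
end

my_launch(mask, chunk) =
    CUDA.@sync @cuda threads=256 blocks=cld(length(chunk), 256) always_inline=true accept_kernel!(mask, chunk)

exec = KernelWrapperExecutor(my_launch, CUDA.device())
events = generate_events(exec, 100_000)
```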
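
Case 4 then reduces to a fully custom method, something like (again only a placeholder):

```julia
# Sketch only -- case 4: a kernel with no assumptions brings its own method of
# generate_events and handles chunking, parallelism and devices internally.

struct InKernelChunkGenerator end          # placeholder for a fully custom generator

function generate_events(gen::InKernelChunkGenerator, nevents::Int)
    # free to use any hardware and any event layout (Tuple, Vararg, Vector, ...) internally;
    # the only contract is to return `nevents` accepted events
    return [PhaseSpacePoint(rand()) for _ in 1:nevents]   # trivial stand-in body
end
```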
Note that I assumed we get `PhaseSpacePoint`s as input, but a value (like a random seed) from which the event generator first creates a PSP would also work. This list is only about which types of kernels could come up and should be possible to integrate into the interface.
It might even make sense to separate this entirely from the event generation itself and make it an interface for heterogeneous computing in general (maybe in https://github.com/oschulz/HeterogeneousComputing.jl ?)
For reference, the HEPExample project currently uses this description of the interface, which would be the simple implementation of what we need:
"""
generate_events(E_in::T, nevents; array_type::Type{ARRAY_TYPE}=Vector{T}, chunksize=100)
where {T<:Real, ARRAY_TYPE<:AbstractVector{T}}
Generate a specified number of unweighted events using rejection sampling. Events are generated in chunks,
and only accepted events are retained.
# Arguments
- `E_in::T`: Energy of the incoming electron (must be a subtype of `Real`).
- `nevents`: The number of unweighted events to generate.
- `array_type::Type{ARRAY_TYPE}`: Optional; the type of array to use for the internal event generation (default is `Vector{T}`).
- `chunksize`: Optional; the number of events to generate per chunk (default is 100).
# Returns
- A list of unweighted `Event` objects.
"""