Closed: lxvm closed this pull request 1 year ago.
Patch coverage: 99.31%, and project coverage change: +0.28%.
Comparison is base (cb745d4) 97.87% compared to head (d560001) 98.16%.
I think we just want to provide the caller with the array of points to evaluate and let them parallelize it (or not) using threads, the GPU, distributed-memory, or whatever.
One way to expose this API would be to define a wrapper around the integrand, e.g.:
struct BatchIntegrand{F,X,Y}
f!::F # in-place function f!(y, x) that takes an array of x values and outputs an array of results in-place
max_batch::Int # maximum number of x to supply in parallel (defaults to typemax(Int))
x::Vector{X}
y::Vector{Y}
end
BatchIntegrand(f!::F, ::Type{X}, ::Type{Y}; max_batch::Integer=typemax(Int)) where {F,X,Y} =
    BatchIntegrand{F,X,Y}(f!, max_batch, X[], Y[])
BatchIntegrand(f!::F, x::Vector{X}, y::Vector{Y}; max_batch::Integer=typemax(Int)) where {F,X,Y} =
    BatchIntegrand{F,X,Y}(f!, max_batch, x, y)
It's a little annoying that it doesn't get X from the type of the endpoints. One possibility would be to have a constructor:
BatchIntegrand(f!::F, ::Type{Y}; max_batch::Integer=typemax(Int)) where {F,Y} =
    BatchIntegrand{F,Nothing,Y}(f!, max_batch, Nothing[], Y[])
and then, if the quadgk function sees a BatchIntegrand{F,Nothing,Y}, it can replace it with a BatchIntegrand{F,X,Y}, where it gets X from the endpoint type.
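A minimal sketch of that promotion, under stated assumptions: the helper name ensure_xtype is hypothetical (not part of the package), and the struct layout follows the draft above.

```julia
# Struct as drafted above (field names illustrative):
struct BatchIntegrand{F,X,Y}
    f!::F           # in-place batch function f!(y, x)
    max_batch::Int  # maximum number of points per batch
    x::Vector{X}
    y::Vector{Y}
end

# Hypothetical helper: promote the placeholder point type Nothing to the
# endpoint type before integrating.
function ensure_xtype(f::BatchIntegrand{F,Nothing,Y}, a, b) where {F,Y}
    X = typeof(float(a))  # infer the point type from an endpoint
    return BatchIntegrand{F,X,Y}(f.f!, f.max_batch, X[], f.y)
end
ensure_xtype(f::BatchIntegrand, a, b) = f  # X is already concrete

bf = BatchIntegrand{typeof(identity),Nothing,Float64}(identity, 10, Nothing[], Float64[])
ensure_xtype(bf, 0.0, 1.0)  # now a BatchIntegrand{typeof(identity),Float64,Float64}
```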
I wrote a draft implementation of BatchIntegrand that can be tested with the following example integrand:
julia> using QuadGK, Unitful
julia> f(x) = exp(-(x/(1.0u"m"))^2)
f (generic function with 1 method)
julia> quadgk(f, 0.0u"m", Inf*u"m")
(0.8862269254527579 m, 8.595134596701424e-9 m)
julia> f!(y, x) = y .= f.(x)
f! (generic function with 1 method)
julia> quadgk(QuadGK.BatchIntegrand(f!, Float64, typeof(0.0u"m")), 0.0u"m", Inf*u"m")
(0.8862269254527579 m, 8.595134596694648e-9 m)
Introducing BatchIntegrand also changes the default behavior of quadgk when using relative tolerances, since it can simultaneously adapt all of the segments in the heap that exceed the "current" tolerance. Previously, the "current" tolerance was updated after each segment was refined; I have now changed the default behavior to match BatchIntegrand. I expect the computational effort to be the same if the caller performs singularity separation for their integrand. Otherwise, there are integrands with multiple peaks in an interval for which previous versions only resolved one dominant peak, whereas the new behavior can capture more or all of the peaks. For example:
julia> f(x) = sin(x)/(cos(x)+im*1e-5) # peaked "nice" integrand
f (generic function with 1 method)
julia> imf(x) = imag(f(x)) # peaked difficult integrand
imf (generic function with 1 method)
julia> imf2(x) = imf(x)^2 # even more peaked
imf2 (generic function with 1 method)
julia> x0 = 0.1 # arbitrary offset between peaks
0.1
julia> using QuadGK
julia> quadgk(x -> imf2(x) + imf2(x-x0), 0, 2pi, rtol = 1e-5) # previous versions
(235619.45750214785, 0.8317222055900457)
julia> quadgk(x -> imf2(x) + imf2(x-x0), 0, 2pi, rtol = 1e-5) # with this pr
(628318.5307008666, 3.1850136117936505)
BatchIntegrand is implemented as a stateful iterator yielding segments, because its state lives in the buffer of integrand evaluations, which may contain points corresponding to less than or more than one segment. The heap also acts as a stateful iterator that pops off segments one at a time until the error drops below the tolerance. To reuse as much existing code as I could (which isn't much), I try to hide the fact that these iterators are stateful by collecting their results into tuples. I do so with recursion; however, I believe this leads to a lot of code generation, and there is still a lot of type instability within the iterators, which I attribute to the variables I use for iterator state. I would appreciate any ideas for mitigating the type instability (and thus the allocations), as well as for iterating only once over the stateful iterators to avoid storing their results in tuples.
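For readers unfamiliar with the pattern, here is a generic, purely illustrative sketch (not the actual QuadGK internals; all names are made up) of a stateful iterator whose state lives in a mutable object, so each pass over it consumes the buffered values:

```julia
# Illustrative stateful iterator: the iteration state is stored in the
# object itself, not in the iteration protocol's state argument.
mutable struct StatefulBuffer{T}
    buf::Vector{T}  # buffered values (e.g. integrand evaluations)
    pos::Int        # current position, mutated as we iterate
end
function Base.iterate(it::StatefulBuffer, _=nothing)
    it.pos > length(it.buf) && return nothing
    v = it.buf[it.pos]
    it.pos += 1
    return v, nothing  # protocol state is a dummy; real state is it.pos
end
Base.IteratorSize(::Type{<:StatefulBuffer}) = Base.SizeUnknown()

it = StatefulBuffer([10, 20, 30], 1)
Tuple(collect(it))  # (10, 20, 30); a second pass would yield nothing more
```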
I simplified the implementation of BatchIntegrand by requiring that max_batch >= (4*order+2) and updating the integral estimate after each batch of Gauss–Kronrod rules is evaluated. So now the batch integrator refines as many segments as need to be refined for the current tolerance, up to the limit of how many fit into the batch buffer at once. Since this can affect numerical results obtained with a relative tolerance, I added a docstring example on the effect of max_batch.
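As a hedged usage sketch (assuming the type-based constructor and max_batch keyword discussed above; the value 4*7+2 = 30 is one full segment's worth of Gauss–Kronrod points at the default order = 7):

```julia
using QuadGK

f!(y, x) = y .= exp.(-x .^ 2)  # batched in-place integrand
# Limit each batch to one segment's worth of Gauss–Kronrod points:
bf = QuadGK.BatchIntegrand(f!, Float64, Float64; max_batch = 4*7 + 2)
I, E = quadgk(bf, 0.0, 1.0)
I  # ≈ 0.7468241328124271, i.e. (√π/2)·erf(1)
```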
When refactoring the original adaptation code, I also had to change the return type of adapt to handle type-unstable integrands, so I put the final re-summation of the integral into its own routine.
If more simplifications are possible, then I can attempt implementing them. Otherwise, it might be possible to reduce allocations in the batching code, but some may be unavoidable due to resize!
for the buffers or some reshuffling of elements in the heap (which could probably be done better).
This is looking good! Thanks for working on this!
Note that we will also want to add a new section to the manual on this, giving some explanation/motivation and a short example.
Thanks! I'll start writing a page in the manual and will follow up on the review in another commit
Examples were added to the manual and the constructors were modified as previously discussed. I also added BatchIntegrand
to the exported API. Thank you for reviewing this PR!
Perhaps the only thing this PR leaves to be desired is a batched and vector-valued integrand combining both BatchIntegrand and InplaceIntegrand. It would be great if it could be done by adding additional dimensions to the output array y in f!(y,x) (e.g. with https://github.com/JuliaArrays/ElasticArrays.jl), but it would make sense to do it in a future PR.
I was able to implement a batched and in-place integrand by adding additional dimensions to the y buffer. This mostly reuses the existing code by defining a method for quadgk!(::BatchIntegrand,...) and internally creating a BatchIntegrand(InplaceIntegrand(...)). I started this on a branch, https://github.com/lxvm/QuadGK.jl/tree/batchinplace, but perhaps it would be best to have it as a package extension, since it would really only work with ElasticArrays.jl.
Upon reflection, I think it would be more useful to make max_batch act like max_evals: it should be a rough limit on the batch size, but not a hard limit. That way, the user doesn't have to worry about accidentally making max_batch slightly too small.
(The main purpose of max_batch is to keep the buffer allocation from growing without bound, and for this purpose I don't think the user cares if you allocate a slightly bigger buffer. Whereas having quadgk throw an exception if you accidentally forget to change max_batch when you change the order or the minimum number of segments is a huge pain for the user.)
I agree that making max_batch behave as a soft limit is convenient. We can implement this by always evaluating the minimum number of segments needed for any call, which can be done by changing the inner loop in refine from
while len > nsegs && 2m*(nsegs+1) <= f.max_batch && E > tol && numevals < maxevals
...
end
to
while len > nsegs && 2m*nsegs < f.max_batch && E > tol && numevals < maxevals
...
end
and removing the errors for small values of max_batch.
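A quick illustrative check of the two predicates (assumed values: 2m = 30 evaluation points per segment, i.e. the default order = 7, and a deliberately small max_batch = 40; nsegs counts segments already scheduled in the batch):

```julia
# Old (hard) vs. new (soft) batch-size conditions.
twom, max_batch = 30, 40
hard(nsegs) = twom * (nsegs + 1) <= max_batch  # old: next segment must fit entirely
soft(nsegs) = twom * nsegs < max_batch         # new: continue while buffer isn't full

hard(0), hard(1)  # (true, false): old rule stops after 1 segment, 30 evaluations
soft(1), soft(2)  # (true, false): new rule admits a 2nd segment, 60 evaluations
```

The soft rule may overshoot max_batch by less than one segment's worth of points, which is harmless for its purpose of bounding the buffer allocation.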
@stevengj I rebased this PR on master, and I think it is ready if it also looks good to you.
@stevengj Do you think this is finished?
@stevengj I updated the docs and made the changes you suggested, so it should be ready to go.
There is also a straightforward path for extending this PR to in-place and batched integrands. If the y buffer is a VectorOfSimilarArrays of an ElasticArray with a last dimension for batching, then the behavior of BatchIntegrand is identical, except with mutable integrand values, i.e. array views. We could make this work with a quadgk!(f!::BatchIntegrand, result, a, b) method that internally creates a BatchIntegrand(InplaceIntegrand(f!,...),...), then add some if-statement branches to make sure in-place operations are used on the integrand values.
Thanks for keeping at it!
This is a draft PR to parallelize quadgk and quadgk! over evaluation points. Using a procedure similar to the one discussed by Gladwell in "Vectorisation of One-Dimensional Quadrature Codes" (https://doi.org/10.1007/978-94-009-3889-2_24), I remove all the segments in the heap that will need to be refined and then evaluate the integrand at all the Kronrod points in all of the new segments in parallel. I decided to go for a simple Threads.@threads approach, since I wasn't sure whether something more sophisticated, such as FLoops.jl, would provide much more benefit. Here is an example showing some speedup on a simulated costly integrand.
TODO:
parevalrule for InplaceIntegrand
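The parallel evaluation step can be sketched roughly as follows; this is a minimal illustration of the Threads.@threads approach, not the PR's actual code, and the function and variable names are assumed:

```julia
# Evaluate a costly integrand at all batched points in parallel.
# `xs` would collect the Kronrod points of every segment pulled off the heap.
function batcheval!(ys::Vector{Float64}, f, xs::Vector{Float64})
    resize!(ys, length(xs))
    Threads.@threads for i in eachindex(xs)
        ys[i] = f(xs[i])  # each point is independent, so this parallelizes trivially
    end
    return ys
end

costly(x) = (sleep(0.001); exp(-x^2))  # simulated expensive integrand
xs = collect(range(0, 1, length = 32))
ys = batcheval!(Float64[], costly, xs)
```

With `julia -t N`, the loop iterations are split across N threads, and the speedup is roughly the number of threads when the integrand dominates the cost.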