Open mzaffalon opened 3 years ago
Yes, Cubature works much more in-place for vector-valued integrands than HCubature. It's not just a matter of providing an in-place integrand function, however — a lot of other parts of the code would need to be reworked to operate in-place.
@ChrisRackauckas, do you have any advice here from your experience with DifferentialEquations? `.=` doesn't work for scalars or other immutables, so I'm not sure what's the best way to write code that is in-place for vectors but also works for scalars.
(A lot of realistic problems involve more expensive integrands, so that the overhead of the extra allocations matters less.)
I don't have good advice, but I can tell you how we do it in DifferentialEquations.jl. It's not really possible to handle in-place and out-of-place in a single set of functions, so in the internals of the time stepping we have both versions: one optimized for the static array and reverse-mode AD case (since reverse-mode is another case where allocating array-based operations are helpful), and the other a fully non-allocating mutating version. For example, look at the implementation of Dormand-Prince:
My idea here is that, in the library, we might as well code the fastest thing we can until the compiler can handle it better, which it can't right now. There's too much `if inplace, @. x = y else x = @. y` and all of those other edge cases. And you cannot determine it from type information. For example, with reverse-mode AD on `Array` you want to use out-of-place instead of in-place, so it needs to be a user choice. And in a higher-order function, you need to change `dx = f(x)` vs `f!(dx,x)` (though in theory you could have an API of `dx = f!(dx,x)`, but that sounds bug-prone IMO).
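To make the duplication concrete, here is a minimal sketch of the two code paths for a single update step. The names `axpy_oop`, `axpy_ip!`, and `update` are hypothetical and do not come from DifferentialEquations.jl or HCubature.jl; the point is only that the caller selects the path with an explicit flag rather than having it inferred from the argument types.

```julia
# Hypothetical helper names, for illustration only.

# Out-of-place path: works for scalars and other immutables,
# but builds a new array on every call when x is an Array.
axpy_oop(a, x, y) = a .* x .+ y

# In-place path: overwrites `out`, so it needs a mutable container.
function axpy_ip!(out, a, x, y)
    @. out = a * x + y
    return out
end

# The caller chooses the path with a Val flag; `out` is ignored when out-of-place.
update(::Val{false}, out, a, x, y) = axpy_oop(a, x, y)
update(::Val{true}, out, a, x, y) = axpy_ip!(out, a, x, y)
```

With this layout, `update(Val(false), nothing, 2.0, 3.0, 1.0)` handles a scalar, while `update(Val(true), out, 2.0, x, y)` reuses the buffer `out`.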
The next thing to address then is how to automatically detect this. What we do is look at the current method table:
https://github.com/SciML/DiffEqBase.jl/blob/master/src/utils.jl#L1-L46
and then use this to place a type-parameter in the problem construction:
https://github.com/SciML/DiffEqBase.jl/blob/master/src/problems/ode_problems.jl#L72
So then it's type information after that point. That lets the user override it if necessary, and lets the user add it as a literal to make it type stable. While it feels a little hacky, in practice we've been doing this for 4 years and no one has really ever complained. The breaking scenario is that a user defines `f(u,p,t)`, then `f(du,u,p,t)`, and then wants the out-of-place behavior but is unwilling to write `ODEProblem{false}(...)`. In theory that could happen, but it's not something that has shown up in user issues (because normally it's `f` vs `f!`, etc.), so it has generally worked with only very light documentation of the fact that it can be overridden.
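A rough sketch of that pattern, using only Base: inspect the method table for the argument count, then store the answer as a type parameter that the user can override. `isinplace_guess`, `ToyProblem`, and `isinplace` are hypothetical names for this illustration, not the DiffEqBase implementation, and the check ignores varargs methods.

```julia
# Hypothetical sketch, not the DiffEqBase code.

# Guess in-place if any method of `f` takes `nargs_inplace` positional arguments.
# `m.nargs` counts the function itself, hence the `- 1`.
function isinplace_guess(f, nargs_inplace::Int)
    return any(m -> m.nargs - 1 == nargs_inplace, methods(f))
end

struct ToyProblem{iip,F,U}
    f::F
    u0::U
end

# User override: ToyProblem{false}(f, u0) skips detection entirely.
ToyProblem{iip}(f, u0) where {iip} = ToyProblem{iip,typeof(f),typeof(u0)}(f, u0)

# Default: a 4-argument method is taken to mean f(du, u, p, t).
ToyProblem(f, u0) = ToyProblem{isinplace_guess(f, 4)}(f, u0)

# From here on the choice is type information.
isinplace(::ToyProblem{iip}) where {iip} = iip

# The breaking scenario: both signatures exist on the same function.
rhs(u, p, t) = -u
rhs(du, u, p, t) = (du .= -u; nothing)

isinplace(ToyProblem(rhs, [1.0]))         # detection picks in-place
isinplace(ToyProblem{false}(rhs, [1.0]))  # explicit override to out-of-place
```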
One other semi-important detail is that we pair this with a separate broadcast implementation (`@..`) which forces `@simd ivdep` for the internal uses where we know the arrays do not alias, and this allows all of the internal operations to be broadcasting and thus support GPUs as well without losing optimality on CPUs.
That's probably not as elegant of an answer as you were hoping for, but I think it's the only real answer to get the most optimal dispatches for the two cases until there's better lifetime management and array freezing in the compiler.
I understand only the first part of your answer and I can try to write a PR to do in-place mutation.
The basic code that would have to be duplicated is the rule evaluation, e.g. here.
We could use a type parameter in the rule construction to say whether we want an in-place rule or not.
I don't think we need to look at the method table. Probably we could just have an interface `hcubature!` where the user passes a pre-allocated array to store the integrals, and in this interface we could assume they also passed an in-place function `f!(result, x)`, and then pass a parameter to the low-level `_hcubature` routine to construct this version of the rule.
Yes, I agree that in this use case it's overkill to read the method table (and probably in DiffEq too, but by now the interface is set in stone).
If I understand correctly, there should be a function `(g::GenzMalik{n,T})(result::AbstractVector{T}, f, a::AbstractVector{T}, b::AbstractVector{T}, norm=norm) where {T}` that duplicates the current code for in-place mutation.
Yes, probably by adding a type parameter to `GenzMalik` indicating whether it is in place.
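As an illustration of the type-parameter idea, here is a toy one-dimensional rule whose call method is selected by an in-place flag. `MidpointRule` is a hypothetical stand-in and not part of HCubature.jl; the real `GenzMalik` rule also carries evaluation points and an error estimate, which are omitted here.

```julia
# Hypothetical toy rule; only the dispatch on the type parameter matters.
struct MidpointRule{inplace} end

# Out-of-place: f(x) returns the value, so scalars and immutables work.
function (r::MidpointRule{false})(f, a, b)
    return (b - a) * f((a + b) / 2)
end

# In-place: f!(buf, x) writes into buf, and the estimate ends up in `result`.
function (r::MidpointRule{true})(result, f!, a, b)
    f!(result, (a + b) / 2)
    result .*= (b - a)
    return result
end
```

A user-facing `hcubature!` could then construct the `{true}` variant and pass the pre-allocated array through, while the existing entry point keeps constructing the `{false}` variant.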
I am integrating
L(f) = \int K.(f .- g(x)) dx
where `f` is a vector of 100-1000 elements, `g` and `K` are scalar, and the integration is in R^3. The dimension of the codomain of `K` is the same as `f` because it is easy to vectorize the computation of `K`. `HCubature` does worse than `Cubature`, the memory allocation being 100 times that of Cubature (`SVector`s do not perform well for large vectors, although it may not have anything to do with `SVector`) and 20 times slower. I am not sure if part of the problem is also that Cubature allows one to copy in place the result of the computation. Here is a MWE with the timings on my machine. Is there a workaround or will passing an integrand with an extra argument that accepts an in-place return value be considered?
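For reference, the two integrand shapes under discussion might look like the sketch below. The definitions of `K` and `g` are placeholders, since the thread does not specify them, and `make_integrand` / `make_integrand!` are hypothetical helper names.

```julia
# Placeholder definitions; the actual K and g are not given in the thread.
K(y) = exp(-y^2)
g(x) = sum(abs2, x)

# Out-of-place integrand: returns a fresh vector of length(fvec) per evaluation.
make_integrand(fvec) = x -> K.(fvec .- g(x))

# In-place integrand: writes into a caller-supplied buffer and returns it.
function make_integrand!(fvec)
    return function (result, x)
        gx = g(x)
        @. result = K(fvec - gx)
        return result
    end
end
```

The in-place form is the `f!(result, x)` signature that an integrand with an extra output argument would take.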
EDIT: timing with `@benchmark`.