Open nsajko opened 3 months ago
The function I'm looking for from MutableArithmetics is `@rewrite`, which I didn't think worked for arbitrary functions from the discussion here: https://github.com/jump-dev/MutableArithmetics.jl/issues/253
From a developer in that discussion:

> > Does MutableArithmetics not work for general functions, only for the standard arithmetic operations (`+`, `-`, `/`, `*`)?
>
> Correct, the `@rewrite` macro works only for a limited subset of the standard arithmetic operations.
>
> I usually avoid `@rewrite` anyway, because it's too high-level. For optimal performance it's necessary to use `operate!!` etc. directly.
> I usually avoid `@rewrite` anyway, because it's too high-level. For optimal performance it's necessary to use `operate!!` etc. directly.
I see. Yes, the thought was to have a macro users could prepend to expressions containing the mutable struct, so that a pre-allocated temporary buffer could be used for the temporaries generated, while keeping the syntax simple and transparent to other types. The current `@FastGTPSA` macro I implemented does this, albeit with some weird tricks.
At least at the time, I couldn't see any major benefits to satisfying the MutableArithmetics interface other than for `@rewrite`, and had difficulty understanding how to do so. What other benefits would satisfying the interface offer?
Looking at the `@FastGTPSA` docs you linked (thanks!), it seems like it relies on a global TPS vector for avoiding allocations? I guess that makes thread safety impossible, or at least expensive, which seems like a pretty big drawback?
Another thing that seems like a drawback of the `@FastGTPSA` approach is that, as far as I understand, it can only work for hardcoded TPS types, because a global TPS vector is required for each TPS type supported by `@FastGTPSA`, right?
> At least at the time, I couldn't see any major benefits to satisfying the MutableArithmetics interface other than for `@rewrite`, and had difficulty understanding how to do so. What other benefits would satisfying the interface offer?
It'd allow users to, e.g., move allocations outside of loops or outright eliminate them in a generic and thread-safe manner, but at the cost of uglier and less convenient code.
Take this loop as an example:
```julia
for x ∈ vec
    s += f(x)
end
```
Assuming `f` needs to allocate its return value, and `+` perhaps also allocates, it might make sense to implement `operate!` for `+`, and `operate_to!` for `f`. Then these allocations could be eliminated in user code like so:
```julia
y = ... # preallocate
for x ∈ vec
    y = MA.operate_to!!(y, f, x)
    s = MA.operate!!(+, s, y)
end
```
Furthermore, if `f` allocates more than just its return value, it'd make sense to implement `buffer_for` and `buffered_operate_to!`, which would be used like this to move allocations from `f` to outside the loop:
```julia
y = ... # preallocate
buf = MA.buffer_for(f, eltype(vec))
for x ∈ vec
    y = MA.buffered_operate_to!!(buf, y, f, x)
    s = MA.operate!!(+, s, y)
end
```
Other MA functions to implement would be `mutability` and `mutable_copy`.
The purpose of using the `!!` functions instead of the `!` functions is making the code more generic: the `!!` functions are correct even when the MA API is not implemented.
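To illustrate that fallback behavior, here is a minimal sketch (assuming the MutableArithmetics package is available as `MA`): for an immutable type like `Int`, `operate!!` simply falls back to plain `+`, while for a type with MA support, like `BigInt`, the very same call can mutate its first argument in place.

```julia
import MutableArithmetics as MA

# `Int` is immutable and has no in-place API, so `operate!!`
# falls back to the regular `+` and returns a new value:
a = MA.operate!!(+, 1, 2)        # 3

# `BigInt` implements the MA interface, so the same generic call
# may update `b` in place instead of allocating a new BigInt:
b = big(1)
b = MA.operate!!(+, b, big(2))   # b == 3
```

Because the same line of code is correct in both cases, generic library code can use the `!!` functions unconditionally.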
> Looking at the `@FastGTPSA` docs you linked (thanks!), it seems like it relies on a global TPS vector for avoiding allocations? I guess that makes thread safety impossible, or at least expensive, which seems like a pretty big drawback?
You are correct: with this approach, the macro in its current state is definitely not thread safe. The current plan is to switch to a thread-safe memory pool of TPSs from which the temporaries can draw. This would make the macro thread safe, but it is not implemented yet.
> Another thing that seems like a drawback of the `@FastGTPSA` approach is that, as far as I understand, it can only work for hardcoded TPS types, because a global TPS vector is required for each TPS type supported by `@FastGTPSA`, right?
Kind of. Right now there are only two types of TPSs (`TPS` and `ComplexTPS`, both representing `Float64`s), both of which are supported by `@FastGTPSA` and have corresponding pre-allocated buffers. The macro basically changes the expression to use operators/functions that do special things when encountering a `TPS` or `ComplexTPS`, or default to the regular operator/function if any other type is encountered.
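That operator-swapping idea can be sketched in a few lines (all names here, `MyTPS`, `__add`, `@fastmy`, are hypothetical, not GTPSA.jl's actual internals): the macro walks the expression tree and replaces each `+` with a custom function that dispatches on type, defaulting to the ordinary operator for everything else.

```julia
# `MyTPS` stands in for a mutable TPS-like type.
struct MyTPS
    coeffs::Vector{Float64}
end

__add(a, b) = a + b                          # default: the regular `+`
__add(a::MyTPS, b::MyTPS) =                  # special path for the TPS type;
    MyTPS(a.coeffs .+ b.coeffs)              # a real version would reuse a buffer

macro fastmy(ex)
    swap(x) = x                              # leave literals etc. untouched
    swap(s::Symbol) = s === :+ ? :__add : s  # replace `+` with `__add`
    swap(e::Expr) = Expr(e.head, map(swap, e.args)...)
    return esc(swap(ex))
end

x = MyTPS([1.0, 2.0]); y = MyTPS([3.0, 4.0])
z = @fastmy x + y   # dispatches to __add(::MyTPS, ::MyTPS)
w = @fastmy 1 + 2   # any other type falls back to the regular `+`
```

Since `__add` falls back to the ordinary operator for non-`MyTPS` arguments, the macro is transparent to all other types, which is the property described above.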
I do feel that in the long term, MutableArithmetics would be a good interface to implement here. However, I would really like to see some kind of equivalent of `@FastGTPSA` in MA that draws from a pre-allocated, thread-safe buffer of the mutable structs for expression evaluation. Take a look at some of the expressions in benchmark/track.jl. By simply prepending the macro to the expressions, the code stays very simple to read, is type-generic, and still gains all of the benefits of reusing temporaries. I fear rewriting these complicated expressions using `operate!!` and `operate_to!!` would make the code much less readable, and thus harder to maintain/debug.
A thread-safe memory pool will have its costs, but it seems like a reasonable solution. Implementing the MA interfaces would enable users to squeeze out every last ounce of performance, when necessary, using (hopefully) familiar APIs like `MA.operate!!`.
I guess it's true that `@rewrite` would be more useful if it supported more than just basic arithmetic; however, I probably won't be tackling that issue, as I'm not good with macros and in fact try to stay away from them.
For now, I think I will wait until MA has better documentation and an equivalent of `@FastGTPSA` / a better `@rewrite` before implementing the interface. Many months ago, when I was making this decision, I read the README for MA and the paper it refers to, and still had a lot of trouble understanding how to properly implement the interface. For this reason I fear `MA.operate!!` is not very familiar to outside users. I also feel that having to rewrite simple expressions using `operate!!` would only make the code less readable, and over-complicate things that are presently simple using `@FastGTPSA` (especially once the thread-safe pool is implemented).
I also am not the best at writing macros; `@FastGTPSA` was a major pain...
@nsajko, can you point me to any documentation, or describe step by step how to actually implement MutableArithmetics? In fact, if you have the time and are familiar with it, a PR to MutableArithmetics updating the documentation with such a guide would be valuable.
Does this example MA implementation help: kalmarek/Arblib.jl#178?
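For reference, a minimal implementation of the interface for a toy mutable type might look like the following sketch (assuming the MutableArithmetics package; `Poly` and all of its methods are hypothetical, not taken from Arblib.jl or GTPSA.jl):

```julia
import MutableArithmetics as MA

mutable struct Poly
    coeffs::Vector{Float64}
end

# Regular `+`, used when mutation is not possible:
Base.:+(p::Poly, q::Poly) = Poly(p.coeffs .+ q.coeffs)

# 1. Tell MA the type is mutable, and what type `+` produces:
MA.mutability(::Type{Poly}) = MA.IsMutable()
MA.promote_operation(::typeof(+), ::Type{Poly}, ::Type{Poly}) = Poly

# 2. Provide a mutable copy:
MA.mutable_copy(p::Poly) = Poly(copy(p.coeffs))

# 3. Implement the in-place operation:
function MA.operate!(::typeof(+), p::Poly, q::Poly)
    p.coeffs .+= q.coeffs
    return p
end

# Generic code can now use `MA.operate!!`, which mutates `p` in place
# rather than allocating a new Poly:
p = Poly([1.0, 2.0]); q = Poly([3.0, 4.0])
p = MA.operate!!(+, p, q)   # p.coeffs == [4.0, 6.0]
```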
Thanks!
I thought about this a lot, but decided it seems best to stick with and improve my current implementation:
I have made `@FastGTPSA` thread safe, along with some significant improvements (including no more weird exported symbols, and it can now act on a block of code). E.g.:
```julia
julia> d = Descriptor(3, 7); x = vars(d);

julia> y = rand(3);

julia> @btime @FastGTPSA begin # 2 allocations per TPS, no temporaries generated:
           t1 = $x[1]^3*sin($x[2])/log(2+$x[3])-exp($x[1]*$x[2])*im;
           t2 = $x[1]^3*sin($x[2])/log(2+$x[3])-exp($x[1]*$x[2])*im;
           z = $y[1]^3*sin($y[2])/log(2+$y[3])-exp($y[1]*$y[2])*im;
       end;
  11.251 μs (4 allocations: 3.88 KiB)

julia> t3 = ComplexTPS64(); t4 = ComplexTPS64(); @gensym w; # pre-allocated TPSs and some scalar number

julia> @btime @FastGTPSA! begin # zero allocations
           $t3 = $x[1]^3*sin($x[2])/log(2+$x[3])-exp($x[1]*$x[2])*im;
           $t4 = $x[1]^3*sin($x[2])/log(2+$x[3])-exp($x[1]*$x[2])*im;
           $w = $y[1]^3*sin($y[2])/log(2+$y[3])-exp($y[1]*$y[2])*im;
       end;
  10.930 μs (0 allocations: 0 bytes)
```
This basically does everything I could do with MA (and more, actually, since `@FastGTPSA` works for any general function and not just the arithmetic operators), so at this point in time I don't see much benefit to satisfying the interface anymore.
On that note, I think my solution for `@FastGTPSA` might be of interest to others, since it supports all functions/operators and, by taking advantage of multiple dispatch, it really is transparent to all non-`TPS` types while keeping the syntax completely simple.
Originally posted by @mattsignorelli in https://github.com/bmad-sim/GTPSA.jl/issues/46#issuecomment-1869816519
It's not true that MA "only works for simple arithmetic"; MA works for arbitrary functions. If you have questions, I'd be happy to help.