CasBex commented 1 year ago

Hi, I was profiling my code and noticed that most of the time was spent inside FMIImport.jl, so I decided to make some performance improvements. Functionality-wise there is no change. The example I was testing this on went from about 60s with 22 GB allocations to 18s with 2.8 GB (which is still not amazing in all honesty).

This is only for FMI2, as I don't use FMI3 myself. Some of these changes require updates to FMICore.jl, which are incoming. Before merging, FMICore.jl should therefore be updated, and the Project.toml in this repo should be updated for compatibility.

Summary

The Jacobian check in fmi2SetReal (c.jl) has been improved (biggest performance gain by far). Building a Set of the ∂f_refs makes each lookup O(1) instead of O(n), which reduces the runtime complexity of the for loop.
Various allocations and copies have been removed where possible.
Type stability has been improved where possible.
Added methods to the performance-critical functions (fmu::FMU2) and (c::FMU2Component), which were previously keyword-only; the new methods seem to be faster.
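To illustrate the first point, here is a minimal sketch of the Set-versus-Vector lookup. It is not the code from c.jl: `any_affected`, `refs` and `changed` are hypothetical names, and plain UInt32 values stand in for the value references.

```julia
# Hypothetical stand-ins: `refs` plays the role of the stored value references,
# `changed` the value references passed to one setter call.
refs = UInt32[1, 5, 9, 12]
changed = UInt32[5, 7, 12]

# Returns true as soon as one changed reference is found in `container`.
function any_affected(container, changed)
    for vr in changed
        vr in container && return true
    end
    return false
end

# Vector: every `in` is a linear scan, so the loop costs O(n * m).
hit_vector = any_affected(refs, changed)

# Set: build it once and reuse it; every `in` is a hash lookup, O(1) on average.
ref_set = Set(refs)
hit_set = any_affected(ref_set, changed)
```

The Set only pays off if it is built once and kept around; constructing it inside the hot loop would cost more than the linear scan it replaces.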

Performance tips from a user perspective

The FMI C-functions (e.g. cFmi2SetReal...) operate on Vector{T} and return this type, regardless of which type of array was provided (where T depends on the C-function). If any other AbstractVector{<:Real} is passed, it is converted on every C call, which costs a lot of time and memory allocations. @ThummeTo Maybe it would be a good idea to dispatch these functions on Vector{T} instead, with a single catch-all method on AbstractVector{<:Real} that converts and emits a performance warning?
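A rough sketch of what that dispatch layout could look like, assuming T = Float64. `set_values!` is a hypothetical function used only for illustration; it is not part of FMIImport.jl or FMICore.jl.

```julia
# Fast path: the exact type the C function expects, no conversion needed.
function set_values!(dest::Vector{Float64}, src::Vector{Float64})
    copyto!(dest, src)
    return dest
end

# Catch-all: converts (which allocates), warns, then forwards to the fast path.
function set_values!(dest::Vector{Float64}, src::AbstractVector{<:Real})
    # maxlog=1 limits the warning to a single emission from this call site
    @warn "set_values!: converting $(typeof(src)) to Vector{Float64}; pass a Vector{Float64} to avoid allocations" maxlog=1
    return set_values!(dest, convert(Vector{Float64}, src))
end

dest = zeros(3)
set_values!(dest, [1.0, 2.0, 3.0])                  # fast path
set_values!(dest, view([4.0, 5.0, 6.0, 7.0], 1:3))  # catch-all, warns
```

Because Vector{Float64} is more specific than AbstractVector{<:Real}, the fast path wins without any ambiguity, and views or integer vectors fall through to the converting method.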

For optimal performance, it is best to preallocate a v::Vector{T} of the correct size and type and reuse it as an intermediate buffer between whatever function you have running and the FMU. (I was personally passing views of arrays to avoid allocations, but this turned out to be counterproductive because of the conversion described above.)
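The buffer pattern could look roughly like this. `consume` is a hypothetical stand-in for an FMU call that only accepts Vector{Float64}, and `step!`, `state` and `buf` are names made up for the example.

```julia
# Hypothetical stand-in for an FMU function that requires a Vector{Float64}.
consume(v::Vector{Float64}) = sum(v)

state = collect(1.0:10.0)         # larger array owned by the caller
buf = Vector{Float64}(undef, 3)   # allocated once, outside the hot loop

function step!(buf::Vector{Float64}, state::Vector{Float64}, offset::Int)
    # Copy length(buf) elements starting at state[offset] into buf, in place.
    copyto!(buf, 1, state, offset, length(buf))
    return consume(buf)
end

result = step!(buf, state, 4)     # uses state[4:6] without creating a view or a slice
```

The copy into `buf` is cheap compared with a fresh conversion on every call, and the callee always sees the concrete type it expects.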

Further improvements

The remaining allocations and slowness are mostly caused by type instability/ambiguity. This happens, for example, in FMU2Component, which has fields A, B, C, D, E, F::Union{Nothing,FMUJacobian}. This means that in every function Julia has to check the type at runtime, even though we know that functions like fmi2SetReal are only called once the Jacobians are already initialised. This causes significant slowdown. Fixing it would, however, require more significant changes to the library than I am personally willing to make (and would be too big for one PR).
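For reference, one common way to limit the cost of such a field is to narrow the Union once and hand the value to a kernel that is specialised on the concrete type (a function barrier). The sketch below is a toy, not a proposal for the actual struct: `Comp`, `scale!` and `scale_kernel!` are hypothetical, and a Matrix{Float64} stands in for FMUJacobian. Whether this helps in the library depends on how concrete FMUJacobian itself is.

```julia
# Hypothetical miniature of a component with an optional Jacobian field.
mutable struct Comp
    A::Union{Nothing, Matrix{Float64}}
end

# Kernel specialised on the concrete type; no runtime type checks inside.
function scale_kernel!(A::Matrix{Float64}, s::Float64)
    A .*= s
    return A
end

function scale!(c::Comp, s::Float64)
    A = c.A                       # read the field once into a local
    A === nothing && error("Jacobian not initialised")
    # Past this point A can only be a Matrix{Float64}.
    return scale_kernel!(A, s)
end

c = Comp(ones(2, 2))
scale!(c, 2.0)
```

Reading the field into a local first matters: the check then applies to the local, so a later write to the mutable field cannot invalidate it.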

CasBex commented 1 year ago

Specifically the changes in c5d5387 and a895080 require this PR in FMICore.jl

CasBex commented 1 year ago

71fd194 was added to avoid c.x changing when x is mutated afterwards. This prevents some bugs for users who pass frequently changing cache vectors as x.
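The kind of bug this guards against can be reproduced in a few lines. This is only an illustration of the aliasing problem, not the code from 71fd194; `Cache`, `store_alias!` and `store_copy!` are hypothetical names.

```julia
mutable struct Cache
    x::Vector{Float64}
end

# Aliasing: stores the caller's array itself, so later writes to it show up in c.x.
function store_alias!(c::Cache, x::Vector{Float64})
    c.x = x
    return c
end

# Copying into existing storage: later writes to the caller's array do not leak in.
function store_copy!(c::Cache, x::Vector{Float64})
    length(c.x) == length(x) || resize!(c.x, length(x))
    copyto!(c.x, x)
    return c
end

x = [1.0, 2.0]
aliased = store_alias!(Cache(Float64[]), x)
copied = store_copy!(Cache(Float64[]), x)
x[1] = 99.0   # the caller reuses its cache vector
```

After the last line, `aliased.x` reflects the change while `copied.x` keeps the values it was given. Copying into the existing storage also avoids a new allocation once the lengths match.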