JuliaDynamics / ComplexityMeasures.jl

Estimators for probabilities, entropies, and other complexity measures derived from data in the context of nonlinear dynamics and complex systems
MIT License

Performance of `AAPE` is suboptimal #217

Open Datseris opened 1 year ago

Datseris commented 1 year ago

The internal function `AAPE` has really bad performance, allocating 10 vectors for each element of a dataset. Thankfully, it is easy to optimize it down to 0 allocations, so I leave this here for a newcomer who would like to contribute.

nyedr commented 1 year ago

Hello, I'm interested in taking on this issue and optimizing the AAPE function to improve its performance. Before I claim the issue, I was wondering if you could tell me where the function is located in the directory. This will help me to determine if I have the necessary skills to optimize the function. Thank you in advance for your help.

kahaaga commented 1 year ago

> Hello, I'm interested in taking on this issue and optimizing the AAPE function to improve its performance.

Hey, @EidanGar! We're happy to hear that!

> Before I claim the issue, I was wondering if you could tell me where the function is located in the directory.

On the main branch, the AAPE function is located in src/probabilities_estimators/permutation_ordinal/SymbolicPermutation.jl. The source code is relatively simple and looks like this:

"""
    AAPE(x, A::Real = 0.5, m::Int = length(x))

Encode relative amplitude information of the elements of `a`.
- `A = 1` emphasizes only average values.
- `A = 0` emphasizes changes in amplitude values.
- `A = 0.5` equally emphasizes average values and changes in the amplitude values.
"""
function AAPE(x, A::Real = 0.5, m::Int = length(x))
    (A/m)*sum(abs.(x)) + (1-A)/(m-1)*sum(abs.(diff(x)))
end

You can assume `x` is an `m`-element tuple/vector. The precise values in the input `x` are not important. The goal is to reduce allocations (and computation time, if possible). You can measure how much the function allocates, and how long it takes to run, with the `@btime` macro from the BenchmarkTools.jl package.
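For concreteness, a benchmarking session along these lines might look like the following (a sketch only; the `$` interpolation is there so `@btime` doesn't also measure global-variable overhead, and the reported numbers will of course vary by machine):

```julia
using BenchmarkTools

# The current implementation, copied from SymbolicPermutation.jl above.
AAPE(x, A::Real = 0.5, m::Int = length(x)) =
    (A/m)*sum(abs.(x)) + (1-A)/(m-1)*sum(abs.(diff(x)))

x = rand(15)          # a vector of the typical length used here
@btime AAPE($x)       # prints time and allocation count per call
```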

> This will help me to determine if I have the necessary skills to optimize the function. Thank you in advance for your help.

Don't worry if you don't have a complete handle on the code/topic immediately. The function is small, so working on such an issue is a great opportunity to learn about code optimization in general. There are plenty of resources on code optimization for Julia, and a good place to start is the official Julia language documentation.

If you need some help to get started, let us know. But for now, I leave no further hints, so you can have the pleasure of figuring it out yourself :D

EDIT: Just so you know, @EidanGar, we might change the folder structure a bit as part of our final code refactoring for V2, which is coming up really soon. However, the AAPE function will still be in SymbolicPermutation.jl, possibly just moved one folder level down. This won't affect any work you do on the function itself.

Chronum94 commented 1 year ago

I just ran `@btime` on this with `x::Vector{Float64}` and I got 6 allocations:

```julia
julia> x = rand(4000);

julia> @btime AAPE($x)
  10.800 μs (6 allocations: 93.89 KiB)
0.4158251611947962
```

Are `Dataset`s significantly different from `Vector`s, such that they cause 4 more allocations?

kahaaga commented 1 year ago

> Are datasets significantly different from Vectors that they cause 4 more allocations?

@Chronum94 In the applications we have here, `AAPE` would typically be called with elements of `Dataset`s, which are `SVector`s. That is, you'd do

```julia
x = Dataset(rand(100, 3))
AAPE.(x)
AAPE(x[2])
```

but not

```julia
x = Dataset(rand(100, 3))
AAPE(x)
```

In our applications (ordinal pattern encoding), AAPE will typically only operate on SVectors of length probably not much longer than 10-15.

So to answer your question: it is not the difference between `Dataset`s and `Vector`s that causes the allocations here; it's the specific implementation (using broadcasting, etc.).
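To make the broadcasting point concrete, here is one way the allocations could be removed (a sketch under a hypothetical name, `AAPE_loop`, not necessarily the solution the maintainers have in mind): fuse the two broadcasted passes, `abs.(x)` and `abs.(diff(x))`, into a single loop so that no intermediate arrays are materialized:

```julia
# Allocation-free sketch of AAPE: accumulate both sums in one pass
# instead of materializing abs.(x) and diff(x).
function AAPE_loop(x, A::Real = 0.5, m::Int = length(x))
    s_abs  = abs(x[1])      # running sum of |x_i|
    s_diff = zero(s_abs)    # running sum of |x_i - x_(i-1)|
    @inbounds for i in 2:m
        s_abs  += abs(x[i])
        s_diff += abs(x[i] - x[i-1])
    end
    return (A/m)*s_abs + (1-A)/(m-1)*s_diff
end
```

For `Vector` inputs this avoids the heap allocations caused by `diff` and the broadcasts; whether it also beats the broadcasted version for `SVector` inputs would need benchmarking.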

Chronum94 commented 1 year ago

I tried the example given and I got an error which I've seen elsewhere, but don't quite know how to interpret:

```julia
julia> x = Dataset(rand(100, 3));

julia> AAPE.(x)
ERROR: MethodError: Cannot `convert` an object of type StaticArraysCore.SVector{3, Float64} to an object of type Float64
Closest candidates are:
  convert(::Type{T}, ::Base.TwicePrecision) where T<:Number at twiceprecision.jl:273
  convert(::Type{T}, ::AbstractChar) where T<:Number at char.jl:185
  convert(::Type{T}, ::CartesianIndex{1}) where T<:Number at multidimensional.jl:130
  ...
Stacktrace:
 [1] setindex!(A::Vector{Float64}, x::StaticArraysCore.SVector{3, Float64}, i1::Int64)
   @ Base .\array.jl:966
 [2] copyto!
   @ .\abstractarray.jl:904 [inlined]
 [3] _collect
   @ .\array.jl:718 [inlined]
 [4] collect
   @ .\array.jl:712 [inlined]
 [5] broadcastable
   @ .\broadcast.jl:704 [inlined]
 [6] broadcasted(::Function, ::Dataset{3, Float64})
   @ Base.Broadcast .\broadcast.jl:1296
 [7] top-level scope
   @ REPL[145]:1
```

Indexing into one of the rows, however, gets me:

```julia
julia> @btime AAPE3(x[1])
  67.143 ns (2 allocations: 48 bytes)
0.4063432882381257
```

kahaaga commented 1 year ago

Ah, sorry, my bad. I didn't test the code. Broadcasting like that isn't implemented for `Dataset` yet. You'll have to do

```julia
import Entropies.AAPE # make sure you've checked out the main branch of Entropies.jl
using StateSpaceSets

x = Dataset(rand(100, 3))
AAPE.(x.data) # or
[AAPE(pt) for pt in x]
```

Datseris commented 1 year ago

Do `vec(x)` instead of `x.data`, as that's the public way to get the vector.

Chronum94 commented 1 year ago

```julia
julia> x = Dataset(rand(1000, 15));

julia> @btime AAPE.($vec(x));
  7.900 μs (3 allocations: 7.97 KiB)
```

Still not seeing it; maybe I'm missing something? This is on Julia 1.8.3.

Datseris commented 1 year ago

What's the question? Why you get allocations? The result of `AAPE.($vec(x))` is a new vector containing the application of `AAPE` to each inner vector of `x`. Benchmark instead:

```julia
x = Dataset(rand(1000, 15))
y = x[1]
@btime AAPE($y)
```

Chronum94 commented 1 year ago

My bad, I was referring to the original issue of 10 allocations per element of `Dataset`, which I'm not seeing here. The above code snippet does in fact give 0 allocations. I was just looking at this to try to figure out where the suboptimal performance was (and maybe explore it).
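A possible explanation for the discrepancy (my reading of the thread, not verified against the package history): for `SVector` inputs, `abs.(x)` and `diff(x)` themselves produce new stack-allocated `SVector`s, so the broadcasted implementation heap-allocates nothing, while for plain `Vector` inputs each intermediate is heap-allocated. A sketch to compare the two, assuming StaticArrays.jl and BenchmarkTools.jl are available:

```julia
using StaticArrays, BenchmarkTools

# Same formula as the implementation discussed above.
AAPE(x, A::Real = 0.5, m::Int = length(x)) =
    (A/m)*sum(abs.(x)) + (1-A)/(m-1)*sum(abs.(diff(x)))

v = rand(15)                # heap-allocated Vector
s = SVector{15}(v)          # stack-allocated SVector of the same data
@btime AAPE($v)             # intermediates abs.(v) and diff(v) allocate
@btime AAPE($s)             # static intermediates: expect 0 allocations
```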