JuliaDynamics / ChaosTools.jl

Tools for the exploration of chaos and nonlinear dynamics
https://juliadynamics.github.io/DynamicalSystemsDocs.jl/chaostools/stable/
MIT License
187 stars 35 forks source link

faster `linreg` implementation #243

Closed Datseris closed 2 years ago

Datseris commented 2 years ago

See https://github.com/JuliaDynamics/TimeseriesSurrogates.jl/pull/99

kahaaga commented 2 years ago

@Datseris The type annotations are probably too strict, so the tests fail for, for example, SubArrays

Try this:

function linreg(x, y)
    (N = length(x)) == length(y) || throw(DimensionMismatch())
    ldiv!(cholesky!(Symmetric([float(N) sum(x); 0.0 sum(abs2, x)], :U)), [sum(y), dot(x, y)])
end
kahaaga commented 2 years ago

Just to verify that results are the same:

using LinearAlgebra, BenchmarkTools
using Statistics
using Statistics: covm, varm
# The following function comes from a version in StatsBase that is now deleted
# StatsBase is copyrighted under the MIT License with
# Copyright (c) 2012-2016: Dahua Lin, Simon Byrne, Andreas Noack, Douglas Bates,
# John Myles White, Simon Kornblith, and other contributors.
"""
    linreg(x, y) -> a, b
Perform a linear regression to find the best coefficients so that the curve:
`z = a + b*x` has the least squared error with `y`.
"""
function linreg(x::AbstractVector, y::AbstractVector)
    # Least squares given
    # Y = a + b*X
    # where
    # b = cov(X, Y)/var(X)
    # a = mean(Y) - b*mean(X)
    if size(x) != size(y)
        throw(DimensionMismatch("x has size $(size(x)) and y has size $(size(y)), " *
            "but these must be the same size"))
    end
    mx = Statistics.mean(x)
    my = Statistics.mean(y)
    # don't need to worry about the scaling (n vs n - 1)
    # since they cancel in the ratio
    b = covm(x, mx, y, my)/varm(x, mx)
    a = my - b*mx
    return a, b
end

function linreg_nonstrict(x, y)
    (N = length(x)) == length(y) || throw(DimensionMismatch())
    ldiv!(cholesky!(Symmetric([float(N) sum(x); 0.0 sum(abs2, x)], :U)), [sum(y), dot(x, y)])
end

n = 10000; trend = 1.0:1.0:float(n) |> collect; x = rand(n) .* 5 .+ trend; t = rand(1.0:1.0:1000, n)

linreg(t, x), linreg_nonstrict(t, x)
julia> @btime linreg($t, $x)
  16.917 μs (4 allocations: 156.34 KiB)
(4935.5219551373875, 0.1340685698968203)
julia> @btime linreg_nonstrict($t, $x)
  10.375 μs (8 allocations: 320 bytes)
2-element Vector{Float64}:
 4935.52195513737
    0.13406856989685584
Datseris commented 2 years ago

thank you

Datseris commented 2 years ago

So test fail now for another reason: https://github.com/JuliaDynamics/ChaosTools.jl/runs/4854175092?check_suite_focus=true#step:7:281

  PosDefException: matrix is not positive definite; Cholesky factorization failed.

Given that the speed of this operation isn't really that critical for the library I'll leave this open until I have a looooot of free time on my hands to fix this.

Datseris commented 2 years ago

(not worth spending time to fix, this speed improvement is simply negligible for where this function is used in the library)