Closed Datseris closed 2 years ago
@Datseris The type annotations are probably too strict, so the tests fail for, for example, SubArray
s
Try this:
function linreg(x, y)
(N = length(x)) == length(y) || throw(DimensionMismatch())
ldiv!(cholesky!(Symmetric([float(N) sum(x); 0.0 sum(abs2, x)], :U)), [sum(y), dot(x, y)])
end
Just to verify that results are the same:
using LinearAlgebra, BenchmarkTools
using Statistics
using Statistics: covm, varm
# The following function comes from a version in StatsBase that is now deleted
# StatsBase is copyrighted under the MIT License with
# Copyright (c) 2012-2016: Dahua Lin, Simon Byrne, Andreas Noack, Douglas Bates,
# John Myles White, Simon Kornblith, and other contributors.
"""
linreg(x, y) -> a, b
Perform a linear regression to find the best coefficients so that the curve:
`z = a + b*x` has the least squared error with `y`.
"""
function linreg(x::AbstractVector, y::AbstractVector)
# Least squares given
# Y = a + b*X
# where
# b = cov(X, Y)/var(X)
# a = mean(Y) - b*mean(X)
if size(x) != size(y)
throw(DimensionMismatch("x has size $(size(x)) and y has size $(size(y)), " *
"but these must be the same size"))
end
mx = Statistics.mean(x)
my = Statistics.mean(y)
# don't need to worry about the scaling (n vs n - 1)
# since they cancel in the ratio
b = covm(x, mx, y, my)/varm(x, mx)
a = my - b*mx
return a, b
end
function linreg_nonstrict(x, y)
(N = length(x)) == length(y) || throw(DimensionMismatch())
ldiv!(cholesky!(Symmetric([float(N) sum(x); 0.0 sum(abs2, x)], :U)), [sum(y), dot(x, y)])
end
n = 10000; trend = 1.0:1.0:float(n) |> collect; x = rand(n) .* 5 .+ trend; t = rand(1.0:1.0:1000, n)
linreg(t, x), linreg_nonstrict(t, x)
julia> @btime linreg($t, $x)
16.917 μs (4 allocations: 156.34 KiB)
(4935.5219551373875, 0.1340685698968203)
julia> @btime linreg_nonstrict($t, $x)
10.375 μs (8 allocations: 320 bytes)
2-element Vector{Float64}:
4935.52195513737
0.13406856989685584
thank you
So test fail now for another reason: https://github.com/JuliaDynamics/ChaosTools.jl/runs/4854175092?check_suite_focus=true#step:7:281
PosDefException: matrix is not positive definite; Cholesky factorization failed.
Given that the speed of this operation isn't really that critical for the library I'll leave this open until I have a looooot of free time on my hands to fix this.
(not worth spending time to fix, this speed improvement is simply negligible for where this function is used in the library)
See https://github.com/JuliaDynamics/TimeseriesSurrogates.jl/pull/99