DynareJulia / FastLapackInterface.jl

MIT License

Add factorize!() method for BunchKaufman #38

Open MichelJuillard opened 7 months ago

MichelJuillard commented 7 months ago
RoyiAvital commented 7 months ago

Appreciate the support. Will it support using MKL.jl and Accelerate.jl?

MichelJuillard commented 7 months ago

I will test for MKL. I don't have the hardware to test for Accelerate. @RoyiAvital could you do that?

RoyiAvital commented 7 months ago

I will. Let me know when it is ready.
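For reference, switching the BLAS/LAPACK backend in Julia is done by loading the backend package before doing linear algebra; libblastrampoline then routes subsequent BLAS/LAPACK calls through it. A minimal sketch (assuming MKL.jl, or AppleAccelerate.jl on macOS, is installed; the default is OpenBLAS):

```julia
# Uncomment ONE of these BEFORE using LinearAlgebra routines:
# using MKL               # Intel MKL backend (MKL.jl must be installed)
# using AppleAccelerate   # Apple Accelerate backend, macOS only
using LinearAlgebra

# Inspect which BLAS/LAPACK libraries are currently active;
# with nothing loaded above, this reports OpenBLAS.
cfg = BLAS.get_config()
println(cfg)
```

The swap affects all subsequent LAPACK calls, including the ones FastLapackInterface makes, which is why testing the new `factorize!()` method under each backend is enough.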

MichelJuillard commented 7 months ago

@RoyiAvital Sorry, I got confused about the actual syntax:

  1. factorize!() returns a tuple with sytrf!() output, not the factorization returned by BunchKaufman(). It is therefore still necessary to call BunchKaufman() after factorize!()
  2. The shorter syntax `factorize!(ws, Symmetric(x))` allocates more than `factorize!(ws, 'U', x)`
  3. In the end, for Bunch-Kaufman, there is little difference between using factorize!() and the lower-level direct call to LAPACK.sytrf!()
  4. Here is a working example with both approaches
    
    using FastLapackInterface
    using LinearAlgebra

    function loop_1!(vXs, mCs, ws)
        for mC in mCs
            # factorization
            F1 = factorize!(ws, 'U', mC)
            F = BunchKaufman(mC, F1[2], 'U', true, false, BLAS.BlasInt(0))
            # solving linear systems
            for vX in vXs
                ldiv!(F, vX)
            end
        end
    end

    function approach1(mCs, vXs)
        # create workspace
        ws = Workspace(LAPACK.sytrf!, mCs[1])
        mCs_1 = deepcopy(mCs)
        vXs_1 = deepcopy(vXs)
        loop_1!(vXs_1, mCs_1, ws)
        mCs_1 = deepcopy(mCs)
        vXs_1 = deepcopy(vXs)
        @time loop_1!(vXs_1, mCs_1, ws)
    end

    function loop_2!(vXs, mCs, ws)
        for mC in mCs
            # factorization
            A, ipiv, info = LAPACK.sytrf!(ws, 'U', mC)
            F = BunchKaufman(mC, ipiv, 'U', true, false, BLAS.BlasInt(0))
            # solving linear systems
            for vX in vXs
                ldiv!(F, vX)
            end
        end
    end

    function approach2(mCs, vXs)
        # create workspace
        ws = BunchKaufmanWs(mCs[1])
        mCs_1 = deepcopy(mCs)
        vXs_1 = deepcopy(vXs)
        loop_2!(vXs_1, mCs_1, ws)
        mCs_1 = deepcopy(mCs)
        vXs_1 = deepcopy(vXs)
        @time loop_2!(vXs_1, mCs_1, ws)
    end

    order = 100
    iterations = 10

    mCs = []
    vXs = []
    for i = 1:iterations
        mC = hermitianpart!(randn(order, order)).data
        push!(mCs, mC)
        push!(vXs, randn(order))
    end

    approach1(mCs, vXs)
    approach2(mCs, vXs)


5. It works with MKL (but seems slower than OpenBLAS). Could you please try it with Accelerate?
RoyiAvital commented 7 months ago

Do these lines allocate?

F1 = factorize!(ws, 'U', mC)
F = BunchKaufman(mC, F1[2], 'U', true, false, BLAS.BlasInt(0))  
ldiv!(F, vX)

If not, this is perfect.

I will test on Accelerate.jl and report, no problem.
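One way to check is with Julia's `@allocated` macro after a warm-up call (so compilation is not counted). A minimal sketch, using the released `LAPACK.sytrf!` workspace path since `factorize!()` is the method added in this PR; `mC1` and `mC2` are two copies of the same symmetric matrix so `@allocated` measures only the LAPACK call:

```julia
using FastLapackInterface
using LinearAlgebra

# Measure the allocations of the factorization call, the BunchKaufman
# wrapper construction, and the ldiv! solve, returned as a tuple of bytes.
function measure(ws, mC1, mC2, vX)
    a1 = @allocated LAPACK.sytrf!(ws, 'U', mC1)
    A, ipiv, info = LAPACK.sytrf!(ws, 'U', mC2)
    a2 = @allocated BunchKaufman(mC2, ipiv, 'U', true, false, BLAS.BlasInt(0))
    F = BunchKaufman(mC2, ipiv, 'U', true, false, BLAS.BlasInt(0))
    a3 = @allocated ldiv!(F, vX)
    return a1, a2, a3
end

order = 50
mC = hermitianpart!(randn(order, order)).data
ws = BunchKaufmanWs(mC)

measure(ws, copy(mC), copy(mC), randn(order))           # warm-up run
println(measure(ws, copy(mC), copy(mC), randn(order)))  # measured run
```

If the measured run prints zeros (or small constants independent of `order`), the hot loop is effectively allocation-free.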

MichelJuillard commented 7 months ago

It still allocates, for a reason that I don't understand, but very little, and the amount doesn't depend on the size of the matrix.

RoyiAvital commented 7 months ago

I assume F = BunchKaufman(mC, F1[2], 'U', true, false, BLAS.BlasInt(0)) is the allocating line, right?

MichelJuillard commented 7 months ago
F1 = factorize!(ws, 'U', mC)

allocates 64 bytes per iteration

F = BunchKaufman(mC, F1[2], 'U', true, false, BLAS.BlasInt(0))  

allocates 48 bytes per iteration