DynareJulia / FastLapackInterface.jl

MIT License

Add factorize!() method for BunchKaufman #38

Open MichelJuillard opened 7 months ago

MichelJuillard commented 7 months ago
RoyiAvital commented 7 months ago

Appreciate the support. Will it support using MKL.jl and Accelerate.jl?

MichelJuillard commented 7 months ago

I will test for MKL. I don't have the hardware to test for Accelerate. @RoyiAvital could you do that?

RoyiAvital commented 7 months ago

I will. Let me know when it is ready.
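For reference, switching the BLAS/LAPACK backend in Julia is done by loading the backend package before doing linear algebra; libblastrampoline then routes subsequent BLAS/LAPACK calls through it. A minimal sketch (assuming MKL.jl, or AppleAccelerate.jl on macOS, is installed; the default is OpenBLAS):

```julia
# Uncomment ONE of these BEFORE using LinearAlgebra routines:
# using MKL               # Intel MKL backend (MKL.jl must be installed)
# using AppleAccelerate   # Apple Accelerate backend, macOS only
using LinearAlgebra

# Inspect which BLAS/LAPACK libraries are currently active;
# with nothing loaded above, this reports OpenBLAS.
cfg = BLAS.get_config()
println(cfg)
```

The swap affects all subsequent LAPACK calls, including the ones FastLapackInterface makes, which is why testing the new `factorize!()` method under each backend is enough.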

MichelJuillard commented 7 months ago

@RoyiAvital Sorry, I got confused about the actual syntax:

  1. factorize!() returns a tuple with sytrf!() output, not the factorization returned by BunchKaufman(). It is therefore still necessary to call BunchKaufman() after factorize!()
  2. The shorter syntax `factorize!(ws, Symmetric(x))` allocates more than `factorize!(ws, 'U', x)`
  3. In the end, for Bunch-Kaufman, there is little difference between using factorize!() and the lower-level direct call to LAPACK.sytrf!()
  4. Here is a working example with both approaches
    
    using FastLapackInterface
    using LinearAlgebra

    function loop_1!(vXs, mCs, ws)
        for mC in mCs
            # factorization
            F1 = factorize!(ws, 'U', mC)
            F = BunchKaufman(mC, F1[2], 'U', true, false, BLAS.BlasInt(0))
            # solving linear systems
            for vX in vXs
                ldiv!(F, vX)
            end
        end
    end

    function approach1(mCs, vXs)
        # create workspace
        ws = Workspace(LAPACK.sytrf!, mCs[1])
        mCs_1 = deepcopy(mCs)
        vXs_1 = deepcopy(vXs)
        loop_1!(vXs_1, mCs_1, ws)
        mCs_1 = deepcopy(mCs)
        vXs_1 = deepcopy(vXs)
        @time loop_1!(vXs_1, mCs_1, ws)
    end

    function loop_2!(vXs, mCs, ws)
        for mC in mCs
            # factorization
            A, ipiv, info = LAPACK.sytrf!(ws, 'U', mC)
            F = BunchKaufman(mC, ipiv, 'U', true, false, BLAS.BlasInt(0))
            # solving linear systems
            for vX in vXs
                ldiv!(F, vX)
            end
        end
    end

    function approach2(mCs, vXs)
        # create workspace
        ws = BunchKaufmanWs(mCs[1])
        mCs_1 = deepcopy(mCs)
        vXs_1 = deepcopy(vXs)
        loop_2!(vXs_1, mCs_1, ws)
        mCs_1 = deepcopy(mCs)
        vXs_1 = deepcopy(vXs)
        @time loop_2!(vXs_1, mCs_1, ws)
    end

    order = 100
    iterations = 10

    mCs = []
    vXs = []
    for i = 1:iterations
        mC = hermitianpart!(randn(order, order)).data
        push!(mCs, mC)
        push!(vXs, randn(order))
    end

    approach1(mCs, vXs)
    approach2(mCs, vXs)


5. It works with MKL (but seems slower than OpenBLAS). Could you please try it with Accelerate?
RoyiAvital commented 7 months ago

Do these lines allocate?

F1 = factorize!(ws, 'U', mC)
F = BunchKaufman(mC, F1[2], 'U', true, false, BLAS.BlasInt(0))  
ldiv!(F, vX)

If not, this is perfect.

I will test on Accelerate.jl and report, no problem.
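One way to check is with Julia's `@allocated` macro after a warm-up call (so compilation is not counted). A minimal sketch, using the released `LAPACK.sytrf!` workspace path since `factorize!()` is the method added in this PR; `mC1` and `mC2` are two copies of the same symmetric matrix so `@allocated` measures only the LAPACK call:

```julia
using FastLapackInterface
using LinearAlgebra

# Measure the allocations of the factorization call, the BunchKaufman
# wrapper construction, and the ldiv! solve, returned as a tuple of bytes.
function measure(ws, mC1, mC2, vX)
    a1 = @allocated LAPACK.sytrf!(ws, 'U', mC1)
    A, ipiv, info = LAPACK.sytrf!(ws, 'U', mC2)
    a2 = @allocated BunchKaufman(mC2, ipiv, 'U', true, false, BLAS.BlasInt(0))
    F = BunchKaufman(mC2, ipiv, 'U', true, false, BLAS.BlasInt(0))
    a3 = @allocated ldiv!(F, vX)
    return a1, a2, a3
end

order = 50
mC = hermitianpart!(randn(order, order)).data
ws = BunchKaufmanWs(mC)

measure(ws, copy(mC), copy(mC), randn(order))           # warm-up run
println(measure(ws, copy(mC), copy(mC), randn(order)))  # measured run
```

If the measured run prints zeros (or small constants independent of `order`), the hot loop is effectively allocation-free.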

MichelJuillard commented 7 months ago

It still allocates, for a reason that I don't understand, but very little, and the amount doesn't depend on the size of the matrix.

RoyiAvital commented 7 months ago

I assume F = BunchKaufman(mC, F1[2], 'U', true, false, BLAS.BlasInt(0)) is the allocating line, right?

MichelJuillard commented 7 months ago
F1 = factorize!(ws, 'U', mC)

allocates 64 bytes per iteration

F = BunchKaufman(mC, F1[2], 'U', true, false, BLAS.BlasInt(0))  

allocates 48 bytes per iteration