carstenbauer / StableDQMC.jl

Numerical stabilization routines for determinant quantum Monte Carlo
https://carstenbauer.github.io/StableDQMC.jl/dev/
MIT License

Faster non-allocating UDT decomposition #16

Open ffreyer opened 4 years ago

ffreyer commented 4 years ago

I've re-implemented the QR decompositions from Julia Base in MonteCarlo.jl with the goal of making them non-allocating. I've also used LoopVectorization where I could, and RecursiveFactorization to reduce the allocations from the matrix inversions in calculate_greens/udt_inv_one_plus.
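
For readers unfamiliar with the decomposition being optimized, here is a minimal allocating reference sketch of a UDT factorization built on the standard-library pivoted QR. It is not the implementation from StableDQMC.jl or MonteCarlo.jl: the name `udt_reference` is hypothetical, and `ColumnNorm()` assumes Julia 1.7 or newer (older versions such as 1.4 spell it `qr(A, Val(true))`).

```julia
using LinearAlgebra

# Allocating reference UDT: A ≈ U * Diagonal(D) * T, with U unitary,
# D a vector of positive scales, and T the remaining (rescaled) factor.
# `udt_reference` is a hypothetical name used only for this illustration.
function udt_reference(A::AbstractMatrix{<:Number})
    F = qr(A, ColumnNorm())          # pivoted QR: A * F.P == F.Q * F.R (Julia 1.7+)
    U = Matrix(F.Q)                  # materialize the unitary factor
    D = abs.(diag(F.R))              # scales taken from the diagonal of R
    T = (Diagonal(D) \ F.R) * F.P'   # divide the scales out of R, undo the column pivoting
    return U, D, T
end

A = randn(8, 8)
U, D, T = udt_reference(A)
@show U * Diagonal(D) * T ≈ A        # reconstruction check
```

Every step here allocates (`Matrix(F.Q)`, `diag`, the division, the permutation product), which is the kind of overhead a non-allocating variant would avoid by writing into preallocated buffers.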

Here are some benchmarks (via TimerOutputs in Julia 1.4.1) for an L = 6 attractive Hubbard model on a square lattice (beta = U = t = 1.0, mu = 0):

Using StableDQMC's udt! and udt_inv_one_plus:

───────────────────────────────────────────────────────────────────────────────────────
                                                Time                   Allocations      
                                       ──────────────────────   ───────────────────────
           Tot / % measured:                5.38s / 98.5%           1.55GiB / 100%     

Section                        ncalls     time   %tot     avg     alloc   %tot      avg
───────────────────────────────────────────────────────────────────────────────────────
run!                                1    5.30s   100%   5.30s   1.55GiB  100%   1.55GiB
  propagate                     40.0k    4.36s  82.3%   109μs   1.55GiB  100%   40.5KiB
    propagate                   4.00k    2.06s  38.9%   515μs   1.14GiB  73.7%   300KiB
      calculate_greens          4.00k    1.98s  37.3%   494μs   0.98GiB  63.4%   258KiB

    add_slice_sequence_right    2.00k    946ms  17.8%   473μs    206MiB  13.0%   106KiB
      UDT                       2.00k    787ms  14.8%   393μs    185MiB  11.6%  94.6KiB

    add_slice_sequence_left     2.00k    923ms  17.4%   462μs    206MiB  13.0%   105KiB
      UDT                       2.00k    787ms  14.8%   393μs    185MiB  11.6%  94.6KiB
───────────────────────────────────────────────────────────────────────────────────────

Using udt_AVX_pivot! and calculate_greens_AVX! from MonteCarlo.jl:

 ───────────────────────────────────────────────────────────────────────────────────────
                                                Time                   Allocations      
                                       ──────────────────────   ───────────────────────
           Tot / % measured:                2.15s / 96.9%            356MiB / 99.4%  

Section                        ncalls     time   %tot     avg     alloc   %tot      avg
───────────────────────────────────────────────────────────────────────────────────────
run!                                1    2.08s   100%   2.08s    354MiB  100%    354MiB
  propagate                     40.0k    1.34s  64.4%  33.5μs    349MiB  98.5%  8.93KiB
    propagate                   4.00k    541ms  26.0%   135μs    306MiB  86.4%  78.3KiB
      calculate_greens          4.00k    445ms  21.4%   111μs    143MiB  40.3%  36.5KiB

    add_slice_sequence_right    2.00k    206ms  9.91%   103μs   21.6MiB  6.10%  11.1KiB
      UDT                       2.00k   64.1ms  3.08%  32.1μs    224KiB  0.06%     115B

    add_slice_sequence_left     2.00k    197ms  9.47%  98.7μs   21.2MiB  5.99%  10.9KiB
      UDT                       2.00k   66.9ms  3.21%  33.4μs    224KiB  0.06%     115B
───────────────────────────────────────────────────────────────────────────────────────

I believe the allocations reported under UDT don't actually exist. Running Julia with --track-allocation=user shows 0 allocations in the relevant code. The allocations in calculate_greens are caused by LinearAlgebra.inv!.
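
A quick way to cross-check an allocation report like the one above is `@allocated` from Base, measured inside a function after a warm-up call. The sketch below is only an illustration: `scale_columns!`, `scale_columns`, and `measure_allocations` are hypothetical helpers, not functions from StableDQMC.jl or MonteCarlo.jl.

```julia
# In-place column scaling: a hand-written loop that should not need heap memory.
function scale_columns!(A::Matrix{Float64}, d::Vector{Float64})
    @inbounds for j in axes(A, 2), i in axes(A, 1)
        A[i, j] *= d[j]
    end
    return A
end

# Allocating counterpart: the broadcast creates a new matrix on every call.
scale_columns(A::Matrix{Float64}, d::Vector{Float64}) = A .* d'

# Measure inside a function, after a warm-up call, so that compilation
# and global-variable overhead are not counted.
function measure_allocations(n::Int = 64)
    A, d = rand(n, n), rand(n)
    scale_columns!(A, d); scale_columns(A, d)    # warm-up (triggers compilation)
    inplace_bytes = @allocated scale_columns!(A, d)
    alloc_bytes = @allocated scale_columns(A, d)
    return inplace_bytes, alloc_bytes
end

inplace_bytes, alloc_bytes = measure_allocations()
println("in-place: ", inplace_bytes, " bytes, allocating: ", alloc_bytes, " bytes")
```

Comparing the two numbers gives a second opinion next to --track-allocation=user when a profiler's table reports small per-call allocations for code that is expected to be allocation-free.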