Add wrapper type for checkerboard decompositions

ffreyer commented 1 year ago

TODO:

[x] Introduce checkerboard wrapper type
[x] move functionality to wrapper
- [x] slice matrix multiplications
- [x] greens matrix
[x] add tests
- [x] for lattice decomposition of different lattices (Square, Triangular, Honeycomb maybe?)
- [x] for simulations of different lattices
[x] optimize
[x] cleanup code (discard inaccurate version, cleanup comments, remove unnecessary code, ...)
[x] generalize checkerboard decomposition to odd system sizes (and fix issues that may arise...)
[x] figure out why L = 6 square has propagation errors
[ ] figure out why L = 7 square has propagation errors

closes #160

ffreyer commented 1 year ago

I removed the slice_matrix functions and moved that code into multiply_slice_matrix_.... This is sort of required for checkerboard multiplications, since we can do in-place multiplications to a matrix there. Without checkerboard this ended up removing a copy, which should be a minor performance improvement.

ffreyer commented 1 year ago

checkerboard currently starts outperforming around L=13 on a square lattice (169 sites).

Annoyingly the more common left multiplication is still 2-3x slower than right multiplication because the index arrays are part of the fats loop.

ffreyer commented 1 year ago

Used $A B = (B^T A^T)^T$ so left multiplications could use right multiplication of checkerboard matrices. With that L = 10 is now the breakeven point (100 sites)

ffreyer commented 1 year ago

I reworked the checkerboard decomposition to be symmetric, i.e. include src -> trg and trg -> src bonds in the same matrix multiplication. After a bunch of debugging this now reproduces the hopping matrices and greens functions from the normal case exactly. Performance dropped a little bit, now L = 11 is probably the break-even point. The decomposition only works with even sizes though, that's still TODO. Also tests.

ffreyer commented 1 year ago

About the CI error that's occurring:

The error seems to be a float precision cancellation error in the UDT transform. (It's a bit weird that I get this on multiple platforms now, when I only got it on ios before, but that's where I'm at now.) It can be reproduced with

M = [-2.084463722368626e-9 1.1199524840239728e-9 3.1373309041162437e-9 -1.270952849741743e-9; 2.3138504481595235e-9 -1.2431986842791233e-9 -3.4825813664270424e-9 1.4108160240001075e-9; 3.73337913231268e-9 -2.005891101950022e-9 -5.619117091314847e-9 2.276340334667651e-9; -1.4240800573451987e-9 7.651378052578579e-10 2.143386006626151e-9 -8.682994036886407e-10]

as the input matrix. In this case the first iteration of the loop producing R (from QR) produces two zeros in R[2:3, 4] due to lack of precision which propagates to a zero on the Diagonal, producing 1 / 0 = NaN/Inf. This is not an issue with reflectorApply. The summation there is loses very little precision. (i.e. we sum 10^-9 and 10^-10 values) qr from base also does not perform better here, though it does not produce zeros. (compared to bigfloat Base.qr is further off than 0 at R[end, end].)

The issue comes from using a matrix where each column is a multiple of another M[:, i] = c * M[:, j]. A 0 on the diagonal of the R matrix is technically correct in this case

ffreyer commented 1 year ago

For a 7x7 square lattice the checkerboard decomposition looks like this:

Screenshot from 2022-11-15 14-57-02

ffreyer commented 1 year ago

The issue comes from using a matrix where each column is a multiple of another M[:, i] = c * M[:, j].

After further investigation this seems to be an issue with this decomposition https://github.com/carstenbauer/MonteCarlo.jl/blob/6ff4856ecf4cc330b0f23c0d768e3ff3c830c11b/src/flavors/DQMC/measurements/greens_iterators.jl#L224 (and potentially some more) when the system is completely empty. In this case G becomes approximately 0 and the UDT decomposition may generate zeros on the diagonal. (For completely filled systems the same should happen in the UDT decomposition for G0l.)

I think it's fine if the Diagonal becomes 0 here. The greens function is effectively zero here anyway.

While testing this I also played around with using the leftover Ur Dl Tr from calculate_greens_AVX! instead of doing these UDT decompositions. While these are not a unitary - diagonal - triangular pair (they are not-quite-unitary, diagonal, unitary) it should be fine to use them as is for Gll and Gl0, but G0l requires the - I. Calculating that as udt(Ur Dl - Tr) Tr' works, but results in significantly larger errors than just subtracting from G00 directly. So it's not worth it.

ffreyer commented 1 year ago

I changed the tests to compare U = 0 DQMC vs exact solutions. In this case no checkerboard is also exact. The errors between checkerboard and exact scale with $\Delta\tau^{1.1} - \Delta\tau^{1.2}$ in terms of norm(exact - DQMC) and maximum(abs.(exact - DQMC)) for equal time and time displaced greens functions. That seems worse than it should be to me...

I also messed around a bit with generating other decompositions. An approach that generals similar matrices is to use Gaussian elimination. From my testing this yields about 0.75size(input, 1)^1.95 matrices I + v e_i e_j (i.e. identity + one element) which should be less operations in a matmult, I think, but is worse than a normal matmult in practice.

carstenbauer / MonteCarlo.jl

Add wrapper type for checkerboard decompositions #169