Closed mtfishman closed 1 year ago
@b-kloss I believe I got everything working with Distributed.jl. Not sure what was going on before, maybe something silly with not keeping the indices or tensors updated properly across processes (i.e. I may not have been broadcasting the results of updating the environments properly).
In the end, all that's needed is the new `distributedsum.jl` file, which overloads some basic operations on remote objects (`Future` objects from Distributed.jl) and makes sure those operations are performed remotely on the worker/process where the objects currently live. It uses macro calls like `@spawnat` and `@fetchfrom`, where you can specify that the worker/process performing the operation should be the one where the term of the sum currently resides, via `term.where`.
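The pattern described above can be sketched roughly as follows. This is an illustrative example only, not the actual `distributedsum.jl` code: the names `DistributedSumTerm`, `remote_map`, and `remote_value` are hypothetical, standing in for the real overloads.

```julia
using Distributed

# Hypothetical sketch: each term of the sum lives on a remote worker as a
# Future, and `where` records which worker holds it. These names are
# illustrative, not the actual ITensorParallel.jl API.
struct DistributedSumTerm{T}
    term::Future  # remote reference to the term's data
    where::Int    # id of the worker on which the term resides
end

# Apply `f` to the term on the worker where it lives, returning a new
# remote reference (no data is moved back to the calling process).
function remote_map(f, t::DistributedSumTerm{T}) where {T}
    DistributedSumTerm{T}(@spawnat(t.where, f(fetch(t.term))), t.where)
end

# Fetch the term's current value from the worker holding it.
remote_value(t::DistributedSumTerm) = @fetchfrom t.where fetch(t.term)
```

The point of `@spawnat t.where ...` is that the operation runs on the process already holding the data, so only the small remote reference travels between processes.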
I'm curious how the performance compares to the MPI.jl implementation (I did some simple tests at smaller bond dimensions, but nothing systematic). I'm hoping we can just use Distributed.jl going forward, since that would simplify things a lot: it's easier to test and develop, and it lets us write simpler code that is generic across parallel and sequential execution, where the parallelism is hidden farther down and handled through dispatch.
@emstoudenmire this should give you some idea of how the `AbstractSum` interface in https://github.com/ITensor/ITensors.jl/pull/1046 will be used.
`MPISum` (EDIT: renamed `MPISumTerm`) and fix numerical issues of MPS tensors getting out of sync across processes by overloading `ITensors.position!` and `ITensors.orthogonalize!` to broadcast the new MPS tensors from one process to the rest. This relies on ITensors v0.3.27.