ITensor / ITensorParallel.jl

Parallel tools for ITensors.jl.
MIT License

Store environments remotely with Distributed-based MPO sum #15

Closed by mtfishman 1 year ago

mtfishman commented 1 year ago

This relies on ITensors v0.3.27.

mtfishman commented 1 year ago

@b-kloss I believe I got everything working with Distributed.jl. I'm not sure what was going wrong before; it may have been something silly, such as not keeping the indices or tensors updated properly across processes (e.g. I may not have been broadcasting the results of updating the environments properly).

In the end, all that's needed is the new distributedsum.jl file, which overloads some basic operations on remote objects (Future objects from Distributed.jl). The overloads make sure each operation is performed remotely, on the worker/process where the object currently lives. This is done with macro calls like @spawnat and @fetchfrom, which let you specify the worker/process that should perform the operation; here that is the one where the term of the sum currently resides, given by term.where.
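To make the idea concrete, here is a minimal sketch of that pattern. It is not the contents of distributedsum.jl: the `RemoteSum` type, the plain matrices standing in for MPO terms, and the local workers started with `addprocs` are all assumptions for illustration. Only `Future`, `@spawnat`, `fetch`, and the `where` field come from Distributed.jl.

```julia
using Distributed

# Assumption: a couple of local workers are enough for the illustration.
nprocs() == 1 && addprocs(2)

# Hypothetical container: a sum whose terms are Futures, each living on some worker.
struct RemoteSum
    terms::Vector{Future}
end

# Apply the sum to x. Each term is applied on the worker that owns it
# (term.where), so the term itself never leaves that process; only x is
# sent out and only the partial result comes back.
function (s::RemoteSum)(x)
    partials = map(s.terms) do term
        @spawnat term.where fetch(term) * x
    end
    return sum(fetch, partials)
end

# Create one term per worker; plain matrices stand in for the real terms.
terms = map(workers()) do w
    @spawnat w fill(float(w), 2, 2)
end

s = RemoteSum(terms)
x = ones(2)
y = s(x)
```

Using `@spawnat` followed by a later `fetch` lets all workers compute their partial results concurrently. `@fetchfrom term.where ...` would do the same work but block on each term in turn.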

I'm curious how the performance compares to the MPI.jl implementation (I did some simple tests at smaller bond dimensions but nothing systematic). I'm hoping we can just use Distributed.jl going forward, since that would simplify things a lot: it's easier to test and develop with, and it lets us write simpler code that is generic across parallel vs. sequential execution, with the parallelism hidden farther down and handled through dispatch.
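As a rough sketch of what "handled through dispatch" could look like, the generic driver below has no knowledge of parallelism; only the method chosen for each term differs. The names `apply_term` and `apply_sum` are hypothetical and do not come from the PR, and matrices again stand in for the actual terms.

```julia
using Distributed

# Assumption: local workers are enough for the illustration.
nprocs() == 1 && addprocs(2)

# Hypothetical helper with two methods, selected by dispatch:
# sequential case: the term is an ordinary local object.
apply_term(term, x) = term * x
# remote case: the term is a Future, so run on the worker that owns it.
apply_term(term::Future, x) = @fetchfrom term.where fetch(term) * x

# Generic driver: identical code for sequential and parallel execution.
apply_sum(terms, x) = sum(term -> apply_term(term, x), terms)

local_terms = [fill(1.0, 2, 2), fill(2.0, 2, 2)]

# Place a copy of each term on a worker, cycling through the available ones.
ws = workers()
remote_terms = [remotecall(identity, ws[mod1(i, length(ws))], t)
                for (i, t) in enumerate(local_terms)]

x = ones(2)
y_local = apply_sum(local_terms, x)
y_remote = apply_sum(remote_terms, x)
```

This version fetches each remote result in turn, so it shows the dispatch structure rather than concurrent execution.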

mtfishman commented 1 year ago

@emstoudenmire this should give you some idea of how the AbstractSum interface in https://github.com/ITensor/ITensors.jl/pull/1046 will be used.