NLESC-JCER / QMC

Deep Learning for Quantum Monte Carlo Simulations
Apache License 2.0

Parallelize the sampling #1

Closed · NicoRenaud closed this issue 4 years ago

NicoRenaud commented 5 years ago

The MC sampling is currently done on a single process only. This is trivial to parallelize, but we could use several options: mpi4py, multiprocessing, ...
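
For the multiprocessing option, the walkers could be split over a pool of workers, roughly like this (a toy sketch: `sample_chunk`, the Gaussian move, and the walker shapes are placeholders, not the actual sampler):

```python
import multiprocessing as mp
import numpy as np

def sample_chunk(args):
    # each worker propagates an independent subset of walkers
    nwalkers, nsteps, seed = args
    rng = np.random.default_rng(seed)
    pos = rng.normal(size=(nwalkers, 3))              # initial walker positions
    for _ in range(nsteps):
        pos += rng.normal(scale=0.1, size=pos.shape)  # placeholder for a Metropolis move
    return pos

if __name__ == "__main__":
    nprocs, nwalkers, nsteps = 4, 100, 1000
    tasks = [(nwalkers // nprocs, nsteps, seed) for seed in range(nprocs)]
    with mp.Pool(nprocs) as pool:
        chunks = pool.map(sample_chunk, tasks)
    walkers = np.concatenate(chunks)  # all walker positions gathered on the main process
```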

NicoRenaud commented 5 years ago

The multiprocessing route is more difficult than expected. See the mp_sampling branch.

NicoRenaud commented 5 years ago

There is a native distributed module in PyTorch: https://pytorch.org/tutorials/intermediate/dist_tuto.html
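
Following that tutorial, spinning up the processes looks roughly like this (the gloo backend and the `run` body are just placeholders):

```python
import os
import torch.distributed as dist
from torch.multiprocessing import Process

def run(rank, size):
    # per-rank work would go here (e.g. sampling a subset of the walkers)
    print(f"rank {rank} of {size} is up")

def init_process(rank, size, fn, backend="gloo"):
    # boilerplate from the tutorial: rendezvous, then run the worker
    os.environ["MASTER_ADDR"] = "127.0.0.1"
    os.environ["MASTER_PORT"] = "29500"
    dist.init_process_group(backend, rank=rank, world_size=size)
    fn(rank, size)

if __name__ == "__main__":
    size = 2
    processes = []
    for rank in range(size):
        p = Process(target=init_process, args=(rank, size, run))
        p.start()
        processes.append(p)
    for p in processes:
        p.join()
```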

NicoRenaud commented 5 years ago

Instead of parallelizing the sampler, it might be better to use DistributedDataParallel to parallelize the training directly (https://pytorch.org/docs/stable/nn.html#torch.nn.parallel.DistributedDataParallel)
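
Roughly like this (a minimal sketch; the tiny network stands in for the actual wave-function model, and `init_process_group` must have been called first on each rank):

```python
import torch
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP

# run on each rank after dist.init_process_group(...) (see the tutorial above)
model = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 1))
ddp_model = DDP(model)  # hooks into backward() to all-reduce the gradients

out = ddp_model(torch.randn(10, 3))  # 10 fake walkers with 3 coordinates each
out.sum().backward()                 # gradients are now synchronized across ranks
```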

We could also split the walkers across processes in the training function and average the gradients, as done in https://pytorch.org/tutorials/intermediate/dist_tuto.html
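
The gradient-averaging step from that tutorial boils down to a few lines: each rank would propagate its own share of the walkers, then call this between backward() and the optimizer step:

```python
import torch.distributed as dist

def average_gradients(model):
    # all-reduce each gradient tensor, then divide by the world size
    size = float(dist.get_world_size())
    for param in model.parameters():
        dist.all_reduce(param.grad.data, op=dist.ReduceOp.SUM)
        param.grad.data /= size
```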

NicoRenaud commented 5 years ago

The average-gradients method seems to be the best option for us. However, an MPI backend would simplify things: https://medium.com/intel-student-ambassadors/distributed-training-of-deep-learning-models-with-pytorch-1123fa538848
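
With the MPI backend the rendezvous boilerplate goes away, since rank and world size come from the launcher (note: this requires a PyTorch build compiled against an MPI library):

```python
import torch.distributed as dist

# no MASTER_ADDR/MASTER_PORT needed: mpirun provides rank and world size
dist.init_process_group(backend="mpi")
rank, size = dist.get_rank(), dist.get_world_size()
```

and then launch with e.g. `mpirun -np 4 python train.py`.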

NicoRenaud commented 4 years ago

The DistributedDataParallel module of PyTorch is too hard to use. I'll switch to Horovod instead.
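
For reference, the Horovod recipe for PyTorch looks like this (a sketch based on Horovod's documented API; the small network and learning rate are placeholders):

```python
import torch
import torch.nn as nn
import horovod.torch as hvd

hvd.init()
model = nn.Sequential(nn.Linear(3, 64), nn.Tanh(), nn.Linear(64, 1))  # stand-in network
optimizer = torch.optim.Adam(model.parameters(), lr=0.01)

# average the gradients across workers on every step()
optimizer = hvd.DistributedOptimizer(optimizer, named_parameters=model.named_parameters())

# start all workers from identical weights and optimizer state
hvd.broadcast_parameters(model.state_dict(), root_rank=0)
hvd.broadcast_optimizer_state(optimizer, root_rank=0)
```

launched with e.g. `horovodrun -np 4 python train.py`.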