AdvancedPhotonSource / cohere-ui

Distributed multipeak reconstructions #18

Open jacione opened 1 week ago

jacione commented 1 week ago

I've run a number of consistency tests on the multipeak algorithms over the past few months, and somehow it just now occurred to me that it would be a lot faster and easier to do that if I could distribute the work to multiple GPUs. However, the pipeline in run_reconstruction.py diverts the multipeak reconstruction process before handling any of the work distribution (as of a9b570e). This means that there is no GA or multi-reconstruction capability for multipeak BCDI. I'm currently trying to come up with a workaround for this on my Dev branch.

Any suggestions would be welcome, particularly as I have basically no experience with MPI!

As a side note, it might be a good idea to rename the reconstruction_multi features to avoid confusion with multipeak reconstruction. Maybe reconstruction_batch or reconstruction_group? If it weren't already somewhat entrenched in the community, I would prefer to rename "multipeak BCDI" to "polyreflective BCDI", simply because it sounds cooler :)

bfrosik commented 1 week ago

I can parallelize the flow. We would need to get together to figure out which multipeak operations can run at the same time. Currently the iteration sequence is handled by the peaks taking the work in turns. If you are thinking about running multipeak in "multi-reconstruction" or GA mode, that is a separate issue, and we can indeed add this capability by reusing the existing logic in run_reconstruction. Your observation about the "multi" name is also a good one. We should get feedback from Ross/Wonsuk about the naming.

jacione commented 1 week ago

Actually, phasing peaks in series is inherent to the algorithm itself. I was thinking more about the ability to run several multipeak reconstructions at the same time.

When I do consistency tests, I have a script that goes through a bunch of parameter configurations, running 5-10 reconstructions with each set, which generally takes a few hours. If I could split those up across several GPUs, it would make those tests go a lot faster.
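
A minimal sketch of that kind of split, assuming each reconstruction is independent and writes its own output to disk. Every name here (`build_jobs`, `assign_round_robin`, `run_on_gpu`, `run_all`, and the `reconstruct` callable) is hypothetical and not part of cohere or cohere-ui; it only illustrates dealing jobs out to GPUs with the standard library, not how run_reconstruction.py distributes work.

```python
import itertools
import multiprocessing as mp
import os


def build_jobs(configs, n_repeats):
    """Expand each parameter configuration into n_repeats independent jobs."""
    return [(cfg, rep) for cfg in configs for rep in range(n_repeats)]


def assign_round_robin(jobs, gpu_ids):
    """Deal jobs out to GPUs in turn so each GPU gets a near-equal share."""
    plan = {gpu: [] for gpu in gpu_ids}
    for job, gpu in zip(jobs, itertools.cycle(gpu_ids)):
        plan[gpu].append(job)
    return plan


def run_on_gpu(gpu_id, jobs, reconstruct):
    """Run one GPU's jobs in series inside a single worker process."""
    # Restrict this process to one device. This only takes effect if it is
    # set before the GPU library initializes in this process.
    os.environ["CUDA_VISIBLE_DEVICES"] = str(gpu_id)
    # `reconstruct` is a placeholder for whatever runs one reconstruction.
    return [reconstruct(cfg, rep) for cfg, rep in jobs]


def run_all(plan, reconstruct):
    """Start one fresh process per GPU and wait for all of them to finish."""
    # "spawn" gives each worker a clean interpreter with no inherited GPU
    # state; `reconstruct` must then be an importable top-level function.
    ctx = mp.get_context("spawn")
    workers = [
        ctx.Process(target=run_on_gpu, args=(gpu, jobs, reconstruct))
        for gpu, jobs in plan.items()
    ]
    for worker in workers:
        worker.start()
    for worker in workers:
        worker.join()
    return [worker.exitcode for worker in workers]
```

With, say, 2 configurations, 3 repeats each, and 4 GPUs, `assign_round_robin(build_jobs(configs, 3), [0, 1, 2, 3])` gives two GPUs two jobs each and the other two one job each. Peaks are still phased in series inside each reconstruction; only whole reconstructions run side by side.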

bfrosik commented 1 week ago

I understand. In that case I will change the code to call multipeak after the logic that distributes resources. Is there a good, recent example I can run on Sayre (Shannon)?