Closed xiaodong-yu closed 3 years ago
Hello @xiaodong-yu! Thanks for updating this PR. We checked the lines you've touched for PEP 8 issues, and found:
There are currently no PEP 8 issues detected in this Pull Request. Cheers! :beers:
@xiaodong-yu, I pulled these images from
tests/results/lstsq_grad
Can you see that the texture of the noise switches half way down the image with 2 processes?1 process 2 processes Something about this implementation is causing this. Any ideas? Do you want a large dataset to test?
@carterbox, I can't clearly see any differences between these two images. Could you please explain it a little bit more to me? Also, is the 1 process image reconstructed by multiple GPUs? I highly suspect that any differences come from the mean issue you mentioned before. I find that the splitting of the test data is not even.
It's more obvious in this image (the amplitude of the first view reconstructed by the lstsq test). I increased the problem size and increased the number of processes to 4. I'm using a single GPU because the hardware layout shouldn't affect the quality.
You can see there are 4 distinct horizontal stripe regions in the reconstruction. Probably the common probe constraint isn't correctly implemented.
It's more obvious in this image (the amplitude of the first view reconstructed by the lstsq test). I increased the problem size and increased the number of processes to 4. I'm using a single GPU because the hardware layout shouldn't affect the quality.
You can see there are 4 distinct horizontal stripe regions in the reconstruction. Probably the common probe constraint isn't correctly implemented.
OK, now I see the problem. I agree the hardware layout won't affect the results. But I don't think the issue comes from this MPI implementation since it doesn't really change the parallelization design. Any algorithmic problem very likely is caused by the multi-device implementation, even within a single node. I'll check the previous PR to see if it has the same issue with single node multi-GPU.
@carterbox I revised the code and don't see the issue anymore with the test image. Please have a review and let me know if you still see the problem in your reconstructions.
I plan to look at this on Thursday or Friday.
@xiaodong-yu, the stripes still appears for me. I have pushed a commit which changes the ptycho tests so we are looking at the same size image. I run the following test and there is a noticeable line across the middle of the image.
mpirun -n 2 pytest tests/test_ptycho.py -vs -k 'lstsq and not variable'
Using more than two mpi processes causes an exception in mpi4py.
@xiaodong-yu, the stripes still appears for me. I have pushed a commit which changes the ptycho tests so we are looking at the same size image. I run the following test and there is a noticeable line across the middle of the image.
mpirun -n 2 pytest tests/test_ptycho.py -vs -k 'lstsq and not variable'
Using more than two mpi processes causes an exception in mpi4py.
I still didn't see the stripe region in the image (2 processes, as attached). Did you refer to the phase images in tests/result/lstsq_grad
?
No, I'm looking at the amplitude image.
@carterbox Why the amplitude images I found in the folder are the following random images?
There are two object views and two probes reconstructed, so there should be 8 images:
tests/result/lstsq_grad/0-ampli.png
tests/result/lstsq_grad/0-phase.png
tests/result/lstsq_grad/0-probe-ampli.png
tests/result/lstsq_grad/0-probe-phase.png
tests/result/lstsq_grad/1-ampli.png
tests/result/lstsq_grad/1-phase.png
tests/result/lstsq_grad/1-probe-ampli.png
tests/result/lstsq_grad/1-probe-phase.png
Image 0 has amplitude constant 1. Image 1 has amplitude "coins". Probe 0 has phase "cryptomeria". Probe 1 has phase "bombus".
The amplitude of image 0 isn't supposed to look like "satyre", but the lstsq method isn't very good.
@carterbox, the issue is solved. Please try it again.
Purpose
Add multi-node to lstsq_grad method by using mpi4py.
Approach
Apply MPI operations onto the communication and sync spots in the
lstsq_grad
method, in addition to the threadpool operations.Pre-Merge Checklists
Submitter
yapf
to format python code.Reviewer