cabouman / svmbir

Fast code for parallel or fan beam tomographic reconstruction
BSD 3-Clause "New" or "Revised" License
19 stars 8 forks source link

svmbir diverges when too many threads are used with small num_slices. #268

Closed dyang37 closed 9 months ago

dyang37 commented 1 year ago

Recently "bugzilla" is observed again on a problem with small number of slices, where divergence of the MBIR recon is observed with default number of threads on a problem with recon array size of 5x400x400.

I managed to reproduce this with 3D shepp Logan. This can be reproduced by setting num_slices=5, num_rows_cols=256, and max_resolutions=0 (no multi-resolution). With the setting above, the maximum number of threads without breaking the code is 20. Here are the results:

num_threads=20: convergence behavior: threads20 final_recon: Screen Shot 2022-11-08 at 5 00 02 PM (artifacts potentially due to small number of slices)

num_threads=24: convergence behavior: Screen Shot 2022-11-08 at 5 02 37 PM final recon: Screen Shot 2022-11-08 at 5 03 25 PM

Again, I turned off multi-res to isolate the problem. However, the limit of the thread number could potentially change when multi-res is turned on.

dyang37 commented 1 year ago

One more important thing, the divergence behavior is only observed when positivity=True. When positivity is turned off, MBIR converges as expected regardless the number of threads used: Screen Shot 2022-11-08 at 5 41 02 PM

The final recon is the same as using 20 threads with positivity=True.

cabouman commented 1 year ago

Two comments/questions:

dyang37 commented 1 year ago

Two comments/questions:

  • So it only diverges when positivity=True? That's crazy because positivity is supposed to help.
  • How many cores did you have on the computer?
  1. Yes. It only diverges with positivity=True.
  2. I used a server with 48 CPUs.
gbuzzard commented 1 year ago

Note that when positivity is false, then the c code will reduce the number of threads without warning. So it's probably reducing to 20 or fewer threads in that case.

XiaoWang-Github commented 1 year ago

Hi Diyu, when you don't use the multi-resolution, do you still get the convergence issue? I am assuming that the answer is no. If that is the case, I think I know the issue.

The parallel super-voxel can indeed possibly cause convergence overshoot. It keeps all the error sinogram updates for a super-voxel locally, wait for all the super-voxels in the thread groups to complete, and then atomically add those changes to the error sinograms. Usually, this will not cause convergence issue, although it might slow down the convergence slightly, because each super-voxel has enough voxels to update and enough error sinogram entries. However when you enable multi-resolution and when the initial resolution is very low, each super-voxel has few voxels to update and few error sinogram entries to work on. Then if you still have many parallel super-voxel to be updated in parallel, then the convergence overshoot issue gets magnified. So for multi-resolution, you want to reduce the number of threads when the resolution is low, and then increase the number of threads when the resolution increases. That's one way to address it. Another way to address it is to implement a convergence overshoot dampening factor on the magnitude of the voxel update. Dampening the voxel updates significantly when the resolution is low, and then tone down the dampening when the resolution gets higher.

dyang37 commented 1 year ago

Hi Xiao, I still got the convergence issue when multi-res is disabled. The shepp-logan example above is produced with max_resolutions=0 and positivity=True. Regards, Diyu


From: Xiao Wang @.> Sent: Wednesday, November 9, 2022 2:02 PM To: cabouman/svmbir @.> Cc: Diyu Yang @.>; Assign @.> Subject: Re: [cabouman/svmbir] svmbir diverges when too many threads are used with small num_slices. (Issue #268)

---- External Email: Use caution with attachments, links, or sharing data ----

Hi Diyu, when you don't use the multi-resolution, do you still get the convergence issue? I am assuming that the answer is no. If that is the case, I think I know the issue.

The parallel super-voxel can indeed possibly cause convergence overshoot. It keeps all the error sinogram updates for a super-voxel locally, wait for all the super-voxels in the thread groups to complete, and then atomically add those changes to the error sinograms. Usually, this will not cause convergence issue, although it might slow down the convergence slightly, because each super-voxel has enough voxels to update and enough error sinogram entries. However when you enable multi-resolution and when the initial resolution is very low, each super-voxel has few voxels to update and few error sinogram entries to work on. Then if you still have many parallel super-voxel to be updated in parallel, then the convergence overshoot issue gets magnified. So for multi-resolution, you want to reduce the number of threads when the resolution is low, and then increase the number of threads when the resolution increases. That's one way to address it. Another way to address it is to implement a convergence overshoot dampening factor on the magnitude of the voxel update. Dampening the voxel updates significantly when the resolution is low, and then tone down the dampening when the resolution gets higher.

— Reply to this email directly, view it on GitHubhttps://github.com/cabouman/svmbir/issues/268#issuecomment-1309225289, or unsubscribehttps://github.com/notifications/unsubscribe-auth/AB2Q7OBCXPIA7IJ5BGH4YG3WHPYLZANCNFSM6AAAAAAR2ZOTJA. You are receiving this because you were assigned.Message ID: @.***>

XiaoWang-Github commented 1 year ago

ok, then I have no clue then. This sounds really bizarre. I have definitely run the super-voxel algorithm with much more than just 20 threads before. I didn't have any issue.

cabouman commented 1 year ago

I think we have a possible fix for the new Bugzilla. Replace lines 290-293 of recon3d.c with the following:

    max_of_num_slices_svdepth = ((num_slices < SVDEPTH) ? num_slices : SVDEPTH)
    i = (((Nx < Ny) ? Nx : Ny) / (2*SVLength+1)) * (max_of_num_slices_svdepth/SVDEPTH);
    max_threads = ( i < max_threads) ? i : max_threads ;

Previously, the problem is that

SV_per_Z = ceil( num_slices/SVDEPTH )

So that we could allow overlap between SVs when the number of slices was not an integer multiple of the SV depth.

dyang37 commented 1 year ago

PR to C-code is created here: https://github.com/HPImaging/sv-mbirct/pull/17 @sjkisner Could you please check that out when you have time?

XiaoWang-Github commented 1 year ago

Diyu, I went through the fix you proposed. It's great that you are investing the C code, but the fix you proposed doesn't make sense to me.

Let's look at the original C code that you want to change: SV_per_Z = ceil( num_slices/SVDEPTH )

Suppose that we have 5 slices (num_slices=5), and the super-voxel depth is 4. In that case, SV_per_Z which is the number of super-voxels along the z direction at a fixed X-Y plane coordinates, will be computed as 2 from the above equation ceil(5/4)=2. And that's what it supposes to be. The first super-voxel will be across 4 slices with depth 4, and the second super-voxel will only have 1 slice with depth 1. You don't want to use floor here, because that will skip the updates for the last slice. I don't think the convergence issue has much to do with this line of code.

cabouman commented 1 year ago

Jordan, I proposed this change because it seemed to fix the problem, and I thought it would at least start the discussion. But after talking to Xiao, I agree with your analysis.

I'll send you an email and cc everyone to discuss further.

sjkisner commented 1 year ago

I pushed a fix in the kisner/fix branch. I just limit the threads as in the case with no positivity constraint. I may be able to relax this a bit by reducing the delay between pixel updates and error sinogram updates.

I'm not putting in the suggested change below because it results in max_threads=0 if the number of slices is less than SVDEPTH.

max_of_num_slices_svdepth = ((num_slices < SVDEPTH) ? num_slices : SVDEPTH)
i = (((Nx < Ny) ? Nx : Ny) / (2*SVLength+1)) * (max_of_num_slices_svdepth/SVDEPTH);
max_threads = ( i < max_threads) ? i : max_threads ;
sjkisner commented 1 year ago

final_recon: Screen Shot 2022-11-08 at 5 00 02 PM (artifacts potentially due to small number of slices)

That inner halo is due to the inter-slice regularization. The 3D phantom generated with a small number of slices makes it very coarse in Z.

I noticed we were requiring b_interslice to be positive. There isn't any reason it can't be set to zero, so I added a change to allow this. If you set b_interslice=0 in the test script that halo will goes away.

sjkisner commented 1 year ago

Fixed as of v0.3.2 #275

sjkisner commented 9 months ago

This was fixed some time ago.