cabouman / mbircone

BSD 3-Clause "New" or "Revised" License

Laminography code crashes #143

Closed cabouman closed 7 months ago

cabouman commented 1 year ago

The laminography code has been observed to crash with the parameters below.

Here is the function call that results in the crash:

vol = recon_lamino(y, angles, lami_angle, num_image_rows=800,
                   num_image_cols=800, num_image_slices=120,
                   stop_threshold=0.05, snr_db=30,
                   sharpness=0.0, positivity=False)

Here are the parameters:

DamonLee5 commented 1 year ago

I successfully reproduced the nan issue with this data size on the branch lamino_crash. To reproduce it yourself, just run demo_laminography_test.py.

cabouman commented 1 year ago

Excellent! Next we need to figure out where the first nan is produced.
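One way to narrow this down is to scan intermediate values for the first non-finite entry. The following is a minimal sketch in plain Python, assuming the quantity of interest can be dumped to a flat sequence; the helper name `first_nonfinite` and the sample data are hypothetical and not part of mbircone.

```python
import math

def first_nonfinite(values):
    """Return (index, value) of the first NaN or +/-inf in values, or None."""
    for i, v in enumerate(values):
        if not math.isfinite(v):
            return i, v
    return None

# Hypothetical dump of a per-voxel quantity after one iteration.
dump = [0.5, -1.25, float("inf"), float("nan")]
hit = first_nonfinite(dump)
```

Because an infinity typically appears before the first nan, checking with `math.isfinite` rather than `math.isnan` tends to point closer to the root cause.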

DamonLee5 commented 1 year ago

I tracked down the bug and found that the first infinite value occurs in theta1_f.

I printed the value of theta1_f just before it became infinite:

icdInfo[k_M].theta1_f = 2.993959e+38

The maximum value of a float is 3.402823466e+38. theta1_f keeps accumulating in line 961 until it overflows to infinity:

icdInfo[k_M].theta1_f += parallelAux->partialTheta[threadID][k_M].t1;

Do you think theta1_f from the forward model can be that large with this data size?
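The overflow can be illustrated outside the C code. This is only a sketch: it emulates single-precision accumulation in plain Python with the `struct` module, the helpers `to_float32` and `accumulate_float32` are hypothetical, and the increments are made-up numbers chosen to cross the float limit. Only the starting value comes from the printout in this comment.

```python
import math
import struct

# Largest finite float32, built from its bit pattern (about 3.4028235e+38).
FLT_MAX = struct.unpack("<f", struct.pack("<I", 0x7F7FFFFF))[0]

def to_float32(x):
    """Round x to float32, overflowing to +/-inf the way a C float does."""
    try:
        return struct.unpack("<f", struct.pack("<f", x))[0]
    except OverflowError:
        # struct refuses finite values that do not fit; C arithmetic gives inf.
        return math.copysign(math.inf, x)

def accumulate_float32(start, increments):
    """Mimic `acc += inc` on a C float, rounding after every addition."""
    acc = to_float32(start)
    for inc in increments:
        acc = to_float32(acc + to_float32(inc))
    return acc

# Starting value from the printout; the increments are assumptions.
still_finite = accumulate_float32(2.993959e38, [3.0e37])
theta1 = accumulate_float32(2.993959e38, [3.0e37, 3.0e37])

# Once the accumulator is infinite, later arithmetic can turn it into nan.
becomes_nan = theta1 - theta1
```

Here the first increment leaves the sum finite, the second pushes it past the float limit to infinity, and subtracting infinity from infinity then gives nan.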

Also, the original code already has a zero-value check for theta2, so I don't think theta2 is the cause of this issue.

cabouman commented 1 year ago

Wenrui, good detective work. We are getting close. We may need to Zoom in order to track down the problem. It appears that a pixel with no associated sinogram measurements is for some reason accumulating an infinite value for the first derivative. It's probably some strange corner case that Thilo never thought about. If necessary, we might need to contact Thilo to see if he can give us some advice.
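A defensive guard for this kind of corner case might look like the sketch below. It assumes a plain quadratic ICD step of the form -theta1/theta2 with no prior term; the function `safe_icd_step` is hypothetical and does not correspond to the actual mbircone C code.

```python
import math

def safe_icd_step(theta1, theta2):
    """Return the update -theta1/theta2, or 0.0 if the step is undefined."""
    if not (math.isfinite(theta1) and math.isfinite(theta2)):
        # A derivative has already overflowed; leave the voxel unchanged.
        return 0.0
    if theta2 <= 0.0:
        # Zero curvature, e.g. no measurements touch this voxel.
        return 0.0
    return -theta1 / theta2
```

Skipping the update hides the symptom rather than explaining why the first derivative grows, so a guard like this is better suited to locating the offending pixel than to serving as a final fix.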

Comments/questions:

DamonLee5 commented 1 year ago

Right. It is line 935. I added some printf calls in my local branch, which I don't want to push to GitHub.

dyang37 commented 1 year ago

It seems that Wenrui's test script does not reproduce Brendt's case. In Wenrui's case, the nan occurs because the tilt angle is set to 361 degrees, which makes the source-detector line (almost) parallel to the rotation axis. Instead, we should set num_views=361 to reproduce Brendt's case.

In other words, Wenrui's test case crashed because of this unusual tilt angle of 361 degrees. If we change the tilt angle back to a normal lamino angle (e.g. 60 degrees), the nan problem goes away.

So I think we still need to reproduce Brendt's problem.
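A test script could catch this kind of mix-up between the tilt angle and the number of views before calling the reconstruction. The sketch below assumes the tilt angle is measured between the source-detector line and the rotation axis, as the comment above implies; the function names and the 5 degree threshold are hypothetical choices, not mbircone behavior.

```python
import math

def beam_axis_angle_deg(tilt_deg):
    """Smallest angle in degrees between the beam line and the rotation axis."""
    # A line is unchanged by a 180 degree turn, so fold the angle into [0, 180).
    folded = abs(math.fmod(tilt_deg, 180.0))
    return min(folded, 180.0 - folded)

def check_tilt(tilt_deg, min_angle_deg=5.0):
    """Raise ValueError if the beam is nearly parallel to the rotation axis."""
    angle = beam_axis_angle_deg(tilt_deg)
    if angle < min_angle_deg:
        raise ValueError(
            f"tilt angle {tilt_deg} deg leaves the beam only {angle:g} deg "
            "from the rotation axis"
        )
    return angle
```

With this check, a tilt of 361 degrees folds to 1 degree and is rejected, while a 60 degree lamino angle passes.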

DamonLee5 commented 1 year ago

Following Diyu's input, I changed the tilt angle to 60 degrees and set num_views to 361 (demo_laminography_test2.py). The cost keeps increasing from iteration 1 onward, and the overflow still happens in iteration 14.

python demo_laminography_test2.py

(Charlie: I deleted a bunch of stuff to simplify this comment)

** Iteration 14 (max. 100) **

icdInfo[k_M].theta1_f -3.228554e+38
icdInfo[k_M].theta2_f 5.410564e+04

dyang37 commented 7 months ago

A temporary fix was implemented in PR #152.