cabouman closed this issue 7 months ago.
I successfully reproduced this nan issue using this data size in branch lamino_crash. To reproduce it, just run demo_laminography_test.py.
Excellent! Next we need to figure out where the first nan is produced.
I tracked down the bug and found that the first infinite value appears in theta1_f.
I printed the value of theta1_f just before it became infinite: icdInfo[k_M].theta1_f = 2.993959e+38
The maximum value of a float is 3.402823466e+38 (FLT_MAX). The value keeps accumulating at line 961 and eventually overflows to infinity:
icdInfo[k_M].theta1_f += parallelAux->partialTheta[threadID][k_M].t1;
Do you think theta1_f from the forward model can be that large with this data size?
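To illustrate the overflow mechanism, here is a minimal sketch that starts a single-precision accumulator at the value printed just before the overflow and adds a few large partial sums. The partial sums used in the usage below (1e38 each) and the helper name `accumulate` are hypothetical, not values from the actual run; the same sums are also carried in a double for comparison.

```c
#include <float.h>
#include <math.h>

/* Hypothetical helper: add n partial sums to a float and to a double
 * accumulator that both start at `start`. */
static void accumulate(float start, const float *partials, int n,
                       float *sum_f, double *sum_d)
{
    float  acc_f = start;
    double acc_d = start;
    for (int i = 0; i < n; i++) {
        acc_f += partials[i];          /* can exceed FLT_MAX and become +inf */
        acc_d += (double)partials[i];  /* double range is about 1.8e308 */
    }
    *sum_f = acc_f;
    *sum_d = acc_d;
}
```

Starting from 2.993959e+38f and adding four partial sums of 1e38f, the float result is +inf while the double result stays finite (about 7e+38). A double accumulator might avoid the crash, but if theta1_f is genuinely reaching about 1e38 it would likely only mask the underlying problem.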
Also, the original code already checks theta2 for zero, so I don't think theta2 is the cause of this issue.
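What the theta2 zero check does and does not protect against can be sketched with a Newton-style pixel step, delta = -theta1/theta2. This form of the ICD update is an assumption, and `pixel_step` is a hypothetical name, not a function from the code base. The check prevents a division by zero, but it does not bound the step when theta1 itself is huge.

```c
/* Hypothetical Newton-style ICD step for one pixel: delta = -theta1 / theta2.
 * Returns 0 (no update) when theta2 is exactly zero, e.g. a pixel with no projections. */
static float pixel_step(float theta1, float theta2)
{
    if (theta2 == 0.0f)
        return 0.0f;
    return -theta1 / theta2;
}
```

With the values printed in this thread (theta1_f about 2.99e+38, theta2_f about 5.41e+04), this step would be about -5.5e+33, even though theta2 is nonzero.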
Wenrui, good detective work. We are getting close. We may need to meet on Zoom to track down the problem. It appears that a pixel with no associated sinogram measurements is for some reason accumulating an infinite value for the first derivative. It's probably some strange corner case that Thilo never thought about. If necessary, we might need to contact Thilo to see if he can give us some advice.
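The link between "no associated sinogram measurements" and theta2_f == 0 can be seen from the usual weighted quadratic data term, where theta1 = -Σ_i w_i e_i A_ij and theta2 = Σ_i w_i A_ij², with e = y - Ax the error sinogram and w the weights. This is a sketch assuming the forward model accumulates the derivatives in this form; the sparse-column type `Column` and the function `pixel_thetas` are hypothetical.

```c
/* Hypothetical sparse column of the system matrix for one pixel j:
 * nnz (sinogram index, value) pairs; nnz == 0 means no projections hit the pixel. */
typedef struct {
    const int   *index;
    const float *value;
    int          nnz;
} Column;

/* Derivatives of the weighted quadratic data term for pixel j:
 * theta1 = -sum_i w_i e_i A_ij,  theta2 = sum_i w_i A_ij^2. */
static void pixel_thetas(const Column *col, const float *err, const float *weight,
                         float *theta1, float *theta2)
{
    float t1 = 0.0f, t2 = 0.0f;
    for (int k = 0; k < col->nnz; k++) {
        int   i = col->index[k];
        float a = col->value[k];
        t1 -= weight[i] * err[i] * a;
        t2 += weight[i] * a * a;
    }
    *theta1 = t1;
    *theta2 = t2;
}
```

Under this form, a pixel with an empty column gets theta1 = theta2 = 0, so an empty column alone would not produce a huge theta1; a nonzero theta2 means the pixel does have projections.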
Comments/questions:
Right. It is line 935. I added some printf calls in my local branch, which I don't want to push to GitHub.
Is the associated value of theta2_f == 0, exactly? If so, that confirms that the pixel update causing the problem has no projections.
No, the associated value of theta2_f is not 0:
icdInfo[k_M].theta1_f 2.993959e+38
icdInfo[k_M].theta2_f 5.407720e+04
Is the value of icdInfo[k_M].theta1_f overflowing the maximum float value of 3.402823466e+38?
Yes. I record the previous value of icdInfo[k_M].theta1_f before each accumulation, and when I detect that icdInfo[k_M].theta1_f has become infinite, I print its previous value:
/* Needs <math.h> for isnan()/isinf(), <stdio.h> for printf(), <stdlib.h> for exit(). */
float prev_theta1_f = icdInfo[k_M].theta1_f;  /* value before this thread's partial sum */
icdInfo[k_M].theta1_f += parallelAux->partialTheta[threadID][k_M].t1;
icdInfo[k_M].theta2_f += parallelAux->partialTheta[threadID][k_M].t2;
if (isnan(icdInfo[k_M].theta1_f) || isinf(icdInfo[k_M].theta1_f)) {
    /* The label says theta1_f, but this prints the value just before it went non-finite. */
    printf("icdInfo[k_M].theta1_f %e\n", prev_theta1_f);
    printf("icdInfo[k_M].theta2_f %e\n", icdInfo[k_M].theta2_f);
    exit(-1);
}
It seems that Wenrui's test script does not reproduce Brendt's case.
In Wenrui's case, the nan occurs because the tilt angle is set to 361 degrees. In this case, the source-detector line is (almost) parallel with the rotation axis.
Instead, we should set num_views=361 to reproduce Brendt's case.
In fact, Wenrui's test case crashed because of the unusual tilt angle of 361 degrees. If we change the tilt angle back to a normal laminography angle (e.g. 60 degrees), the nan problem goes away.
So I think we still need to reproduce Brendt's problem.
Following Diyu's suggestion, I changed the tilt angle to 60 degrees and set num_views to 361 (demo_laminography_test2.py). The cost increases starting from iteration 1, and the overflow still happens in iteration 14.
python demo_laminography_test2.py
(Charlie: I deleted a bunch of stuff to simplify this comment)
** Iteration 14 (max. 100) **
icdInfo[k_M].theta1_f -3.228554e+38
icdInfo[k_M].theta2_f 5.410564e+04
A temporary fix was implemented in PR #152.
It has been observed that the laminography code crashes with the following parameters.
Here is the function call that results in a crash:
Here are the parameters: