Starlink / starlink

Starlink Software Collection
162 stars 53 forks source link

CUPID (GaussClumps) can print wrong rejection reason #57

Closed grahambell closed 7 years ago

grahambell commented 7 years ago

When FINDCLUMPS (using METHOD=GaussClumps) rejects clumps, it seems always to report this as due to touching the data array edge, e.g.:

1 clump rejected because it touches an edge of the data array.

However it can also reject them due to the clump peak being too low, but this is only seen when debugging output is enabled, e.g.:

Iteration 3:
   Integrated clump intensity: 1708.34785167207 (in 441 pixels)
   (the clump peak is too low)

And in this case the main message (normally the only one you see) is very misleading. It looks like the message pre-dates the addition of the "thresh" parameter.

Background information: I have a SCUBA-2 450um observation with only one significant feature which is very close to "thresh" (in this case 10) * the RMS and nowhere near the edge. Re-reducing the observation seems to cause enough change in the map for the clump to randomly switch between being accepted and being rejected. With the message indicating that this was due to touching the edge this was very confusing.

dsberry commented 7 years ago

So do you think that the "touches an edge" message should be relegated to debug level, or that the "peak is too low" message (or some equivalent count of low peaks) should be elevated to normal level? If we adopt the later course, we should probably also elevate all the equivalent messages that report peaks rejected for other reasons, and probably also in all methods (not just GaussClumps).

One option would be to relegate the "touches an edge" message, and then include one message at normal level that gives a total count of peaks rejected for any reason..

grahambell commented 7 years ago

The simplest fix would be to change the message so instead of saying "1 clump rejected because it touches an edge of the data array" it said "1 clump rejected because it touches an edge of the data array or was below the threshold". It looks like the code which prints this message doesn't know why the peak was rejected (I think it was written before the threshold was added) so it could just give the number rejected and not claim one reason or the other. It's just a problem when it says the clump was rejected for touching the edge as that might cause the user to waste a lot of time trying to figure out how a clump in the middle of their data could possibly touch the edge of the array.

Of course it would be better if the system could count the number rejected for each reason and then print two normal output lines (if both apply) such as:

"3 clumps rejected because they touch the edge of the data array." "5 clumps rejected because they were below the threshold/"

MalcolmCurrie commented 7 years ago

Thank you for raising this Graham. I'd received that misleading message myself while David was incapacitated, while trying to locate spectral lines in heterodyne reference spectra. I agree that it is better to have a generic error rather than a misdirecting specific error.

i don't understand why messages relating to different errors should have different error levels. Wouldn't the user want normally to know how many clumps were rejected for whatever reason? Best of all, as Graham says, is to present a breakdown of the numbers of clumps reject for each and all reasons. Say if too many failed to attain the threshold, the user might want to lower the threshold. A clearer report It would have helped me understand why certain lines were unexpectedly being rejected.

grahambell commented 7 years ago

Regarding the different levels: the debugging message is printed while considering each potential clump individually whereas the summary (number of clumps rejected) is shown at the end, so the different levels seem reasonable.

grahambell commented 7 years ago

I implemented the simple update to the message as described above in commit d4ca039938dd6ad23b4917f3fd8eb8191ef81b98 so that we have it in time for the 2016A release.

dsberry commented 7 years ago

Thanks Graham.