PyWiFeS / pipeline

The Python data reduction pipeline for WiFeS
6 stars 26 forks source link

Fix single threaded arc fit with bad pixels. #33

Closed timothyfrankdavies closed 7 months ago

timothyfrankdavies commented 7 months ago

Description

I ran into an issue while testing https://github.com/PyWiFeS/pipeline/pull/32. When running a particular test case multithreaded, I got different results to running single threaded. I traced the difference to the if multithread: cases in weighted_loggauss_arc_fit.

Both single & multithreaded cases are run using a jobs list. Those jobs store a line index, and running mpfit_gauss_line or lsq_gauss_line on a job returns a tuple storing (line index, center).

A job is made for each arc except arcs with bad pixels. In that case, the center is recorded as 'nan'.

So when we go to record the center for each arc, we need to skip the indexes containing bad pixels. To do that, we just need to read the line index of the job. The multithreaded case did this, but the single threaded case did not.

I've updated the single threaded case to follow the same procedure as the multithreaded case.

Tests

I've run the new version in single threaded & multithreaded modes, and get the same results.

Here's an example compare single threaded runs before & after the fix: image

Comparing multithreaded before & single threaded after: image

I've done small refactors to the multithreaded case now, they pass too.

timothyfrankdavies commented 7 months ago

I've made a number of refactors, and thankfully they haven't complicated the diff too much. There's a couple of risky changes, but I'm retesting and will make changes as needed.

Happy to undo any of the refactors.

timothyfrankdavies commented 7 months ago

I've made a number of refactors, and thankfully they haven't complicated the diff too much. There's a couple of risky changes, but I'm retesting and will make changes as needed.

Happy to undo any of the refactors.

I've now fixed the changes and retested, I get the same results as before the refactor. However, I've only tested the mpfit single-threaded codepath, on classic & ns data. The multithreaded codepath will be retested in #32.

I think the loggauss function hasn't been maintained for some time, as it used np instead of numpy and could have crashed if used. I've gone and renamed those just in case, but it's not necessarily worth it.

The refactor diff in github isn't quite as clean as one I viewed in vscode. Again, happy to undo the refactor if you think it makes it any harder to verify.