Introduce masked arrays to deal with negative AIC

Exo-TiC / ExoTiC-ISM

This is a repository for the reduction pipeline detailed in Wakeford, et al., 2016, ApJ. The method implements marginalization across a series of models to represent stochastic models for observatory and instrument systematics. This is primarily for HST WFC3, however, may be extended to STIS in the future.

MIT License

8 stars 6 forks source link

Introduce masked arrays to deal with negative AIC #46

Closed ivalaginja closed 5 years ago

ivalaginja commented 5 years ago

This PR deals with issue #37. Instead of cutting the arrays down with the positional array pos that holds the indices of all positive AIC values, I rewrote this to use masked numpy arrays, thus retaining the array structure for all systematic models.

For now, I kept the old version as well to facilitate direct comparison of the results to make sure the arithmetic happens correctly with the masked arrays (which it should according to numpy documentation, but better to check). I also adjusted the Numpy Data notebook to be able to test the saved results quickly.

Before merging this, we need to:

[x] test this on bad data that actually yields negative AIC on some systems (running fine on W17 and W6 data that have no systematic models with negative AIC)
[x] clean up the code and get rid of the old version that I left there for testing (look for #masked in the code)

ivalaginja commented 5 years ago

@hrwakeford I realized we had the W6 dataset and I managed to find its parameters in an older commit on the repository. I ran it with that but no system yields a negative AIC for this data, so if we have anything else to try this on, that would be great. I used the chance and updated the W6 planetary parameters in the configfile.

hrwakeford commented 5 years ago

I have created a new data file to test the negative AIC condition. The black points show the original data. I have then applied a linear trend to the data which corresponds to one of the major systematic trends and applied that to the data to create a file which will reject any systematic models which do not fit for this trend in time. Test_lightcurve_for_AIC_46

The following contains the file for the red line which will test the AIC condition W17_AIC_test_issue46.txt

hrwakeford commented 5 years ago

While running a test on this I found a few things that will need changing. Screen Shot 2019-08-02 at 11 24 44 AM This line stating that it is showing negative AIC position is actually showing where the positive AIC condition is met. So should be changed to state: Valid models positions =

hrwakeford commented 5 years ago

Also, the output PDF will need to include a statement that the systematic model array was masked to remove N models which did not satisfy the positive AIC condition - the figures should then possibly include blank spots for masked results while keeping the used model in the correct number position on Figure 1

ivalaginja commented 5 years ago

I rebased the branch on the latest version of develop.

ivalaginja commented 5 years ago

Note how when dealing with masked arrays, we don't have to make sure we use the NaN-ignoring numpy functions, e.g. np.nansum, np.nanmin and so on, even if the mask fill value is NaNs. However, at some point we pick values out of the masked arrays which don't end up in masked arrays, hence I had to substitute the numpy functions in a spot or two with the np.nanargmin and np.nanmin functions, which will do their calculation by ignoring the NaN values - otherwise they just return NaN themselves, which we don't want. See commit b10bf32.

ivalaginja commented 5 years ago

The test was really useful, there were a couple of nasty bugs in this. It runs fine now and works correctly with masked data. I have included extra outputs to the console with information about which and how many models get masked out and ignored in the marginalization. I also included this information in the PDF report.

I bumped the part about adjusting the figures to using masked array to issue #53 since that will be a little fiddly but should not influence the code itself, plus we need to adjust the figures to look proper in the PDF report anyway.

ivalaginja commented 5 years ago

@hrwakeford this is ready for review, I think it is enough if you just run the main script once you have changed to this branch and pulled all the changes and let me know if anything appears broken.

ivalaginja commented 5 years ago

I rebased this branch on develop.

hrwakeford commented 5 years ago

I have run a test on the non-rebased version of this branch and I find the following error

hrwakeford commented 5 years ago

I get this same issue when I have run the module on the re-based code.

ivalaginja commented 5 years ago

I realized that I had tested this only on the "bad" test data. Running it on our "normal" G141 data means there is no system with negative AIC, which means that the numpy masked array will have a mask that is a single boolean saying False as opposed with an array filled with booleans (True for elements that are masked, False for elements that are not masked). Since I am tiling that mask at some point, the dimensions did not accord with the arrays I was applying this mask to.

I fixed this in commit 32e9168 by adding an if switch where appropriate to check whether the initial mask is a bool or an array, and only extend it if it's not a bool, since in this case we have to build the full mask array as opposed to just passing the False value around.

Note how numpy booleans (numpy.bool_) behave differently than Python booleans, I found a short description and comparison here: https://joergdietrich.github.io/python-numpy-bool-types.html