STScI-Citizen-Science / MTPipeline

Pipeline to produce CR-rejected, astrodrizzled PNGs of HST WFPC2 solar system data.

Examine range of readnoise and gain in FITS headers #115

Closed ktfhale closed 10 years ago

ktfhale commented 10 years ago

We should figure out to what degree the readnoise and gain vary among the 4 amplifiers on ACS and WFC3. Knowing that, we can better implement a scheme to grab the most pertinent value out of the header, without delving into a hideous bunch of if-cases covering every single possible subimage.

acviana commented 10 years ago

Good idea, how are you planning to do this?

I have a suggestion. Loop over all the headers, pull out the relevant keywords, and store them in some data structure, possibly save them somehow if this takes a while. Then use pandas for the analysis.

ktfhale commented 10 years ago

When I was looking at CTE correction, I wrote code to grab the DETECTOR card value from the header of every _flt.fits image in our archive. This will be much the same, just grabbing the various readnoise and gain cards. Opening all the FITS files takes a little while, so I'll make sure to save this information to a text file so I don't have to rerun the function every time.
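A minimal sketch of that caching step, assuming the headers have already been read into dict-like objects (with astropy that would typically be `astropy.io.fits.getheader`). The names `collect_cards` and `save_cards`, the CSV layout, and the extra `INSTRUME` column are illustrative assumptions, not the code actually used here.

```python
import csv

# Cards named in this thread; INSTRUME is an assumed extra column.
AMPS = "ABCD"
CARDS = (["INSTRUME", "DETECTOR", "CCDGAIN", "ATODGAIN"]
         + ["ATODGN" + amp for amp in AMPS]
         + ["READNSE" + amp for amp in AMPS])


def collect_cards(headers):
    """Build one row per file from (filename, header) pairs.

    `header` is any mapping of FITS keyword -> value. Missing cards are
    stored as None, so instruments that lack a card still get a row.
    """
    rows = []
    for filename, header in headers:
        row = {"filename": filename}
        for card in CARDS:
            row[card] = header.get(card)
        rows.append(row)
    return rows


def save_cards(rows, path):
    """Write the rows to a CSV text file so the archive need not be reread."""
    with open(path, "w", newline="") as handle:
        writer = csv.DictWriter(handle, fieldnames=["filename"] + CARDS)
        writer.writeheader()
        writer.writerows(rows)
```

Missing cards end up as empty fields in the file, which keeps WFPC2 and ACS/WFC3 rows in the same table.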

I'll just spit out a few graphs to start with, but gathering some statistics on our data will go hand in hand with Ticket #116, so we can have a diverse test set of data.

acviana commented 10 years ago

Right, I'm suggesting Pandas as the tool to do the analysis. It's pretty cool and I think you'll enjoy it.

ktfhale commented 10 years ago

I'm still figuring out Pandas, but here are some preliminary results for the readnoise and gain distributed across the four amplifiers on ACS and WFC3. The y-axis is the number of FITS files, and the x-axis is the value of either the readnoise or gain on that amplifier.

Readnoise:

[plot: rdnoise1]

Gain:

[plot: gain1]

When a file has a value of 0 for the readnoise or gain on that amp, it means the CCD subsection corresponding to that amplifier is not being used to take data. We see pretty quickly that the large majority of our data uses only amplifier/chip subarray C. Note, however, that ACS data are also included on these plots. We knew that WFC3 is set up like this:

And we already knew that most of our WFC3 data came from some subset in chip C. But I don't know how ACS subarrays work. I'm figuring that out now.

Anyway, just from the look of things, it seems probable that we could get away with having cosmicx for WFC3 set to use a readnoise of 4 or 5. But there are also those few hundred files that were taken using all four chips, and which have a readnoise of 20. I'm going to figure out what instrument is taking those.

acviana commented 10 years ago

I think this definitely needs to be separated by instrument and camera.
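A rough sketch of that split, assuming each file has already been reduced to a dict of header cards and that the instrument and camera live under `INSTRUME` and `DETECTOR` (the `INSTRUME` key and the function name `tally_by_detector` are assumptions). With pandas the same thing would be a groupby on those two columns followed by `value_counts`.

```python
from collections import Counter, defaultdict


def tally_by_detector(rows, card):
    """Count the values of `card` separately for each (instrument, detector)."""
    tallies = defaultdict(Counter)
    for row in rows:
        value = row.get(card)
        if value is None:
            continue  # card absent for this file
        key = (row.get("INSTRUME"), row.get("DETECTOR"))
        tallies[key][value] += 1
    return dict(tallies)
```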

ktfhale commented 10 years ago

To recapitulate, we want to give cosmicx the best values for readnoise and gain that we can, given the information we have in our FITS headers. For WFPC2, this is pretty easy. WFPC2 data doesn't even have a readnoise keyword, and has only a single gain keyword, ATODGAIN, standing for analogue to digital gain. The question is whether we need to extract ATODGAIN for every image, or whether we can get away with a fixed value for all WFPC2 data.

WFPC2 gain: [plots: wfpc2, wfpc2_hist]

Most WFPC2 data has a gain of 7, but a significant portion of it has a gain of 15. We probably should extract ATODGAIN from WFPC2 data on an image by image basis, and use that to set the gain for cosmicx.

The story's a little more complicated for most of the other instruments. For ACS/SBC, however, it's very simple: the FITS file contains no gain or readnoise information at all. I think this is because it's not a CCD instrument, but a photocathode. Goodness knows what we should ultimately pick for its settings.

For the rest, readnoise and gain information are provided for each of the four chips/amplifiers, A,B,C,D, in the header cards ATODGN<A,B,C,D> and READNSE<A,B,C,D>. There's also a CCDGAIN keyword. I don't quite know how to weigh the analogue to digital gains vs this CCD gain. My guess is that CCD gain might be something we are capable of setting, whereas the analogue gains are intrinsic properties of the instrument. I should ask somebody about this.

The difficulty is that, for instance, ATODGNA can be different than ATODGNB, so which should we give cosmicx?

The simplest form of this problem is with WFC3 / IR:

IR gain: [plots: irgain, irgain_hist]

The top plot shows the gains, ATODGN<A,B,C,D> and CCDGAIN, of every one of our IR images. The lower plot is a histogram showing the relative abundances of the values. It seems each of the IR amplifiers always has a single particular gain and readnoise, no matter the image. Fortunately for us, they're all quite close, with ~2.35 for the analogue to digital gain, and 2.5 for the CCD gain. The readnoise is similarly around 20.0 for each of the four amplifiers. I've elected not to show any of my readnoise plots, because they don't really differ from the gain plots in any important ways. They have different values, of course, but where ATODGNA varies, READNSEA varies as well.

So for IR images we could set gain = 2.4 and readnoise = 20 for cosmicx and be happy, I think. Variations of 0.1 in magnitude are not worth worrying about.

ACS / HRC is a little more complicated:

HRC gain: [plots: hrcgain, hrcgain_hist]

None of our HRC data makes use of chips/amplifiers A and B. And it seems like the vast majority of it uses only chip C, which we expect. But no matter which chip is used, the gain is generally 2... except for a few images on chips C and D which have it set to 4. This happens when CCDGAIN is set to 4 as well, which makes me think that CCDGAIN actually in some way sets ATODGN<A,B,C,D>, maybe. Anyway, it's a fairly small minority of our HRC data that has a gain of 4 instead of 2, but it would still probably be good to use the correct gain.

We probably want the ability to extract and feed the readnoise and gain to cosmicx anyway, because of ACS / WFC:

WFC gain: [plots: wfcgain, wfcgain_hist]

Compared to the other instruments, WFC has a hodgepodge of readnoise and gain values. We can see that some of the first WFC images, at the left, use all four chips and have gains of 2 and 4. Later there's a spate of images that only use chip D, a collection of images that use only chip C, a set that use all four chips but with a gain of 1... it's a little bit of a mess.

Fortunately, for WFC there never seems to be a case of an image using multiple chips that have different values for the readnoise and gain. Whenever it's a multiple-chip image, it seems all of those chips use the same readnoise and gain values. This value is also the same as CCDGAIN's. So, to give the appropriate values for the readnoise and gain to cosmicx, we can likely give the maximum of the individual READNSE<A,B,C,D> values for the readnoise, and the CCDGAIN value for the gain.
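The claim that multi-chip images never mix values could be checked with something like the sketch below. It assumes each header is available as a dict of cards and treats a value of 0 (or a missing card) as an unused amplifier. The helper names `active_values` and `amps_agree` and the tolerance are illustrative assumptions.

```python
AMPS = "ABCD"


def active_values(header, prefix):
    """Return {amp: value} for amplifiers whose card is present and nonzero.

    A value of 0 is taken to mean the amplifier was not used.
    """
    values = {}
    for amp in AMPS:
        value = header.get(prefix + amp)
        if value:  # skips None and 0
            values[amp] = value
    return values


def amps_agree(header, tol=1e-6):
    """True if all active amplifiers share one gain and one readnoise."""
    for prefix in ("ATODGN", "READNSE"):
        values = list(active_values(header, prefix).values())
        if values and max(values) - min(values) > tol:
            return False
    return True
```

Running `amps_agree` over every WFC header and listing the files where it returns False would show whether any exceptions exist.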

I think this should work equally well for WFC3 / UVIS. UVIS gain:

[plots: uvisgain, uvisgain_hist]

For UVIS, the gain's either 1.6 or 0. CCDGAIN is always set to ~1.5, so it's just a matter of which combination of chips is used for an image. Most of the data is from chip C, but there's an appreciable amount from chips A and B as well. A few images use all four. There turns out to be a little bit more variation in the values the readnoise can take on (it generally hovers around 3.0), but this doesn't matter much. We can just pull out the maximum value of the four different readnoises, and CCDGAIN still seems to always be the applicable gain.

So my plan is to have the get_metadata function I added in Ticket #119 also get the four readnoise cards, and the CCDGAIN card, for the ACS and WFC3 instruments. For WFPC2, we'll also pull out the ATODGAIN, as it can be 7 or 15.
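As a sketch of the selection rule in this plan (not the actual get_metadata code), assuming the header is a dict of cards and that the instrument name is stored under `INSTRUME`. The function name `cosmicx_settings` and the `INSTRUME` lookup are assumptions for illustration.

```python
def cosmicx_settings(header):
    """Return (readnoise, gain) for one image; either entry may be None.

    Follows the plan above: ATODGAIN for WFPC2, and for ACS and WFC3 the
    largest READNSE<A,B,C,D> together with CCDGAIN.
    """
    instrument = header.get("INSTRUME")  # assumed keyword, not quoted in this thread
    if instrument == "WFPC2":
        # WFPC2 has no readnoise card, so only the gain comes from the header.
        return None, header.get("ATODGAIN")
    readnoises = [header.get("READNSE" + amp) for amp in "ABCD"]
    readnoises = [r for r in readnoises if r]  # drop missing and 0 (unused amp)
    readnoise = max(readnoises) if readnoises else None
    return readnoise, header.get("CCDGAIN")
```

A header with none of these cards, as described for ACS/SBC, comes back as `(None, None)`, which leaves the caller to fall back on whatever defaults it prefers.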

ktfhale commented 10 years ago

I've added the necessary logic into run_cosmicx.py and imaging_pipeline.py so that the appropriate values for the readnoise and gain are fed into cosmicx, where possible. I've run this updated version of the pipeline on images from all six detectors, and everything appears to work up to running astrodrizzle.

Obviously, astrodrizzle can't run on non-WFPC2 data yet. However, when I ran it on a WFPC2 image, it hung partway through running astrodrizzle, I think on one of the extensions. The log is here.

Other than hanging in the astrodrizzle step, I'd say this branch is ready to be merged. I'll hold off on submitting a request until I hear or figure out more about this problem with drizzling WFPC2 data.

ktfhale commented 10 years ago

With my latest commit in the add_metadata branch, the pipeline now appears to successfully run up to astrodrizzle for all inputs, and to successfully complete drizzling, and generate the pngs, for WFPC2 data.

The issue I had with the WFPC2 AstroDrizzle hanging turned out not to be caused by WFPC2 data, or drizzling. Rather, it seems that if one of the pool workers ever encounters an exception logged as CRITICAL, the pipeline just doesn't exit. Prior to my latest commit, a CRITICAL exception occurred on one of my sample WFC files, which had a different number of extensions than I assumed all WFC files had. I've corrected that, but CRITICAL-level problems are also raised every time we try to run astrodrizzle on the non-WFPC2 data. But other than that, I'm not seeing any issues.
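The cause of the hang isn't pinned down here, but one common way to keep a failing input from stalling a worker pool is to catch the exception inside the worker and hand it back as data. The sketch below assumes a `multiprocessing.Pool`; the names `safe_call` and `run_pool` are hypothetical and not part of the pipeline.

```python
import logging
import multiprocessing


def safe_call(args):
    """Run func(item) and return (item, result, error).

    Exceptions are logged and returned instead of raised, so one bad
    input is reported to the parent rather than lost in the worker.
    """
    func, item = args
    try:
        return item, func(item), None
    except Exception as err:  # deliberately broad: report, don't crash
        logging.critical("failed on %r: %s", item, err)
        return item, None, repr(err)


def run_pool(func, items, processes=2):
    """Map func over items with a pool; func must be a module-level function."""
    with multiprocessing.Pool(processes) as pool:
        return pool.map(safe_call, [(func, item) for item in items])
```

The parent can then scan the returned tuples for a non-None error and exit cleanly. On platforms that spawn rather than fork, `run_pool` should be called from under an `if __name__ == "__main__":` guard.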

I'm debating whether to merge my add_metadata branch into master. I don't think there are any errors with its functionality over all inputs, up to Astrodrizzling. If you run it with just WFPC2 data, it works fine. Run it with a combination of WFPC2 and non-WFPC2 data, or just non-WFPC2 data, and it hangs, due to (I think) the issue above. So I believe we could merge the branches without adverse effects.

On the other hand, it might be good to keep these branches separate until we have the files from Max, and we can get Astrodrizzle running for non-WFPC2 data. Even if he could just provide dummy files, that would be helpful. Then I could run an actual end-to-end test, and make absolutely sure I haven't introduced problems.

acviana commented 10 years ago

> The issue I had with the WFPC2 AstroDrizzle hanging turned out not to be caused by WFPC2 data, or drizzling. Rather, it seems that if one of the pool workers ever encounters an exception logged as CRITICAL, the pipeline just doesn't exit.

This is troubling behavior in and of itself, maybe worth looking into. As for merging or not merging, my concern is getting the CR rejection portion of the pipeline running on the ACS and WFC3 data ASAP. We can run the other sections as we catch up, and in the meantime Max and/or @dylanhazlett can review the results to ensure they are acceptable.

ktfhale commented 10 years ago

I've stopped using the ATODGAIN card from WFPC2 to set the cosmicx gain in my add_metadata branch. Just using a gain of 1 for WFPC2 data gives far better CR rejection than using the value of 7 or 15 in the header.

ktfhale commented 10 years ago

As using the readnoise and gain from the headers for the other images seems to be producing good CR rejections, I think this ticket can be closed.