Closed: ktfhale closed this 10 years ago.
Good idea, how are you planning to do this?
I have a suggestion. Loop over all the headers, pull out the relevant keywords, and store them in some data structure, possibly save them somehow if this takes a while. Then use pandas for the analysis.
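Here is a minimal sketch of that suggestion in Python. It assumes `astropy` is installed for reading headers and that the cards of interest live in the primary header (extension 0). `KEYWORDS`, `read_primary_header` and `collect_header_records` are hypothetical names, not existing pipeline code. The header reader is a parameter, so the loop can be exercised without real files.

```python
from typing import Callable, Iterable, Mapping

# Cards discussed in this thread; which ones exist depends on the instrument.
KEYWORDS = ["INSTRUME", "DETECTOR", "CCDGAIN", "ATODGAIN",
            "ATODGNA", "ATODGNB", "ATODGNC", "ATODGND",
            "READNSEA", "READNSEB", "READNSEC", "READNSED"]


def read_primary_header(path: str) -> Mapping:
    """Read the primary (extension 0) header; assumes astropy is installed."""
    from astropy.io import fits
    return fits.getheader(path, 0)


def collect_header_records(paths: Iterable[str],
                           keywords: Iterable[str] = KEYWORDS,
                           read_header: Callable[[str], Mapping] = read_primary_header):
    """Return one dict per file; cards missing from a header become None."""
    keywords = list(keywords)
    records = []
    for path in paths:
        header = read_header(path)
        record = {"path": path}
        for key in keywords:
            record[key] = header.get(key)
        records.append(record)
    return records
```

The list of dicts can then be passed to `pandas.DataFrame(records)` for the analysis. Cards that an instrument lacks simply come through as missing values.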
When I was looking at CTE correction, I wrote code to grab the `DETECTOR` card value from the headers of every `_flt.fits` image in our archive. This will be much the same, just grabbing the various readnoise and gain cards. Opening all the FITS files takes a little while, so I'll make sure to save this information to a text file so I don't have to rerun the function every time.
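One way to avoid reopening every FITS file on each run is sketched below with the standard-library `csv` module. It builds the records once, writes them to a text file, and reads that file back on later runs. The function names are hypothetical, and the `build` callable stands in for whatever header-reading loop is actually used.

```python
import csv
import os


def save_records(records, path):
    """Write a list of header-record dicts to a CSV text file."""
    fieldnames = sorted({key for record in records for key in record})
    with open(path, "w", newline="") as handle:
        # Keys missing from a record, and None values, are written as "".
        writer = csv.DictWriter(handle, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(records)


def load_records(path):
    """Read records back; numeric-looking strings become floats, blanks become None."""
    def convert(value):
        if value == "":
            return None
        try:
            return float(value)
        except ValueError:
            return value

    with open(path, newline="") as handle:
        return [{key: convert(value) for key, value in row.items()}
                for row in csv.DictReader(handle)]


def cached_records(cache_path, build):
    """Return cached records if the cache file exists, else build and save them."""
    if os.path.exists(cache_path):
        return load_records(cache_path)
    records = build()
    save_records(records, cache_path)
    return records
```

To force a rescan after new data land in the archive, delete the cache file. CSV stores everything as text, so `load_records` converts numeric-looking fields back to floats and empty fields back to `None`.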
I'll just spit out a few graphs to start with, but gathering some statistics on our data will go hand in hand with Ticket #116, so we can have a diverse test set of data.
Right, I'm suggesting Pandas as the tool to do the analysis. It's pretty cool and I think you'll enjoy it.
I'm still figuring out Pandas, but here are some preliminary results for the readnoise and gain distributed across the four amplifiers on ACS and WFC3. The y-axis is the number of FITS files, and the x-axis is the value of either the readnoise or gain on that amplifier.
Readnoise:
Gain:
When a file has a value of 0 for the readnoise or gain on an amp, it means the CCD subsection corresponding to that amplifier is not being used to take data. We see pretty quickly that the large majority of our data uses only amplifier/chip subarray C. Note, however, that ACS data are also included on these plots. We knew that WFC3 is set up like this:
And we already knew that most of our WFC3 data came from some subset in chip C. But I don't know how ACS subarrays work. I'm figuring that out now.
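A 0 readnoise or gain marks an amplifier that was not read out, so the amplifiers an image actually uses can be read straight off its header. This is a minimal sketch, assuming `header` supports dict-style `.get()` (an `astropy.io.fits.Header` does). `amps_in_use` is a hypothetical helper name.

```python
AMPS = "ABCD"


def amps_in_use(header, prefix="READNSE"):
    """Return the amplifier letters whose <prefix><amp> card is present and non-zero.

    A value of 0 means that amplifier's subsection of the CCD was not
    read out for this image.
    """
    used = []
    for amp in AMPS:
        value = header.get(prefix + amp)
        if value:  # skips both missing cards (None) and 0 / 0.0
            used.append(amp)
    return used
```

Passing `prefix="ATODGN"` runs the same check on the gain cards instead of the readnoise cards.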
Anyway, just from the look of things, it seems probable that we could get away with having `cosmicx` for WFC3 set to use a readnoise of 4 or 5. But there are also those few hundred files that were taken using all four chips, and which have a readnoise of 20. I'm going to figure out which instrument is taking those.
I think this definitely needs to be separated by instrument and camera.
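Here is a sketch of that split. It assumes each file's header cards have already been collected into a dict, and it counts the values of one card separately for each `(INSTRUME, DETECTOR)` pair, which is the data behind one histogram per camera. `tally_by_detector` is a hypothetical name. Files missing a card are counted under `None`.

```python
from collections import Counter, defaultdict


def tally_by_detector(records, key):
    """Count the values of `key` separately for each (INSTRUME, DETECTOR) pair.

    `records` are dicts of header cards, one per file. The result maps
    (instrument, detector) -> Counter({value: number_of_files}).
    """
    tallies = defaultdict(Counter)
    for record in records:
        group = (record.get("INSTRUME"), record.get("DETECTOR"))
        tallies[group][record.get(key)] += 1
    return dict(tallies)
```

With pandas, the equivalent is roughly `df.groupby(["INSTRUME", "DETECTOR"])["CCDGAIN"].value_counts()`. By default, though, pandas drops missing values from both the group keys and the counts.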
To recapitulate, we want to give `cosmicx` the best possible values for `readnoise` and `gain` that we can, given the information in our FITS headers. For WFPC2, this is pretty easy. WFPC2 data doesn't even have a `readnoise` keyword, and has only a single gain keyword, `ATODGAIN`, standing for analog-to-digital gain. The question is whether we need to extract `ATODGAIN` for every image, or whether we can get away with a fixed value for all WFPC2 data.
WFPC2 gain:
Most WFPC2 data has a gain of 7, but a significant portion of it has a gain of 15. We probably should extract `ATODGAIN` from WFPC2 data on an image-by-image basis, and use that to set the gain for `cosmicx`.
The story's a little more complicated for most of the other instruments. For ACS/SBC, however, it's very simple: the FITS file contains no gain or readnoise information at all. I think this is because it's not a CCD instrument, but a photocathode. Goodness knows what we should ultimately pick for its settings.
For the rest, readnoise and gain information are provided for each of the four chips/amplifiers, A, B, C, D, in the header cards `ATODGN<A,B,C,D>` and `READNSE<A,B,C,D>`. There's also a `CCDGAIN` keyword. I don't quite know how to weigh the analog-to-digital gains against this CCD gain. My guess is that the CCD gain might be something we are capable of setting, whereas the analog-to-digital gains are intrinsic properties of the instrument. I should ask somebody about this.
The difficulty is that, for instance, `ATODGNA` can be different from `ATODGNB`, so which should we give `cosmicx`?
The simplest form of this problem is with WFC3 / IR:
IR gain:
The top plot shows the gains, `ATODGN<A,B,C,D>` and `CCDGAIN`, of every one of our IR images. The lower plot is a histogram showing the relative abundances of the values. It seems each of the IR amplifiers always has a single particular gain and readnoise, no matter the image. Fortunately for us, they're all quite close, with ~2.35 for the analog-to-digital gain and 2.5 for the CCD gain. The readnoise is similarly around 20.0 for each of the four amplifiers. I've elected not to show any of my readnoise plots, because they don't really differ from the gain plots in any important way. They have different values, of course, but where `ATODGNA` varies, `READNSEA` varies as well.
So for IR images we could set `readnoise = 20` and `gain = 2.4` for `cosmicx` and be happy, I think. Variations of 0.1 in magnitude are not worth worrying about.
ACS / HRC is a little more complicated:
HRC gain:
None of our HRC data makes use of chips/amplifiers A and B. And it seems like the vast majority of it uses only chip C, which we expect. But no matter which chip is used, the gain is generally 2... except for a few images on chips C and D, which have it set to 4. This happens when `CCDGAIN` is set to 4 as well, which makes me think that `CCDGAIN` actually sets `ATODGN<A,B,C,D>` in some way, maybe. Anyway, it's a fairly small minority of our HRC data that has a gain of 4 instead of 2, but it would still probably be good to use the correct gain.
We probably want the ability to extract and feed the readnoise and gain to `cosmicx` anyway, because of ACS / WFC:
WFC gain:
Compared to the other instruments, WFC has a hodgepodge of readnoise and gain values. We can see that some of the first WFC images, at the left, use all four chips and have gains of 2 and 4. Later there's a spate of images that use only chip D, a collection of images that use only chip C, a set that use all four chips but with a gain of 1... it's a little bit of a mess.
Fortunately, for WFC there never seems to be a case of an image using multiple chips that have different values for the readnoise and gain. Whenever it's a multiple-chip image, all of those chips seem to use the same readnoise and gain values, and that gain is also the same as `CCDGAIN`'s. So, to give the appropriate values for the readnoise and gain to `cosmicx`, we can likely give the maximum of the individual `READNSE<A,B,C,D>` values for the readnoise, and the `CCDGAIN` value for the gain.
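Here is a minimal sketch of that rule, assuming dict-style header access. The readnoise is the largest `READNSE<A,B,C,D>` (unused amplifiers read 0, so they drop out of the maximum) and the gain is `CCDGAIN`. The rule relies on in-use amplifiers agreeing, so the sketch warns if their `ATODGN` values spread by more than an assumed tolerance of 0.1. `cosmicx_noise_params` is a hypothetical name, not the code actually added to `run_cosmicx.py`.

```python
import warnings

AMPS = "ABCD"


def cosmicx_noise_params(header, tol=0.1):
    """Return (readnoise, gain) for cosmicx from an ACS or WFC3 header.

    readnoise: the largest READNSE<A,B,C,D>; missing or unused (0) amps
               contribute 0, so they never win the maximum.
    gain:      the CCDGAIN card (KeyError if it is absent).
    """
    readnoises = [header.get("READNSE" + amp) or 0.0 for amp in AMPS]
    # In-use amplifiers are the ones with a non-zero ATODGN card.
    gains = [g for g in (header.get("ATODGN" + amp) for amp in AMPS) if g]
    if gains and max(gains) - min(gains) > tol:
        warnings.warn(f"in-use amplifiers disagree on ATODGN: {gains}")
    return max(readnoises), float(header["CCDGAIN"])
```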
I think this should work equally well for WFC3 / UVIS.
UVIS gain:
For UVIS, the gain is either 1.6 or 0. `CCDGAIN` is always set to ~1.5, so it's just a matter of which combination of chips is used for an image. Most of the data is from chip C, but there's an appreciable amount from chips A and B as well. A few images use all four. There turns out to be a little more variation in the values the readnoise can take on (it generally hovers around 3.0), but this doesn't matter much. We can just pull out the maximum of the four readnoises, and `CCDGAIN` still seems to always be the applicable gain.
So my plan is to have the `get_metadata` function I added in Ticket #119 also get the four readnoise cards, and the `CCDGAIN` card, for the ACS and WFC3 instruments. For WFPC2, we'll also pull out `ATODGAIN`, as it can be 7 or 15.
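The per-instrument choice of cards can be expressed as a small lookup keyed on `INSTRUME`. This is a standalone sketch, not the actual `get_metadata` from Ticket #119 (whose signature isn't shown here). `GAIN_CARDS` and `noise_cards_for` are hypothetical names.

```python
AMPS = "ABCD"

# Cards to pull for the cosmicx settings, keyed on the INSTRUME card.
GAIN_CARDS = {
    "ACS": ["CCDGAIN"] + ["READNSE" + amp for amp in AMPS],
    "WFC3": ["CCDGAIN"] + ["READNSE" + amp for amp in AMPS],
    "WFPC2": ["ATODGAIN"],
}


def noise_cards_for(header):
    """Return {card: value} for the gain/readnoise cards this header should carry.

    Cards absent from the header are skipped, so an ACS/SBC header (which
    has none of them) yields an empty dict.
    """
    wanted = GAIN_CARDS.get(header.get("INSTRUME", "").strip(), [])
    return {card: header[card] for card in wanted if card in header}
```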
I've added the necessary logic to `run_cosmicx.py` and `imaging_pipeline.py` so that the appropriate values for the readnoise and gain are fed into `cosmicx` where possible. I've run this updated version of the pipeline on images from all six detectors, and everything appears to work up to running astrodrizzle.
Obviously, astrodrizzle can't run on non-WFPC2 data yet. However, when I ran the pipeline on a WFPC2 image, it hung partway through astrodrizzle, I think on one of the extensions. The log is here.
Other than hanging in the astrodrizzle step, I'd say this branch is ready to be merged. I'll hold off on submitting a request until I hear or figure out more about this problem with drizzling WFPC2 data.
With my latest commit in the `add_metadata` branch, the pipeline now appears to successfully run up to astrodrizzle for all inputs, and, for WFPC2 data, to successfully complete drizzling and generate the PNGs.
The issue I had with the WFPC2 AstroDrizzle hanging turned out not to be caused by WFPC2 data, or by drizzling. Rather, it seems that if one of the pool workers ever encounters an exception logged as CRITICAL, the pipeline just doesn't exit. Prior to my latest commit, a CRITICAL exception occurred on one of my sample WFC files, which had a different number of extensions than I assumed all WFC files had. I've corrected that, but CRITICAL-level problems are also raised every time we try to run astrodrizzle on the non-WFPC2 data. Other than that, I'm not seeing any issues.
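Until the root cause is found, one defensive pattern is to wrap each worker call so that no exception ever escapes the worker. This is a suggestion, not a diagnosis of this particular hang. The failure is logged and a plain-string traceback is returned instead. Returning a string rather than the exception object also avoids exceptions that cannot be pickled back to the parent process, which can stall a `multiprocessing` pool in some Python versions. `run_safely` is a hypothetical name.

```python
import logging
import traceback

logger = logging.getLogger("pipeline")


def run_safely(func, *args):
    """Call func(*args) in a pool worker without letting an exception escape.

    Returns ("ok", result) on success, or ("error", formatted_traceback)
    on failure, so the parent process always gets an answer back.
    """
    try:
        return ("ok", func(*args))
    except Exception:
        text = traceback.format_exc()
        logger.critical("worker failed:\n%s", text)
        return ("error", text)
```

It would be used as `pool.starmap(run_safely, [(process_image, path) for path in paths])`. Here `process_image` stands in for the real per-image step and must be a module-level function so it can be pickled. The parent can then inspect the statuses and exit cleanly if any are `"error"`.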
I'm debating whether to merge my `add_metadata` branch into `master`. I don't think there are any errors in its functionality over all inputs, up to astrodrizzling. If you run it with just WFPC2 data, it works fine. Run it with a combination of WFPC2 and non-WFPC2 data, or just non-WFPC2 data, and it hangs, due to (I think) the issue above. So I believe we could merge the branches without adverse effects.
On the other hand, it might be good to keep these branches separate until we have the files from Max, and we can get Astrodrizzle running for non-WFPC2 data. Even if he could just provide dummy files, that would be helpful. Then I could run an actual end-to-end test, and make absolutely sure I haven't introduced problems.
> The issue I had with the WFPC2 AstroDrizzle hanging turned out not to be caused by WFPC2 data, or drizzling. Rather, it seems that if one of the pool workers ever encounters an exception logged as CRITICAL, the pipeline just doesn't exit.
This is troubling behavior in and of itself, and maybe worth looking into. As far as merging or not merging goes, my concern is getting the CR rejection portion of the pipeline running on the ACS and WFC3 data ASAP. We can run the other sections as we catch up, and in the meantime Max and/or @dylanhazlett can review the results to ensure they are acceptable.
I've stopped using the `ATODGAIN` card from WFPC2 to set the `cosmicx` gain in my `add_metadata` branch. Just using a gain of 1 for WFPC2 data gives far better CR rejection than using the value of 7 or 15 in the header.
Since using the readnoise and gain from the headers for the other images seems to be producing good CR rejection, I think this ticket can be closed.
We should figure out to what degree the readnoise and gain vary among the 4 amplifiers on ACS and WFC3. Knowing that, we can better implement a scheme to grab the most pertinent value out of the header, without delving into a hideous bunch of if-cases covering every single possible subimage.