StevenTomczyk opened 1 year ago
Based on the difficulties we have had finding outliers in the flats as the flat structures change over time, I have been working on a different algorithm that gives promising results.
The new method handles changes in structure by subtracting the local median around each pixel (scipy.signal.medfilt2d with kernel_size = 5). With the local structure removed, we compute the per-pixel standard deviation across all flat images within a single FITS file (4 modulations * N extensions).
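A minimal Python sketch of this computation (flat_outlier_std and the array layout are my illustration, not pipeline code):

import numpy as np
from scipy.signal import medfilt2d

def flat_outlier_std(flats, kernel_size=5):
    # flats: (n_images, ny, nx) stack of all flat images from one FITS
    # file (4 modulations * N extensions)
    flats = np.asarray(flats, dtype=float)
    # remove the local flat structure around each pixel
    residuals = np.array([f - medfilt2d(f, kernel_size=kernel_size)
                          for f in flats])
    # per-pixel scatter across the median-subtracted stack
    return residuals.std(axis=0)

Pixels whose per-pixel stdev exceeds the threshold (e.g. > 8) are the outlier candidates.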
Visual inspection of the standard deviation across the array shows that most of the pixels have standard deviations below a threshold of around 8. However, adopting >8 as the threshold identifies 2000 pixels in a single file, about double what we found in the previous analysis. So we probably need a higher threshold.
Looking at all the flats from February through May 2022 and adopting 11 as the threshold, we find cam0 has about 5000 pixels that cross this threshold at least once. But most of these (~3700) look bad in only one file, so they are probably false positives.
A stdev threshold of 11 does not do a very good job of matching the previous analysis of hot pixels, which found 1002 hot pixels for cam0. Of the 5300 pixels that appear in 1 or more files, only 170 appear in the hot pixel file. Of the 735 pixels that appear in 10 or more files, only 154 appear in the hot pixel file.
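A rough sketch of this counting, assuming per_file_sigma is a list of the 2-D stdev maps (one per file) and hot_pixel_set holds the previous analysis as (y, x) tuples; all names here are hypothetical:

import numpy as np

counts = np.zeros(per_file_sigma[0].shape, dtype=int)
for sigma in per_file_sigma:
    counts += (sigma > 11.0)   # files in which each pixel crosses the threshold

candidates = np.argwhere(counts >= 10)
overlap = {tuple(p) for p in candidates} & hot_pixel_set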
At this point, it is unclear whether the extra 800 pixels in the hot pixel file are false positives or demonstrate an inadequacy of the new method.
More work is needed to refine the threshold and the number of files in which a pixel must appear before it is counted as bad. We also need to better understand the validity of the marginally bad pixels in the current hot pixel files, and to find a way to validate that an image has had all of its bad pixels removed; this would help us ensure we capture all the bad pixels.
Ben, I doubt the hot pixels in Steve's list are false positives. Keep in mind that "hot pixels" are unstable pixels, not necessarily pixels with higher or lower readings; this is why they are hard to identify. A simple median filter approach on a dark-corrected flat is not going to work. I think you need to look at data after flat-fielding, or use a series of many flats and look at the variation of each pixel. Steve spent a lot of time thinking about this and had experience from CoMP. I suggest Dana start from where Steve left off, understand what he did, and then explore improved techniques.
Does Dana have access to the SVN folder for the hot pixel correction?
It looks like Steve's definition of "hot" pixels included more than pixels with an intermittent and unreliable response. In his original code, UCoMP_Find_Hot_Pixel.pro, he lists these criteria:
; The criteria that define a hot pixel are:
; 1. lin_cof = 0 and r_chisq = 0., meaning that the linear fit was not attempted due to too few points
; 2. rms of linearity fit exceeds limits
; 3. linear term from fit is outside limits
; 4. dark noise is outside limits
; 5. dark signal is outside limits
Steve later moved to a different approach to identify hot pixels, using flat-corrected flats in UCoMP_Find_Hot_Flats.pro. However, it seems he was interested in removing pixels with anomalous dark current or a highly non-linear response (criteria 3-5). In principle, pixels with a non-linear response can be corrected as long as the non-linearity is stable and not erratic. We do not apply a non-linearity correction to the OWL cameras, so it makes sense that Steve wanted to remove highly non-linear pixels.
You cannot identify non-linearity by analyzing only 80 ms flats at one wavelength; this requires examining different illumination levels.
I will add to the list of questions for Steve what exactly he meant by "hot" pixels. Having clarity on this will help us define the proper strategy to find them.
This is not the final solution, just a quick test to check whether Steve's approach of using a flat-corrected flat to find hot pixels works. It does. This needs to be coded properly and run on many flats, but it is encouraging.
What I did is the following: I read one raw dark and one raw flat and created a master dark and master flat for the center line continuum (off band), averaging over polarization states and repeats. Then I read in a single flat image (there are 16 in a flat file) and created a flat-corrected flat, i.e. (flat - master dark) / (master flat - master dark). This gives a good starting image for identifying hot pixels. I repeated the search for hot pixels for all 16 flats, combined the hot pixels found, and used them to correct a science image. It seems to find most hot pixels. See images below.
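In Python, the flat-corrected flat is just the following (a sketch; the names are mine, and the masters are assumed to be already averaged over polarization states and repeats):

import numpy as np

def flat_corrected_flat(flat, master_dark, master_flat):
    # (flat - dark) / (master flat - dark), guarding against zero denominators
    denom = master_flat - master_dark
    return (flat - master_dark) / np.where(denom != 0, denom, np.nan)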
Dark- and flat-corrected science for TCAM without hot pixel correction:
Dark- and flat-corrected science for TCAM with hot pixel correction:
While zooming in on some features near the occulter, I noticed our old hot pixel corrections were not perfect. Perhaps more stringent conditions should be used.
I computed hot pixels for camera 0 and camera 1 using Steve's idea as described above for two days, 2022 11 25 and 2022 11 24, and the 3 wavelength positions of 1074. I have not used the other lines.
I will send an email with the save files for the hot pixels to Mike. I would like to test them on a day different from the ones used to derive them, for example 2022 11 19.
The agreement is good for camera 0. I find 912 hot pixels in common with Steve. Steve finds 77 pixels that I do not, but at first look they do not seem to be in critical locations. I find about 400 more hot pixels than Steve because I extended the search to a larger fraction of the detector.
Camera 1 flats are noisier than for camera 0 and show more banding. See plots below of a dark-corrected flat.
dark-corrected flat for camera 0
dark-corrected flat for camera 1
Unlike Steve, I do not prescribe the 4 pixels to use to replace each hot pixel. Instead I use a 3x3 kernel and compute the replacement value from the 3x3 region, excluding other hot pixels. This can be easily implemented by adding the IDL lines below:
kernel = replicate(1.0, 3, 3)
kernel[1, 1] = 0
kernel = kernel / total(kernel)
; set hot pixels to zero value in image
data_camerax[hot_camerax] = 0
; compute smoothed image, excluding zero values; with /normalize and
; invalid=0, convol renormalizes over the valid neighbors only
data_fill = convol(data_camerax, kernel, /edge_truncate, /normalize, invalid=0, missing=0)
; replace zeros with the local average
data_camerax[hot_camerax] = data_fill[hot_camerax]
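For reference, a rough NumPy/SciPy equivalent of the same normalized fill (fill_hot_pixels is my name; this is a sketch, not pipeline code):

import numpy as np
from scipy.ndimage import convolve

def fill_hot_pixels(img, hot_mask):
    # average the valid 3x3 neighbors of each hot pixel
    kernel = np.ones((3, 3))
    kernel[1, 1] = 0
    num = convolve(np.where(hot_mask, 0.0, img.astype(float)), kernel, mode='nearest')
    den = convolve((~hot_mask).astype(float), kernel, mode='nearest')
    out = img.astype(float).copy()
    out[hot_mask] = (num / np.where(den > 0, den, 1))[hot_mask]
    return out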
The hot pixels I found using days 2022 11 24 and 25 worked well on the strong lines but did not catch all the hot pixels; some were still visible in the low-intensity lines.
I tried to use flats from the low-intensity lines, but there is not enough S/N, so I could not use them. I went back to 1074 and processed all days for which we have science data, i.e. 2022 11 19-25. I found a few more hot pixels, but I am not sure that 7 days is sufficient to find all of them. I gave the new files to Mike to test on day 2022 11 19.
I’m reprocessing 20221119 now with the new hot pixel tables. Results will appear in process.intermediate.
I analyzed the 1079 flats to improve the camera 1 hot pixels and lowered the threshold to add more data. I also added hot pixels to camera 0 because some were uncorrected. I will look at that again with more time after the eclipse, when we have more flats to analyze. Camera 1 is much noisier than camera 0. The camera 1 darks have ~1500 pixels that are above ten times the median value, compared to ~240 in camera 0. I think I need to include these pixels in the hot pixel files to properly account for artifacts.
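The dark-based criterion above amounts to something like this (a sketch; master_dark stands for whatever dark image is being screened):

import numpy as np

# flag pixels whose dark level is more than 10x the median dark level
dark_hot = np.argwhere(master_dark > 10.0 * np.median(master_dark))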
I improved the filling method with CONVOL but it was not the problem.
TODO (after the eclipse): remove the /adjacent option and update all .sav hot pixel files.
I analyzed the darks and flats for April 9, 2024. The hot pixels are very different for that day, and there seem to be more than usual. It is possible the cameras were not at the same temperature; the camera sensor temperature was not recorded in the header.
I created a totally new file for the hot pixel correction: ucomp_hot_high_20240409.sav
This should be used only for 2024 03 30 and 2024 04 09.
I tried to be conservative in the hot pixel selection so as not to exclude too many pixels. If some hot pixels remain, I will lower the threshold and include more pixels.
It would be useful if Mike can run 2024 04 09 with this file and create intermediate products for testing.
I created a new hot pixel file to process 2022 November data taken with the new camera: ucomp_hot_high_20221125.sav
It would be useful to run a November day with intermediate products to test it.
Reprocessing 20240409 with 20240409 hot pixels and intermediate products, and 20221119 with 20221125 hot pixels and intermediate products. Results are in process.hot-pixels. 20221119 is done and 20240409 is still being processed.
I updated both IDL save files using the intermediate products. They are in the same location and ready for Mike to test. This should take care of the period after camera 1 was replaced.
The old hot pixel files for the old cameras work very well early in the mission but not later on. Hot pixels start to show up approximately on June 10, 2022, and get progressively worse, with more and more hot pixels noticeable in the enhanced intensity. It seems more pixels become unreliable as the camera ages.
We need to add epochs for the hot pixels. The current files will be used until May 2022; starting June 1, we need new ones.
Unfortunately, the code I use for finding hot pixels does not work on the old camera 1. It finds too many hot pixels, while Steve had only 273 listed for the old camera 1. I am not sure how he found so few; I cannot reproduce his results, but I get a very good match for camera 0 using the same code. Thus, instead of starting from scratch, I will start from his hot pixel files and update them.
To update hot pixel files I have used intermediate products. To do that for the old hot pixel files, I need a series of runs with intermediate products from Mike for the period June-November 2022. I only need the "apply.gain" step saved, but I need several days. If they take too much space, we can do it in batches. The days I would like are the following:
Results above are going into process.hot-pixels with the apply_gain intermediate product. I will check off dates above as they are completed.
The 2022 runs testing the hot pixels are complete.
Mike, I have two test files to try that used more stringent criteria and found a lot fewer hot pixels. They are in the same folder and are called:
ucomp_hot_high_20221125_test.sav
ucomp_hot_high_20240409_test.sav
I would like intermediate steps (just the apply_gain step) for them so I can tell how they work for the two cameras.
Can you please reprocess 20240409 and a few days in 2022 November: 20221120 20221122 20221123 20221124?
Test runs are completed for the given dates, and the results are in process.hot-pixels_20221125_test and process.hot-pixels_20240409_test.
The new code to find hot pixels, which finds a much smaller number of them, is promising but needs more work; I will finalize it for the next milestone.
There are new results for 20240409 in hot-pixels-20240409_test.
Are these the right hot pixel files to be using at the right times for our current understanding?
[default]
high_hot_pixel_basename : ucomp_hot_high.sav
[20221115]
high_hot_pixel_basename : ucomp_hot_high_20221125.sav
[20240330]
high_hot_pixel_basename : ucomp_hot_high_20240409.sav
Yes. Use those for now.
Description: Hot pixels are pixels that have intermittent behavior with regard to signal or noise. They are somewhat pathological. Once identified, they can be interpolated over in the Level 1 processing pipeline. Their intermittency makes them difficult to identify. Steve's current method looks at ratios of flat images to identify outliers; it is kludgy.
Approach: Develop a robust hot pixel identification scheme and apply it to lots of data over the duration of the mission.
Questions from Berkley:
Tasks
apply_gain intermediate product