mgalloy opened this issue 7 months ago
How did we handle fitting in CoMP? Was it always the analytical Gaussian fit?
GdT: Yes, the pipeline always did an analytic gaussian fit using the three center wavelength positions. This may not work now if we take positions very closely spaced, because the three center wavelengths are not independent.
At least for the initial testing I think having the fits and chisq saved for each pixel would be helpful.
My naive starting point for the threshold would be picking a chisq (or other metric) value that passes a similar set of pixels as the pipeline currently passes with the analytical Gaussian metric.
I had already opened a ticket for this #253. A better way to handle the extra points is not to always make a least square fit if there are more than 3 wavelengths, but to have an option if there are more than 3 wavelengths. There may be cases when we may want to revert to the analytic fit.
The new wavelength positions, if there are more than 3, may not be equally spaced, so this is something the pipeline needs to handle.
I am not sure what you mean about the pipeline handling wavelengths that are not equally spaced. The wavelengths are passed directly into gaussfit; they could be anything as long as they are given to gaussfit.
I have a run in process.latest for 20240330. There are new extensions in the level 2 files that give parameters of the fit, by pixel:
...
13 Fit coefficients science ----- ----- -----
14 Fit chi-squared  science ----- ----- -----
They are arrays giving the coefficients and chi-squared arrays, by pixel:
IDL> help, coeffs, chisq
COEFFS FLOAT = Array[1280, 1024, 4]
CHISQ FLOAT = Array[1280, 1024]
I would like to have an option in the pipeline to do either the analytical gaussian fit or the least square fit for the cases with 5 or more wavelengths. I think the analytical gaussian fit assumes the wavelengths are equally spaced, but the 5 or more wavelength positions may not be equally spaced. We need some logic on how to select the proper 3 wavelengths for the analytical gaussian fit in case we have more than 3 points. I think the way to do it is to find the two wavelengths closest to the blue and red positions used for the waves program. These are not necessarily the three center wavelengths.
In other words, we cannot simply choose the center 3 wavelengths for the analytical gaussian fit if we have more than three points.
OK, so the wavelengths only matter if we are using the analytic solution with more than 3 wavelengths? Are the following statements all true?
The LOS velocity, line width, and noise mask are 0 (or NaNs) across the whole array in the L2 files in the current process.latest run. Was this expected?
Yes, I haven't figured out how to actually calculate anything yet (see questions at the top). But there is a chi-squared array for these fits that is interesting to look at in that run.
Mike, the statements above are correct, but we need to add another option, i.e. a custom option where we specify the wavelengths to use in the least square gaussian fit. There are some wavelength positions we may want to avoid because of contamination from other lines. This is complicated, but with such a flexible instrument it is to be expected. Let's discuss.
@mgalloy I agree chi-squared looks weird. I assumed it would better match the corona.
Remember that peak intensity in that processing is definitely not correct yet.
@mgalloy I agree, but to first order it should show where the corona is.
For the fit masking, I think we can look at the first term of the fit.
If fit[0] > 0, we get a mask that matches reasonably well with the locations with good signals in the peak intensity plots.
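As a sketch of that masking rule (Python with numpy standing in for the IDL arrays; the small COEFFS array here is a hypothetical stand-in for the real 1280 x 1024 x 4 extension):

```python
import numpy as np

# Hypothetical stand-in for the COEFFS extension: [ny, nx, 4] fit
# coefficients, where coefficient 0 is the fitted peak intensity.
coeffs = np.zeros((4, 4, 4), dtype=np.float32)
coeffs[1:3, 1:3, 0] = 5.0  # a patch of pixels with a positive fitted peak

# Mask pixels where the first fit term (peak intensity) is positive.
mask = coeffs[:, :, 0] > 0
print(mask.sum())
```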
Can we pull the LOS and line widths directly from the Gaussian fit results fit[1] and fit[2], with appropriate corrections to convert from nm to km/s?
For 1074.7 LOS, I think this would look like:
LOS = (fit[1] - 1074.7) * 299792.458 / 1074.7
We also need to convert the line width; the IDL GAUSSFIT routine reports the Gaussian sigma, not the FWHM:
line_width = fit[2] * 299792.458 / 1074.7 * 2.355
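A minimal Python sketch of the two conversions (the fit values are invented; fit[1] is taken to be the fitted line center in nm and fit[2] the Gaussian sigma, following GAUSSFIT's term ordering):

```python
C_KMS = 299792.458     # speed of light, km/s
LAMBDA0 = 1074.7       # nominal line center, nm
SIGMA_TO_FWHM = 2.355  # ~ 2 * sqrt(2 * ln(2))

# Hypothetical GAUSSFIT result: [peak, center (nm), sigma (nm), background]
fit = [10.0, 1074.75, 0.1, 0.2]

# The centroid shift converts to a LOS velocity; sigma scales to a FWHM width.
los_velocity = (fit[1] - LAMBDA0) * C_KMS / LAMBDA0    # km/s
line_width = fit[2] * SIGMA_TO_FWHM * C_KMS / LAMBDA0  # FWHM in km/s

print(round(los_velocity, 2), round(line_width, 1))
```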
Ben, please let us work on this. I just sent an email. Yes, the conversion will be done.
There are several issues that need to be taken care of to allow the least-square fit to properly work in the automatic pipeline. The strategy Mike and I have agreed on is the following:
The analytical gaussian fit will be performed for all lines. For the lines that have only three points, this will be the end of the processing and nothing will change compared to before. For the lines that have more than three points, we will identify the wavelengths closest to the "nominal" ones used in the three-point observations and then perform the analytic gaussian fit on those three points. This is because using the center three points does not always work well, especially if we oversample the line and the three center points are very close together.
The analytic gaussian fit will be used as an initial guess for the least square gaussian fit with 4 terms. This is because the least square fit does not always give good fits, especially in pixels with low signal. We found unrealistic gaussian profiles (see examples below). To further avoid failures of the least square fitting procedure, we need to consider how to select the points for the fit, e.g. eliminating points below a certain threshold or negative. Also, the chi square is not normalized and results in very low values in the outer fov where the intensity is low, even if the fit is poor. Mike will normalize the chi square so we can use it as an indicator of how good the fit really is.
There will be a second "custom" option for lines with more than three points, where we specify the wavelengths to use in the fit. This is because there may be observing programs where we want to exclude some wavelength positions that are not reliable.
Finally, since the analytical gaussian fit equation is different from the one used in IDL gaussfit, we need to make sure that all quantities are consistent and we initialize the fit with the proper coefficients. The final products will be the same, regardless of the fitting technique: peak intensity, FWHM, and doppler velocity.
After all this is implemented, we need to verify if the peak intensity, velocity and FWHM look good or if the noise in the outer fov is too large. I worry about noise in the outer fov especially for the weaker lines. We may choose the 3-term fit, if more stable. It will be interesting to compare the peak intensity, velocity, and FWHM for the 3-term and 4-term fit.
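Two pieces of this strategy can be sketched in Python (names and wavelength values are hypothetical; the pipeline itself is IDL): picking the measured wavelengths nearest the nominal three-point positions, and normalizing chi-squared by the degrees of freedom so it is comparable across programs with different numbers of points.

```python
def select_three_points(waves, nominal):
    """Indices of the measured wavelengths closest to each nominal position."""
    return [min(range(len(waves)), key=lambda i: abs(waves[i] - w0))
            for w0 in nominal]

def reduced_chisq(chisq, n_points, n_terms):
    """Chi-squared per degree of freedom (points minus fitted terms)."""
    return chisq / (n_points - n_terms)

# A hypothetical oversampled 7-point program and nominal 3-point positions.
waves = [1074.50, 1074.59, 1074.66, 1074.70, 1074.74, 1074.81, 1074.90]
nominal = [1074.59, 1074.70, 1074.81]

print(select_three_points(waves, nominal))  # not the three center points
print(reduced_chisq(3.0, 7, 4))
```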
Example of fits:
The white is the analytical fit, the blue the least square with 3 terms, and the red the least square with 4 terms. The 4-term fit should provide a more realistic peak intensity and line width and a better fit in the wings, but fails badly in some pixels, as shown above.
Another failed fit.
Two examples where the least square fit worked well.
Doing the analytical "fit" takes about 2-5 seconds for a level 2 file. Doing the actual Gaussian fit takes 7-15 minutes per level 2 file.
Would rewrite this in C speed it up?
Are the 7-15 minutes for fitting all the pixels or only the pixels that pass the velocity criteria?
The criteria need to be revisited for 5-7 point data. I do not think we want to use negative values. The current criteria only apply to the three center points.
There are several things that we can do to speed it up:
- Using the ESTIMATES keyword, which we planned to do anyway to get a better fit, would allow the fit to converge faster.
- Remember that the waves program accounts for a lot of the files for a day and would not use the fit.
Below is an example where the least square gaussian fit uses the far red and blue points that are too low and finds an unrealistic gaussian fit. The blue curve is a fit with 3 terms and the red curve a fit with 4 terms. They are both bad fits to the data. We need to exclude data points with intensity below a threshold (0.1 ppm??).
Below is the same plot excluding the first and last point, which are below 0.1 ppm. The 3-term fit is now good; the 4-term fit is bad.
Below is the same plot excluding the first and last point and using the estimate from the analytical gaussian fit to help convergence. The 3-term fit is good; the 4-term fit is still bad.
Note: In the analytical fit I used the wavelengths closest to the nominal 3-point positions and not the 3 center wavelengths.
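A sketch of that point-exclusion step in Python (the 0.1 ppm threshold is the tentative value floated above; function and variable names are hypothetical):

```python
def usable_points(wave, profile, threshold=0.1):
    """Drop points at or below the intensity threshold (removes negatives too)."""
    kept = [(w, p) for w, p in zip(wave, profile) if p > threshold]
    return [w for w, _ in kept], [p for _, p in kept]

# Hypothetical 5-point profile whose first and last points are too low.
wave = [1074.5, 1074.6, 1074.7, 1074.8, 1074.9]
profile = [0.05, 0.8, 2.1, 0.7, -0.02]

w, p = usable_points(wave, profile)
print(len(w), min(p))
```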
Conclusions:
Joan and I looked at several profiles. The 4-term least square gaussian fit fails more often than the 3-term but generally gives a better fit when the line is not well centered and there are 5 points available.
We want the pipeline to have both options. We will test the 3-term and 4-term fits on two days, one when the line was well centered and one when it was not, and then decide which fit to use for the periods when UCoMP had large line shifts and we have 5 points.
The least square fit needs a guess or it is more likely to fail (possibly because it finds a secondary minimum). This is the code I use to call gaussfit:
gauss_lq_fit = gaussfit(wave, profile, aa, nterms=3, sigma=sigma, chisq=chisq, $
                        estimates=[peak_intensity, xpeak, line_width * sqrt(2.0)])
where peak_intensity, xpeak, and line_width come from the analytical Gaussian fit:
comp_analytic_gauss_fit, profile, d_lambda, doppler_shift, line_width, peak_intensity
xpeak can be computed as:
xpeak = lambda_zero + doppler_shift
where lambda_zero is the nominal center position.
We must require that all wavelength positions have intensity > 0.1 ppm before calling GAUSSFIT. This condition must be added to the ones already in place for the analytical Gaussian fit.
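Putting these pieces together, a hedged Python sketch of assembling the ESTIMATES vector and gating the fit on the intensity condition (names are hypothetical; in the pipeline the fit itself is IDL's GAUSSFIT, and the sqrt(2) width scaling follows the call shown above):

```python
import math

LAMBDA0 = 1074.7  # nominal center wavelength (lambda_zero), nm

def gaussfit_estimates(peak_intensity, doppler_shift, line_width):
    """[height, center, width] initial guess built from the analytic fit."""
    xpeak = LAMBDA0 + doppler_shift
    return [peak_intensity, xpeak, line_width * math.sqrt(2.0)]

def ok_to_fit(profile, threshold=0.1):
    """All wavelength positions must exceed the intensity threshold (ppm)."""
    return all(p > threshold for p in profile)

est = gaussfit_estimates(peak_intensity=2.0, doppler_shift=0.05, line_width=0.1)
print(ok_to_fit([0.5, 2.0, 0.4]), ok_to_fit([0.05, 2.0, 0.4]))
print(round(est[1], 2))
```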
We can discuss it further if there are questions on how to implement this in the pipeline.
How do we calculate lambda_zero? Is this 1074.7 for our favorite wavelength region?
We do not calculate lambda_zero. Lambda_zero is the nominal center wavelength and is our reference. The analytical gaussian fit returns doppler_shift, which is the displacement in nm of the peak of the fitted gaussian from the nominal center wavelength position.
Because we are not always well centered on the line, the velocities are later renormalized by subtracting the "rest wavelength" computed by averaging the median velocity of the east and west coronal pixels above a certain threshold.
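A sketch of that renormalization (Python; the tiny arrays, threshold, and east/west split here are hypothetical stand-ins for the pipeline's logic):

```python
from statistics import median

def rest_velocity(velocity, intensity, x_center, threshold=0.5):
    """Average of the east and west median velocities of bright pixels."""
    east, west = [], []
    for row_v, row_i in zip(velocity, intensity):
        for x, (v, bright) in enumerate(zip(row_v, row_i)):
            if bright > threshold:
                (east if x < x_center else west).append(v)
    return 0.5 * (median(east) + median(west))

# Tiny hypothetical velocity image; all pixels pass the intensity cut.
velocity = [[32.0, 28.0], [36.0, 24.0]]
intensity = [[1.0, 1.0], [1.0, 1.0]]

offset = rest_velocity(velocity, intensity, x_center=1)
print(offset)
```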
I think I have reasonable runs for the analytic Gaussian "fit", the 3-term fit, and the 4-term fit in:
process.analytic-gaussian
process.3-term
process.4-term
I reprocessed the analytic-gaussian run to have the same distortion and be comparable to the 3-term and 4-term runs.
I processed 20220115 and 20220116 with both analytic, 3-term, and 4-term fits. Results in:
process.analytic-gaussian
process.3-term
process.4-term
Looking at 20220115.191933.ucomp.1074.l2.fts, I find the 3- and 4-term fits have identical LOS pixel values. But both have a ~30 km/s offset between the analytical and 3/4-term fits. Is this what we expected?
The noise masking appears to fail for the 20220115.191933 FITS file. In the analytic-gaussian run the noise mask values range from 0 to 1, instead of the binary good/bad values I expected from the code. And the line width appears null in the 3/4-term fits, so the mask is also null for these data.
I looked at a few random 1074 FITS images for the 3 test days and compared peak intensity, velocity and line width. The images in the directories process.3-term, and process.4-term are identical (min and max of the difference image is zero).
The rest wavelength is zero in the 3 and 4 term fits and it seems the difference between the analytic and the least square is basically the rest wavelength, not clear they are real differences beside that one.
This needs more work from Mike to debug the code.
Questions about the current runs:
- How is RSTWVL computed? Why is it 0.0 so much? Or -33?
- analytic-gaussian had values between 0 and 1.
- 20220115.191933
This doesn't add any new information from yesterday's meeting. Three versions of the fits seem identical.
I put together some Python code to look at the LOS value for all 2380 pixels in a ring at height = 300 (pixels) for every 1074 5-point data file reprocessed in the process.analytic-gaussian and process.4-term processing directories.
After correcting the pixel values by adding the rest wavelength reported in the header, the 3- and 4-term fits are identical. The differences between the analytic and least square fits are very small, less than 0.0005 km/s. Most of this difference seems to be from a DC offset (maybe noise in the rest wavelength calculation) that changes per FITS file. Within a single file, the differences between the fits are closer to 1e-5 km/s.
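The ring sampling could be done along these lines (a Python sketch; the image, center, and radius here are hypothetical, not the actual 1280 x 1024 geometry):

```python
import numpy as np

def ring_pixels(image, x0, y0, radius):
    """Values of all pixels in a 1-pixel-wide annulus at the given radius."""
    ny, nx = image.shape
    y, x = np.mgrid[0:ny, 0:nx]
    r = np.hypot(x - x0, y - y0)
    return image[(r >= radius - 0.5) & (r < radius + 0.5)]

image = np.arange(100.0).reshape(10, 10)
ring = ring_pixels(image, x0=5, y0=5, radius=3)
print(ring.size > 0)
```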
@detoma @jburkepile This is just a reminder that on 2022-02-02, we moved the field lens to its correct location (closer to the Lyot stop). This should have changed the background (I forget if it did). In today's meeting, we talked about adding some February dates to the Gaussian data checking; it would be interesting to compare data from before and after the field lens move.
Thanks for checking the change to the field lens location Ben and adding it to the ticket. Looks like you have data before and after the field lens change on Feb 2.
It looks like the stray light begins to diminish on Feb 8 and Feb 9 and by Feb 10, 2022 the data look good. It would be good to have Mike run Gaussian fits (analytic, 3- and 4-parameter) for a sample of days. How about reprocessing: Feb 1, Feb 2, Feb 7, Feb 10, Feb 14, 2022.
Are 5 days sufficient?
So not Jan 15, 16, and 19? or 20240409?
Instead of always doing an analytical Gaussian "fit", the pipeline should do an actual Gaussian fit if there are more than 3 wavelengths in the file.
The code to perform the fit, per pixel, is:
Questions

Tasks
- Use SIGMA, not CHISQ.