Use simple averaging selection criteria

mgalloy commented 1 year ago

When creating an average:

cluster available files in program using 30 minute gaps
take the first cluster
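
A minimal sketch of the two steps above, in Python rather than the pipeline's own language. The function names `cluster_by_gap` and `first_cluster` are hypothetical, and the sketch assumes a new cluster starts whenever the gap to the previous file is strictly longer than the threshold:

```python
from datetime import datetime, timedelta


def cluster_by_gap(times, max_gap=timedelta(minutes=30)):
    """Group observation times into clusters split at gaps longer than max_gap."""
    clusters = []
    for t in sorted(times):
        # Extend the current cluster if the gap to its last member is small enough.
        if clusters and t - clusters[-1][-1] <= max_gap:
            clusters[-1].append(t)
        else:
            clusters.append([t])
    return clusters


def first_cluster(times, max_gap=timedelta(minutes=30)):
    """Return the earliest cluster, or an empty list if there are no times."""
    clusters = cluster_by_gap(times, max_gap)
    return clusters[0] if clusters else []
```

With files at 0, 5, 41, and 46 minutes, the 36-minute gap splits the day into two clusters, so `first_cluster` keeps only the first two files.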

Questions

[ ] What are the averaging criteria for waves files?

Tasks

[x] create a config file section for averaging with an option for the 30 minute gap
[x] create a routine to apply averaging criteria to filter a list of files
[x] average all good synoptic images in an 84-minute window starting at the first good image of the day
[ ] don't create sigma or median files for non-waves programs
[ ] create mean of the synoptic programs, but call it "daily" instead of "mean"
[ ] comment for NUMFILES should be "number of level 1 files averaged"
[ ] maybe #151 could be done while doing the other tasks
[ ] update the L2 Filenames section of the UCoMP manual to reflect these changes for the synoptic filename changes.

mgalloy commented 1 year ago

Created ucomp_l2_average_criteria to filter an array of files on the averaging criteria.

detoma commented 1 year ago

The data may span longer than 30 minutes. What we want to avoid is gaps longer than 30 minutes. We can make the gap longer, e.g. one hour, if this is easier to code. The goal is not to average the synoptic program that runs earlier in the day with the one that runs after the waves program.

detoma commented 6 months ago

The 30-minute gap criterion does not always work. Data were taken differently over time. In summer and fall of 2022, there are many days when the first set of two synoptic images and the second set are 36m apart, so we end up averaging only two synoptic images in the average synoptic files. This is not enough. Before, we were averaging all synoptic files, which was too much because the images were too far apart in time.

The way the synoptic data have been taken is not ideal for averaging. We need to ignore the gaps and average all good synoptic files over 45m if we want to average 4 images. If we want to average 6 images, when there are 6, we need to average all good images over 84m. I suggest we go for the 6 images to improve the S/N.

Ideally, for each day, we would like to find the longest set of synoptic images taken, like we did for COMP. I do not know if this is implemented for UCOMP yet. If not, a quick fix is to average all good synoptic images for the first 84m set, i.e. we find the first good image of the day and average all good images taken during the following 84m. This should work reasonably well for most days and seeing is better in the morning.
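
The quick fix described above could be sketched as follows. This is an illustration in Python, not the pipeline code: the function name `select_first_window` and the dictionary keys `name`, `time`, and `good` are hypothetical, and the sketch assumes an image taken exactly 84 minutes after the first good image still counts as inside the window.

```python
from datetime import datetime, timedelta


def select_first_window(files, window=timedelta(minutes=84)):
    """Return the good files within `window` of the first good file of the day.

    Each file is a dict with hypothetical keys "name", "time" (datetime),
    and "good" (bool). Gaps between files are ignored on purpose.
    """
    good = sorted((f for f in files if f["good"]), key=lambda f: f["time"])
    if not good:
        return []
    start = good[0]["time"]  # the window opens at the first good image
    return [f for f in good if f["time"] - start <= window]
```

Because the window opens at the first good image rather than the first image, a bad frame at the start of the day does not shorten the averaging period.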

I would like to implement this before we reprocess the 2021-2022 data to fix the bad frames bug. A more complex averaging scheme that takes into account how data were taken over time can be implemented later. This may require epochs.

bberkeyU commented 6 months ago

@detoma, can we leverage this averaging discussion to develop best practices for future observing programs? For most of 2021/2022, the synoptic program included back-to-back pairs of 4-repeat 16-sum data. The idea was that until we knew the SNR requirement, we could use the pairs of 4-repeat data as either two 4-repeat files or, when averaged, one 8-repeat file. I don't think anyone ever used the pairs as 8-repeat files. Then, in the eclipse workshop and eclipse planning, we decided that eight repeats were a bad SNR compromise and that we should seek more wavelength diversity instead of sitting on a single line for eight repeats.

If we could retake the 2021/2022 data, what would we do for the synoptic program? Just change our repeats per line from 8 to 1, 2, 4, or 6? Do all lines have the same number of repeats? Should we cluster our day into 45- or maybe 84-minute windows where we focus on a subset of our lines before moving on to the next subset?

detoma commented 6 months ago

This is a priority for when we reopen and is a different discussion.

There is not one size fits all. It depends on the science question we want to answer and what the Sun looks like that day. We have a small telescope and more wavelengths means lower cadence at one specific line.

One thing that was not helpful, and that we will change going forward for sure, is taking the same wavelength twice in a row during the synoptic program. It is better to space them apart so we get better coverage of dynamic events. You can still average them together during quiet times.

bberkeyU commented 6 months ago

@detoma Why wasn't the 8 repeat data set broken into two files helpful? Was it breaking it up into two files? Or was it 8 repeats too many?

detoma commented 6 months ago

Today we agreed on a few steps to move forward with this issue:

1) Use simple averaging criteria for the synoptic data, with more complex criteria to be implemented later. We will average all good images in an 84-minute window starting at the first good image of the day. Some days we may get a better average by averaging data taken later in the day, but these simple criteria will cover most days.

2) We will only compute a mean synoptic file for the synoptic program. We do not have enough data to make a meaningful median synoptic file.

3) There will be no minimum number of images to make a mean file. If there is only one good image, we will still make a "mean" file because it is needed for the webpage. Question for Mike and Don: Is a .png sufficient for the web? It does not make sense to create a FITS file called mean that has only one image. I think we need at least two images to justify a FITS mean file.

detoma commented 6 months ago

We agreed to stop making the synoptic average files and instead create a daily synoptic image (see tasks above).

mgalloy commented 5 months ago

What are the averaging criteria for waves files? Is it the entire program? The 84-minute window? Something else?

detoma commented 5 months ago

I think in the past we used the entire waves program.

Is the waves program always about an hour or are there days when it is a lot longer? If we have longer programs and just to be safe, I would put a similar limit of 80m from the first to the last image used so we do not average over a period too long. Can this be easily implemented?

mgalloy commented 5 months ago

Previously, there was a "too long gap" criterion of 30 minutes for all averages. It is not hard to make program-based criteria (well, at least for waves and synoptic) for averages.
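
One way program-based criteria might look, sketched in Python rather than the pipeline's own language. The table `WINDOW_BY_PROGRAM` and the function `select_for_program` are hypothetical names. The 84-minute synoptic window is the one agreed above; the 80-minute waves limit is only the value suggested in this thread, not a settled choice.

```python
from datetime import datetime, timedelta

# Hypothetical per-program limits on the span from first to last image used.
WINDOW_BY_PROGRAM = {
    "synoptic": timedelta(minutes=84),
    "waves": timedelta(minutes=80),  # suggested in the thread, not decided
}


def select_for_program(program: str, good_times: list[datetime]) -> list[datetime]:
    """Keep the good image times within the program's window of the first one."""
    if program not in WINDOW_BY_PROGRAM:
        raise ValueError(f"no averaging criteria defined for program {program!r}")
    times = sorted(good_times)
    if not times:
        return []
    window = WINDOW_BY_PROGRAM[program]
    return [t for t in times if t - times[0] <= window]
```

Keeping the limits in a single table means a later, more complex scheme (for example, one that varies by epoch) would only need to change how the window is looked up.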

NCAR / ucomp-pipeline

Use simple averaging selection criteria #208
