SuperDARN / rst

Radar Software Toolkit (RST)
https://superdarn.github.io/rst/
GNU General Public License v3.0

fitACF 3.0 Caution #208

Closed. sshepherd closed this issue 2 years ago.

sshepherd commented 5 years ago

Recent difficulties with the Christmas Valley radars reveal a potential issue with using fitACF 3.0. It was suggested to me that I blacklist some of the data files from CVW and CVE during a period when they were experiencing elevated noise levels. Below are RTI plots using v2.5 and v3.0, respectively.

[RTI plot: FITACF v2.5]

[RTI plot: FITACF v3.0]

Both plots show a high level of noise but v3.0 shows significantly more "data" values, which led to the blacklist request. The v2.5 plot is noisy but not nearly as bad as v3.0.

Scan plots for v2.5 and 3.0:

[Scan plot: FITACF v2.5]

[Scan plot: FITACF v3.0]

The real problem (which affected our realtime convection maps and anyone making grid files with v3.0) is what happens to the data when using standard gridding:

[Grid plot: FITACF v2.5]

[Grid plot: FITACF v3.0]

I should also say that processing these files with v3.0 took 2-4 times longer than with v2.5. I suspect that has to do with the increased number of "data" or noisy values that are created.

pasha-ponomarenko commented 5 years ago

This has been noticed and discussed before (https://github.com/SuperDARN/rst/pull/74): the old FITACF seems to "sweep under the carpet" problems with interference (i.e. non-Gaussian noise), while FITACF3 flags them clearly. I believe this was mentioned at one of our FITACF3 testing presentations in France.

The processing time is inherently longer due to different ways of calculating power/phase variance. This is the price for adequate fitting and realistic error estimates.

sshepherd commented 5 years ago

Okay, but my cautionary point is that, for a naive user, v3.0 lets noise through into the gridding process. I anticipate you will suggest that the gridding process should be modified, but it currently hasn't been, and I suspect those bogus vectors are likely in grid files that are being used.


pasha-ponomarenko commented 5 years ago

Sure! One should keep in mind that only preliminary noise filtering can take place at the fitting stage, so there should be adequate downstream procedures taking care of whatever noise/interference sneaks in. An obvious way of detecting noise is to analyse range-time neighbours, and this can only be done after fitting is performed. To a certain extent this second-stage filtering has been implemented with the 3x3x3 median filter, which can optionally be augmented by SNR, velocity and width thresholds, but it is by no means perfect and requires a thorough revision. Optimally, the realistic velocity error estimates produced by FITACF3 should be used for this purpose, not through applying thresholds but by weighting out noisy points during averaging and/or fitting.
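
As a rough illustration of that weighting idea (a minimal sketch with hypothetical inputs, not RST code), the fitted velocities falling into one grid cell could be combined with inverse-variance weights so that poorly constrained fits contribute little to the average:

import numpy as np

def weighted_velocity(vel, vel_err):
    # Combine velocity estimates in one grid cell, weighting each by 1/sigma^2
    # so that noisy fits are effectively weighted out of the average.
    vel = np.asarray(vel, dtype=float)
    vel_err = np.asarray(vel_err, dtype=float)
    w = 1.0 / vel_err**2
    v_mean = np.sum(w * vel) / np.sum(w)
    v_mean_err = np.sqrt(1.0 / np.sum(w))
    return v_mean, v_mean_err

# Example: one well-constrained fit dominates two noisy, near-Nyquist ones.
print(weighted_velocity([300.0, -1800.0, 1500.0], [30.0, 600.0, 800.0]))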

egthomas commented 5 years ago

What do the velocity errors look like for this particular day?

sshepherd commented 5 years ago

Not sure, Evan. I was just processing files with the defaults.


egthomas commented 5 years ago

Out of morbid curiosity, I modified time_plot to allow velocity error as an output parameter; below are the results for this particular day at CVE on beam 1 for FITACF 2.5 (top) and 3.0 (bottom):

[rti_25: velocity error RTI plot, FITACF 2.5]

[rti_30: velocity error RTI plot, FITACF 3.0]

egthomas commented 5 years ago

And likewise for field_plot:

[fan_25: velocity error fan plot, FITACF 2.5]

[fan_30: velocity error fan plot, FITACF 3.0]

sshepherd commented 5 years ago

It is not clear to me from these plots that filtering the v3.0 data by the velocity error would prevent them from contaminating grid-level data.


pasha-ponomarenko commented 5 years ago

You have provisionally blacklisted these data, so I don't see much of a problem in this particular case. More generally, as follows from https://github.com/SuperDARN/rst/pull/74, for anthropogenic noise, which seems to be the case here, the noise characteristics seem to be site-specific, and tackling them would require different filtering approaches.

By the way, the fan plots show some directional inhomogeneity, so the noise source seems to be directional... Does that make sense?

sshepherd commented 5 years ago

I have NOT blacklisted these data. The noise level is high, but with fitacf v2.5 there is still usable data, mostly ground scatter. I have not looked into these data in any more detail than what I have mentioned. The noise source was likely something internal.


asreimer commented 5 years ago

Has anyone attempted filtering the fitacf3 data using both the fitted parameter errors and the reduced chi-squared? It is possible for the fitted parameter errors to be small but for the fit to be meaningless; in that case the reduced chi-squared will differ significantly from 1, indicating that the model does not accurately represent the data.

Alternatively, the fit might actually be completely valid and we just might not like the results! For example, it is possible for the ACF model to fit noisy data quite well, with a fitted velocity near the Nyquist limit and a large spectral width. As such, one would expect the fitting of a noise ACF to actually work out pretty well, and such a fit might even have a reduced chi-squared close to 1. Filtering based on the Nyquist limit and large spectral widths might also help resolve this issue, especially since it appears that most of the "noise" is associated with very large velocities.
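
For what it's worth, a combined filter along those lines might look like the sketch below (hypothetical per-echo inputs and illustrative thresholds only; these are not fields or functions from the RST):

import numpy as np

def likely_noise(vel, width, red_chi2, v_nyquist,
                 chi2_tol=0.5, width_max=500.0, nyquist_frac=0.9):
    # Flag echoes whose fit quality or fitted parameters suggest a noise ACF:
    # either the model does not describe the data (reduced chi-squared far
    # from 1), or the fit is "good" but sits near the Nyquist velocity with a
    # very large spectral width.
    vel, width, red_chi2 = map(np.asarray, (vel, width, red_chi2))
    bad_fit = np.abs(red_chi2 - 1.0) > chi2_tol
    noise_like = (np.abs(vel) > nyquist_frac * v_nyquist) & (width > width_max)
    return bad_fit | noise_like

# keep = ~likely_noise(vel, width, red_chi2, v_nyquist=2000.0)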

Given the above, this cautionary tale that @sshepherd has identified might be demonstrating that fitacf3 is doing a much better job of fitting the ACFs than fitacf2.5, because it actually fits the noise ACFs and produces fitted parameters indicative of a noise ACF! That would mean the cautionary tale is really about how the community wants to filter out data points that are fitted noise ACFs.

To start that discussion, I want to point out that the SuperDARN community is unique in publishing a single fitted file format that is expected to be the gold-standard product users can take as-is. I would strongly suggest that the community start to change its thinking towards more of a "data levels" model (something like this: https://science.nasa.gov/earth-science/earth-science-data/data-processing-levels-for-eosdis-data-products), where a higher-level data product is derived from the "fitacf" files and includes things like filtering, geolocation information, FOV, etc. Several people (me, @aburrell, Dr. McWilliams) have identified this as a problem over the years, but I'm not sure it's ever been voiced in the DAWG before. One can see how an intermediate "vetted" file could be derived from a "fitacf" file that includes whatever subjective filtering the community has decided on for use with convection maps, instead of baking that subjectivity into the fitting procedures themselves. This allows different researchers to choose what filtering is appropriate for their own research, or to use the community-standard filtering.

I agree with @sshepherd that the current DA procedures don't handle "non-compliant" data very well, so you have to be careful about using new stuff with the old mapping software. I think the take-away from this is that things need to be updated/changed. So I think the most constructive discussion to have here is centered around what should be done going forward: use fitacf2.5, which doesn't fit the data properly, or make changes to allow improved fitting algorithms to be used?

If this is a discussion people want to have, I could open a new issue for it?

ecbland commented 5 years ago

I'm tempted to close this issue because it's not really possible to resolve it with our current understanding of SuperDARN noise sources.

ecbland commented 3 years ago

This issue came up again in some PI discussions about FITACF3.0 after the SD Workshop, so I'm reopening it temporarily. No specific action is needed, but I'm putting it here so that anyone who is interested can easily follow along or comment. The PIs asked the following:

Here is a link to an issue that Simon raised a long time ago about noise and FitACF3 (https://github.com/SuperDARN/rst/issues/208). There was some good discussion but ultimately the issue was closed “because it's not really possible to resolve it with our current understanding of SuperDARN noise sources.” Perhaps our understanding has improved. We would like to see how the despeckling and/or other new features can handle this specific case.

_Note: There are no "other new features" relevant here, so we will just look at the despeckling (fit_speck_removal) routine._


Here are 2 hours of CVE data processed with FITACF2.5 (left) and FITACF3.0 (right). The FITACF2.5 data appear less "noisy" because most of the noise ACFs are rejected by the ad-hoc criteria for cross-range interference (CRI) and ACF shape in v2.5. We know that these criteria can significantly overfilter the data (although not in this particular case), which is why they were relaxed in v3.0.

It's easy to test how fit_speck_removal handles the Christmas Valley data from this time period. The figures below show the FITACF3.0 data before and after filtering:

fit_speck_removal 20181120.cve.fitacf3 > cve.filter.fitacf3
time_plot -png -v -vmin -1000 -vmax 1000 -b 8 -st 04:00 -ex 02:00  cve.filter.fitacf3 > cve.filter.png

Despeckling removes some of the noise data, but a lot remains. Let's try applying the filter multiple times:

fit_speck_removal cve.filter.fitacf3 > cve.filter2.fitacf3
fit_speck_removal cve.filter2.fitacf3 > cve.filter3.fitacf3
fit_speck_removal cve.filter3.fitacf3 > cve.filter4.fitacf3

We see some small improvements, but even after 4 rounds of filtering we do not obtain the "clean" dataset we were hoping for. It's important to remember that fit_speck_removal is designed to reject data that are isolated in range-time space (<4 "neighbours"). With successive filtering we see that the remaining "noise" echoes are grouped together in range-time space, and could perhaps be misinterpreted as coherent scatter. Not recommended!
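
For reference, the neighbour-counting idea behind the despeckling step can be sketched as follows (a simplified illustration of the "<4 neighbours" rule described above, not the actual fit_speck_removal source):

import numpy as np

def despeckle(occupied):
    # occupied: 2D boolean array (range gate x integration period), True where
    # an echo passed the fitting stage. Keep a cell only if at least 4 of its
    # 8 range-time neighbours are also occupied.
    padded = np.pad(occupied.astype(int), 1)
    neighbours = sum(np.roll(np.roll(padded, dr, 0), dt, 1)
                     for dr in (-1, 0, 1) for dt in (-1, 0, 1)
                     if (dr, dt) != (0, 0))[1:-1, 1:-1]
    return occupied & (neighbours >= 4)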


Let's take a closer look at why the fitted data from CVE/CVW from November 2018 appear so noisy. The problem arises from the unusually low number of averages in the integration period (nave). As shown in the plot below, nave decreases gradually over a 2-week period from the nominal value of >20 to just 2 by the end of November. nave returns to its nominal range late on 30/11/2018 after the radar was presumably restarted (see data gap at ~23UT in the second figure below). I suspect this behaviour was caused by a software issue (data buffer?), but perhaps someone familiar with the Christmas Valley system would be able to offer further insight.

[Plot: CVE nave during November 2018] [Plot showing the data gap at ~23 UT on 30/11/2018]

To understand why the number of noise ACFs in the fitted data increases as nave decreases, let's look at the noise power distributions at selected times in November 2018. In the next figure, the probability density functions (PDFs) are histograms of pwr0/skynoise, which I then divided by their maximum value for easy comparison. Up to about 15/11/2018, the PDF is approximately Gaussian. After that, the PDF develops a very large high-power "tail" as nave decreases further. The pwr0 values within this high-power tail meet the signal-to-noise ratio criterion used for data pre-selection in FITACF (SNR>1, vertical dashed line in the figure). Recall that FITACF2.5 rejects most of the noise ACFs in this high-power tail due to its ad-hoc criteria for CRI and ACF shape.

[Peak-normalized PDFs of pwr0/skynoise at selected times in November 2018]
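
For anyone who wants to reproduce this kind of figure, a minimal sketch of the normalization is given below (it assumes pwr0 and skynoise arrays have already been extracted from the fitted files; this is not RST code):

import numpy as np
import matplotlib.pyplot as plt

def plot_noise_pdf(pwr0, skynoise, label, bins=np.linspace(0.0, 10.0, 101)):
    # Histogram of pwr0/skynoise for one interval, divided by its maximum
    # so that intervals with different amounts of data can be compared.
    snr = np.asarray(pwr0, dtype=float) / np.asarray(skynoise, dtype=float)
    pdf, edges = np.histogram(snr, bins=bins, density=True)
    centres = 0.5 * (edges[:-1] + edges[1:])
    plt.plot(centres, pdf / pdf.max(), label=label)

plt.axvline(1.0, ls='--', color='k')   # SNR > 1 pre-selection threshold
plt.xlabel('pwr0 / skynoise')
plt.ylabel('peak-normalized PDF')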


In my opinion, these data should be blocklisted. They are not compatible with the basic assumption of a Gaussian noise distribution used for data pre-selection in all versions of FITACF. Put another way, the data don't really comply with the "integration time" requirement of the SuperDARN common mode: while the integration time written in the data files is 3.0 s, the actual data were collected over a period of just ~0.2-1 s.

asreimer commented 3 years ago

Hey @ecbland,

This is an excellent analysis. I had some additional thoughts that support your analysis here:

  1. voltage samples from SuperDARN signals are correlated zero-mean Gaussian distributed quantities (I show this to be true in Reimer et al. 2016 and this is true for most volume scatter radars because of the central limit theorem),
  2. the same is assumed true for noise sources, except they are assumed to be uncorrelated,
  3. the sum of the squares of independent zero-mean Gaussian variables (the average power is this sum divided by the number of samples) is in general chi-squared distributed, only approaching a Gaussian when the number of voltage samples is "large".

This means that what you say about the number of samples not being sufficient with 0.2-1 s integrations is also backed up by the fundamental statistical properties of the noise. And if the noise is non-stationary or correlated, that also violates the assumptions of the processing.
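
A quick synthetic check of point 3 (illustrative numbers only, not CVE data): the average power of nave independent complex Gaussian noise samples follows a scaled chi-squared distribution with 2*nave degrees of freedom, so its relative spread and skewness only shrink towards Gaussian behaviour as nave grows.

import numpy as np

rng = np.random.default_rng(0)
n_trials = 100_000

for nave in (2, 5, 20, 50):
    i = rng.standard_normal((n_trials, nave))
    q = rng.standard_normal((n_trials, nave))
    mean_power = np.mean(i**2 + q**2, axis=1)   # averaged lag-0 noise power
    rel_spread = mean_power.std() / mean_power.mean()
    skew = np.mean(((mean_power - mean_power.mean()) / mean_power.std())**3)
    # Expect rel_spread ~ 1/sqrt(nave) and skewness ~ 2/sqrt(nave): a long
    # high-power tail at nave=2, nearly Gaussian by nave=50.
    print(f"nave={nave:3d}  relative spread={rel_spread:.2f}  skewness={skew:.2f}")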

If we had the voltage samples for this data, we could probably do a lot better job of characterizing the noise. Yet another reason SuperDARN should start considering saving the voltage data (rawacfs are only slightly smaller than the IQ).

pasha-ponomarenko commented 3 years ago

@asreimer, a small correction: the PDF of lag-0 power values for noise ACFs is always Chi^2, which approaches a Gaussian shape as the number of averages (pulse sequences going into a single ACF) grows. :-)

asreimer commented 3 years ago

Thanks. That's what I meant. 80 hour week and trying to catch up with emails.

ksterne commented 3 years ago

I think you've presented some good analysis of the problem here as well, though it's out of scope for this WG to remove these files from distribution. The analysis here is a start for why the files should be removed. cve is tied to cvw for operating software, so it would be good to see the nave and noise analysis for cvw as well.

egthomas commented 3 years ago

If only there were some kind of working group or task force dedicated specifically to data quality control...

> If we had the voltage samples for this data, we could probably do a lot better job of characterizing the noise. Yet another reason SuperDARN should start considering saving the voltage data (rawacfs are only slightly smaller than the IQ).

All of the IQdat files from the Christmas Valley radars are stored locally at Dartmouth and, for specific time intervals, could likely be made available upon request.

ecbland commented 3 years ago

@ksterne It's easy to see from the quicklook plots on the VT website that nave behaves the same for CVW. 30th November is the last date with the problem and also where the problem is clearest for both radars (nave=2-3).

Since this issue is not related to the analysis software, I'll leave it to the PI group to investigate it further if they want to.

pasha-ponomarenko commented 3 years ago

Kevin, the same problem remains for CVW as well.

Cheers, Pasha



asreimer commented 3 years ago

> All of the IQdat files from the Christmas Valley radars are stored locally at Dartmouth and, for specific time intervals, could likely be made available upon request.

Awesome! Hopefully more groups start doing this regularly too. That's actually crucial information for any of the working groups to know. It should be easy to show what the statistical properties of the noise are using those data (for example, how they change as you manually increase nave, etc.).
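
As a rough sketch of that exercise (hypothetical variable names; it assumes single-sequence noise powers have already been extracted from the iqdat files), one could block-average the measured powers with different nave values and compare the resulting distributions:

import numpy as np

def block_averaged_power(seq_power, nave):
    # seq_power: 1D array of single-sequence noise powers in time order.
    # Average consecutive blocks of nave sequences, discarding the remainder.
    seq_power = np.asarray(seq_power, dtype=float)
    n_blocks = seq_power.size // nave
    return seq_power[:n_blocks * nave].reshape(n_blocks, nave).mean(axis=1)

# for nave in (2, 5, 10, 20):
#     p = block_averaged_power(seq_power, nave)
#     print(nave, p.std() / p.mean())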