Closed sshepherd closed 2 years ago
This has been noticed and discussed before (https://github.com/SuperDARN/rst/pull/74): the old FITACF seems to "sweep under the carpet" problems with interference (i.e. non-Gaussian noise) while FITACF3 flags them clearly. I believe this was mentioned at one of our FITACF3 testing presentations in France.
The processing time is inherently longer due to different ways of calculating power/phase variance. This is the price for adequate fitting and realistic error estimates.
Okay, but my cautionary point is that, for a naive user, v3.0 lets noise through into the gridding process. I anticipate you will suggest that the gridding process should be modified, but it currently isn't, and I suspect those bogus vectors are likely in grid files that are being used.
Sure! One should keep in mind that at the fitting stage only preliminary noise filtering can take place, so there should be adequate downstream procedures taking care of whatever noise/interference sneaks in. An obvious way of detecting noise is to analyse range-time neighbours, and this can only be done after fitting is performed. To a certain extent this second-stage filtering has been implemented with the 3x3x3 median filter, which can optionally be augmented by SNR, velocity and width thresholds, but it is by no means perfect and requires a thorough revision. Optimally, realistic velocity error estimates produced by FITACF3 should be used for this purpose, and not through applying thresholds but via weighting out during averaging and/or fitting.
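The error-weighting idea suggested above could look something like the following sketch (plain NumPy; the function name and the simple inverse-variance scheme are my own illustration, not an existing RST routine):

```python
import numpy as np

def weighted_velocity(v, v_err):
    """Inverse-variance weighted mean of velocity estimates.

    Points with large fitted errors are automatically down-weighted,
    so no hard error threshold is needed.
    """
    v = np.asarray(v, dtype=float)
    w = 1.0 / np.asarray(v_err, dtype=float) ** 2
    v_mean = np.sum(w * v) / np.sum(w)
    # Standard error of the weighted mean
    v_mean_err = np.sqrt(1.0 / np.sum(w))
    return v_mean, v_mean_err

# A precise estimate dominates a noisy outlier:
v, err = weighted_velocity([100.0, 900.0], [10.0, 300.0])  # v stays near 100 m/s
```

The appeal of weighting over thresholding is that no subjective cutoff has to be chosen; a point with a huge fitted error simply contributes almost nothing to the average.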
What do the velocity errors look like for this particular day?
Not sure Evan. Was just processing files with defaults.
Out of morbid curiosity, I modified `time_plot` to allow velocity error as an output parameter; below are the results for this particular day at CVE on beam 1 for FITACF 2.5 (top) and 3.0 (bottom):
And likewise for `field_plot`:
It is not clear to me from these plots that filtering the v3.0 data by the velocity error would prevent them from contaminating grid-level data.
You have provisionally blacklisted these data, so I don't see much of a problem in this particular case. More generally, as follows from https://github.com/SuperDARN/rst/pull/74, for anthropogenic noise, as seems to be the case here, the noise characteristics seem to be site-specific, and tackling them would require different filtering approaches.
By the way, the fan plots show some directional inhomogeneity, so the noise source seems to be directional... Does that make sense?
I have NOT blacklisted these data. The noise level is high, but using fitacf v2.5 there is still usable data, but mostly ground scatter. I have not looked into these data in much detail other than what I have mentioned. The noise source was likely something internal.
Has anyone attempted filtering the fitacf3 data using both fitted parameter errors and reduced chi-squared? It is possible for fitted parameter errors to be small but for the fit to be meaningless. In this case, the reduced chi-squared will differ significantly from 1, indicating that the model doesn't accurately represent the data.
Alternatively, the fit might actually be completely valid and we just might not like the results! For example, it is possible for the ACF model to fit noisy data quite well with a fitted velocity near the Nyquist limits and a large spectral width. As such, one would expect the fitting of a noise ACF to actually work out pretty well, and such a fit might even have a reduced chi-squared close to 1. Filtering based on Nyquist limits and large spectral width might also help resolve this issue, especially since it appears that most of the "noise" is associated with a very large velocity.
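The two-pronged filter described here could be sketched as follows (the function, field names, and threshold values are all hypothetical illustrations, not anything currently in RST):

```python
import numpy as np

def flag_noise_fits(v, w_l, chi2_red, v_nyquist,
                    chi2_tol=0.5, v_frac=0.9, w_max=500.0):
    """Return a boolean mask of fits that look like noise ACFs.

    A fit is flagged when its reduced chi-squared is far from 1
    (the model doesn't describe the data) OR when the velocity is
    near the Nyquist limit with a very large spectral width (a
    'good' fit to pure noise).  All thresholds are illustrative.
    """
    v = np.asarray(v, dtype=float)
    w_l = np.asarray(w_l, dtype=float)
    chi2_red = np.asarray(chi2_red, dtype=float)
    bad_model = np.abs(chi2_red - 1.0) > chi2_tol
    noise_like = (np.abs(v) > v_frac * v_nyquist) & (w_l > w_max)
    return bad_model | noise_like

# One plausible record per column: good scatter, noise fit, bad fit
mask = flag_noise_fits(v=[50.0, 950.0, 200.0],
                       w_l=[100.0, 800.0, 150.0],
                       chi2_red=[1.1, 0.95, 3.0],
                       v_nyquist=1000.0)  # -> [False, True, True]
```

Note that the second branch is exactly the case described above: the reduced chi-squared alone passes (0.95), and only the Nyquist-velocity/width test catches it.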
Given the above, this cautionary tale that @sshepherd has identified might be demonstrating that fitacf3 is doing a much better job of fitting the ACFs than fitacf2.5, because it actually fits the noise ACFs with resulting fitted parameters indicative of a noise ACF! This would mean that the cautionary tale is about how the community wants to filter out data points that are fitted noise ACFs.
To start that discussion, I want to point out that the SuperDARN community is unique in publishing a single fitted file format that is expected to be the gold-standard format whose data users can just use as-is. I would strongly suggest that the community start to change its thinking towards more of a "data levels" model (something like this: https://science.nasa.gov/earth-science/earth-science-data/data-processing-levels-for-eosdis-data-products), where a higher-level data product is derived from the "fitacf" files and includes things like filtering, geolocation information, FOV, etc. Several people (me, @aburrell, Dr. McWilliams) have identified this as a problem over the years, but I'm not sure it's ever been voiced in the DAWG before? One can see how an intermediate "vetted" file could be derived from a "fitacf" file that includes whatever subjective filtering the community has decided on for use with convection maps, instead of baking that subjectivity into the fitting procedures themselves. This would allow different researchers to choose what filtering is appropriate for their own research, or to use the community-standard filtering.
I agree with @sshepherd that the current DA procedures don't handle "non-compliant" data very well, so you have to be careful about using new stuff with the old mapping software. I think the take-away from this is that things need to be updated/changed. So I think the most constructive discussion to be having here is centered around: what should be done going forward? Use fitacf2.5, which doesn't fit the data properly, or make changes to allow improved fitting algorithms to be used?
If this is a discussion people want to have, I could open a new issue for it?
I'm tempted to close this issue because it's not really possible to resolve it with our current understanding of SuperDARN noise sources.
This issue came up again in some PI discussions about FITACF3.0 after the SD Workshop, so I'm reopening it temporarily. No specific action is needed, but I'm putting it here so that anyone who is interested can easily follow along or comment. The PIs asked the following:
Here is a link to an issue that Simon raised a long time ago about noise and FitACF3 (https://github.com/SuperDARN/rst/issues/208). There was some good discussion but ultimately the issue was closed “because it's not really possible to resolve it with our current understanding of SuperDARN noise sources.” Perhaps our understanding has improved. We would like to see how the despeckling and/or other new features can handle this specific case.
_Note: There are no "other new features" relevant here, so we will just look at the despeckling (`fit_speck_removal`) routine._
Here's 2 hours of CVE data processed with FITACF2.5 (left) and FITACF3.0 (right). The FITACF2.5 data appear less "noisy" because most of the noise ACFs are rejected by the ad-hoc criteria for cross-range interference and ACF shape in v2.5. We know that these criteria can significantly overfilter the data (but not in this particular case), which is why they were relaxed in v3.0.
It's easy to test how `fit_speck_removal` handles the Christmas Valley data from this time period. The figures below show the FITACF3.0 data before and after filtering:
```
fit_speck_removal 20181120.cve.fitacf3 > cve.filter.fitacf3
time_plot -png -v -vmin -1000 -vmax 1000 -b 8 -st 04:00 -ex 02:00 cve.filter.fitacf3 > cve.filter.png
```
Despeckling removes some of the noise data, but a lot remains. Let's try applying the filter multiple times:
```
fit_speck_removal cve.filter.fitacf3 > cve.filter2.fitacf3
fit_speck_removal cve.filter2.fitacf3 > cve.filter3.fitacf3
fit_speck_removal cve.filter3.fitacf3 > cve.filter4.fitacf3
```
We see some small improvements, but even after 4 rounds of filtering we do not obtain the "clean" dataset we were hoping for. It's important to remember that `fit_speck_removal` is designed to reject data that are isolated in range-time space (<4 "neighbours"). With successive filtering we see that the remaining "noise" echoes are grouped together in range-time space, and could perhaps be misinterpreted as coherent scatter. Not recommended!
Let's take a closer look at why the fitted data from CVE/CVW from November 2018 appear so noisy. The problem arises from the unusually low number of averages in the integration period (`nave`). As shown in the plot below, `nave` decreases gradually over a 2-week period from the nominal value of >20 to just 2 by the end of November. `nave` returns to its nominal range late on 30/11/2018 after the radar was presumably restarted (see data gap at ~23UT in the second figure below). I suspect this behaviour was caused by a software issue (data buffer?), but perhaps someone familiar with the Christmas Valley system would be able to offer further insight.
To understand why the number of noise ACFs in the fitted data increases as `nave` decreases, let's look at the noise power distributions at selected times in November 2018. In the next figure, the probability density functions (PDFs) are histograms of `pwr0`/`skynoise`. I then divided by the maximum value for easy comparison. Up to about 15/11/2018, the PDF is approximately Gaussian. After that, the PDF develops a very large high-power "tail" as `nave` decreases further. The `pwr0` values within this high-power tail meet the signal-to-noise ratio criterion used for data pre-selection in FITACF (SNR>1, vertical dashed line in the figure). Recall that FITACF2.5 rejects most of the noise ACFs in this high-power tail due to its ad-hoc criteria for CRI and ACF shape.
In my opinion, these data should be blocklisted. They are not compatible with the basic assumption of a Gaussian noise distribution used in all versions of FITACF for data pre-selection. Put another way, the data don't really comply with the "integration time" requirement of the SuperDARN common mode -- while the integration time written in the data files is 3.0s, the actual data were collected over a period of just ~0.2-1s.
Hey @ecbland,
This is an excellent analysis. I had some additional thoughts that support your analysis here:
This means that what you say about the number of samples not being sufficient with 0.2-1 s integrations is also backed up by the fundamental statistical properties of the noise. And if the noise is non-stationary or correlated, that also violates the assumptions of the processing.
If we had the voltage samples for this data, we could probably do a lot better job of characterizing the noise. Yet another reason SuperDARN should start considering saving the voltage data (rawacfs are only slightly smaller than the IQ).
@asreimer, a small correction: the PDF of lag-0 power values for noise ACFs is always chi-squared, which approaches a Gaussian shape as the number of averages (pulse sequences going into a single ACF) grows. :-)
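This is quick to demonstrate by simulation (my own sketch, not RST code): the lag-0 power of a noise ACF averaged over `nave` sequences follows a scaled chi-squared distribution with 2*`nave` degrees of freedom, whose skewness shrinks towards the Gaussian value of zero as `nave` grows.

```python
import numpy as np

rng = np.random.default_rng(42)

def noise_lag0_power(nave, n_acfs=100_000):
    """Simulate lag-0 power of noise ACFs averaged over `nave`
    pulse sequences.  Each sequence contributes I^2 + Q^2 with
    Gaussian I/Q samples, so the average is chi-squared distributed
    with 2*nave degrees of freedom (scaled)."""
    p = (rng.normal(size=(n_acfs, nave)) ** 2
         + rng.normal(size=(n_acfs, nave)) ** 2)
    return p.mean(axis=1)

def skewness(x):
    """Sample skewness: zero for a Gaussian."""
    x = x - x.mean()
    return (x ** 3).mean() / (x ** 2).mean() ** 1.5

# Theoretical skewness of chi^2_k is sqrt(8/k): ~1.41 for nave=2,
# ~0.45 for nave=20, shrinking towards the Gaussian value of 0.
s2 = skewness(noise_lag0_power(2))
s20 = skewness(noise_lag0_power(20))
```

This also quantifies why `nave`=2-3 is so damaging: the heavy right-hand tail of the low-`nave` chi-squared distribution is exactly the high-power "tail" seen in the PDFs above.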
Thanks. That's what I meant. 80 hour week and trying to catch up with emails.
I think you've presented some good analysis of the problem here as well. Though it's out of scope for this WG to remove these files from distribution, the analysis here is a start for why the files should be removed. cve is tied to cvw for operating software, so it would be good to see the `nave` and noise analysis for cvw as well.
If only there were some kind of working group or task force dedicated specifically to data quality control...
> If we had the voltage samples for this data, we could probably do a lot better job of characterizing the noise. Yet another reason SuperDARN should start considering saving the voltage data (rawacfs are only slightly smaller than the IQ).
All of the IQdat files from the Christmas Valley radars are stored locally at Dartmouth and, for specific time intervals, could likely be made available upon request.
@ksterne It's easy to see from the quicklook plots on the VT website that `nave` behaves the same for CVW. 30th November is the last date with the problem and also where the problem is clearest for both radars (`nave`=2-3).
Since this issue is not related to the analysis software, I'll leave it to the PI group to investigate it further if they want to.
Kevin, the same problem remains for CVV as well.
> All of the IQdat files from the Christmas Valley radars are stored locally at Dartmouth and, for specific time intervals, could likely be made available upon request.
Awesome! Hopefully more groups start doing this regularly too. That's actually crucial information for any of the working groups to know. It should be easy to show what the statistical properties of the noise are using those data. (For example, how do they change as you manually increase `nave`, etc.)
Recent difficulties with the Christmas Valley radars reveal a potential issue with using fitACF 3.0. It was suggested to me that I blacklist some of the data files from CVW and CVE during a period when they were experiencing elevated noise levels. Below are RTI plots using v2.5 and v3.0, respectively.
Both plots show a high level of noise but v3.0 shows significantly more "data" values, which led to the blacklist request. The v2.5 plot is noisy but not nearly as bad as v3.0.
Scan plots for v2.5 and 3.0:
The real problem (which affected our realtime convection maps and anyone making grid files with v3.0) shows up when the data are run through standard gridding:
I should also say that processing these files with v3.0 took 2-4 times longer than with v2.5. I suspect that has to do with the increased number of "data" or noisy values that are created.