Implement relative magnitude bias correction as in Schaff & Richards 2014

flixha commented 3 years ago

What does this PR do?

For utils.mag_calc.relative_magnitude, this PR implements equation 10 of Schaff & Richards 2014. This corrects the bias in the relative magnitude due to cc < 1 and SNR < ∞.

Why was it initiated? Any relevant Issues?

See #455. I opened that issue because I couldn't seem to get reasonable magnitudes estimated with the current implementation of relative_magnitude, as demonstrated in the figure below:

Figure 1: Comparison of local and relative magnitude estimations for different equations. Left: previous implementation in EQcorrscan. Middle: relative magnitudes standard L2-norm, i.e., no bias-correction for CC or SNR. Right: implementation in this PR according to equation 10 in Schaff & Richards 2014. Note how the relation on the left seems to be quite wrong, while for the others the trend is alright. Low precision is due to I/O from Nordic format.

Compared to the previous implementation, this PR implements equation 10 rather than equation 4 from the paper. Now that I got to read the paper thoroughly, how I understand it is that this is the equation that the authors actually use to determine corrected delta-magnitudes. That equation includes correcting the bias that CC<1 and noise introduce. I'm not exactly sure when equation 4 produces reasonable results, but I think that would be limited to some rather specific cases (see how it doesn't work in the figure above).

To avoid confusion, I renamed the parameter weight_by_correlation to correct_mag_bias.
relative_amplitude now returns dicts of SNRs to which equation 10 needs access
need to check how the commented equation renders in the docs...
running the function with weight_by_correlation=False returns the standard L2-norm relative magnitudes from the figure in the middle
Let me know if we can get a compatible understanding of the paper / equation to use :-)

PR Checklist

[ ] develop base branch selected?
[X] This PR is not directly related to an existing issue (which has no PR yet).
[X] All tests still pass.
[X] Any new features or fixed regressions are be covered via new tests.
[X] Any new or changed features have are fully documented.
[X] Significant changes have been added to CHANGES.md.
~~[ ] First time contributors have added your name to CONTRIBUTORS.md.~~

calum-chamberlain commented 3 years ago

Hey Felix, thanks for taking the time to do this! I agree that the weight_by_correlation version (equation 4) doesn't seem like it is likely to give reliable (or meaningful) results in most circumstances. I was using it specifically for repeating earthquakes, with very high correlations (cc >.9 on all channels and very similar amplitudes), so the impact was pretty minimal.

I also agree that, on further reading, it looks like the authors didn't make use of equation 4, except on their way to later equations.

You might want to check that SNR is calculated in the same way as Schaff and Richards do, and ensure that if SNR drops to 1, then a warning is given and/or the relative magnitude is not returned (which probably should have been in there anyway to avoid spurious results)

calum-chamberlain commented 3 years ago

Oh! some of the tests actually completed!

By the way, if you want us to make a new release with this in it next week I would be keen to get it out - it isn't that much work to push out a new release, and I think it would be good to get this correction out into user-land (and remove the old, not particularly useful version!).

flixha commented 3 years ago

Yeah, sometimes in a lucky moment the web services return the data as expected!

Thanks for the comment on the SNR - you're right, this was good to check again. The paper computes the SNR from the ratio of L2-norms of signal and noise. Compared to the original SNR that is commonly used in EQcorrscan ( max(singal)/rms(noise) ), the L2-norm SNR has a smaller value, commonly smaller by a factor of 3-10.

In utils.mag_calc._get_signal_and_noise, I now implemented the L2-SNR so that both relative_amplitude and relative_magnitude work with the SNR used in the equation. Here some thoughts:

Using different defintions of SNR in the code may be a bit confusing (e.g., template_gen uses original SNR)
The amplitude/magnitude functions are not yet called from anywhere else in EQcorrscan, hence changing it should not be a problem here (of course needs to be clearly documented).
With the implementation in here, it's ok to generates templates and from these computes relative magnitudes using the same value for SNR, because: the L2-norm SNR in relative_magnitude will be stricter and exclude some traces that passed the first quality check with the original SNR.

Let me know whether it's ok for you to replace current functionality as in the points above, or whether you prefer new functions in addition to the old ones!:

flixha commented 3 years ago

Here the updated figure to demonstrate the magnitudes computed with L2-norm SNR (really this doesn't have a large effect, but this is how it's supposed to be): magnitude_comparison_upd_01

This example here is for pretty small earthquakes from a small swarm, detected by a larger number of stations (~60) at relative large distance (80-300 km) at rather high frequencies (5-19 Hz). Altogether, that causes cc-values to be rather small (even though I believe the events are quite similar) due to the noise.

calum-chamberlain commented 3 years ago

Great, thanks! Some thoughts on your thoughts:

Using different defintions of SNR in the code may be a bit confusing (e.g., template_gen uses original SNR) I agree about the confusion - my original definition of SNR for template_gen was quite ad hoc but seems to give useful results (taking the maximum amplitude in an impulsive waveform makes sense to me for checking template quality), so I'm keen to leave the template_gen SNR as it is, but I really should define in the doc string how it is computed and/or spin it out into its own simple function and refer to it from the template_gen docs to make it easier for people to check if they agree with it!

The amplitude/magnitude functions are not yet called from anywhere else in EQcorrscan, hence changing it should not be a problem here (of course needs to be clearly documented). Agreed - I think we should probably tag this for a 0.5.x release seeing as it has the potential to change outputs of code for users (I doubt anyone is using this though). Maybe with a warning that this functionality has changed from versions < 0.5.x

With the implementation in here, it's ok to generates templates and from these computes relative magnitudes using the same value for SNR, because: the L2-norm SNR in relative_magnitude will be stricter and exclude some traces that passed the first quality check with the original SNR. 👍

I think it might be worth changing the function name to something like snr_l2 just to ensure anyone using the old snr function doesn't get unexpected results...

flixha commented 3 years ago

Actually, the SNR-computation happens in relative_amplitude, which gets signal and noise amplitudes from
The function _snr n mag_calc is not used from anywhere in EQcorrscan right now, except that it's tested on its own in tests/mag_calc_test. I didn't change it, so I'll leave it for now as there's no reason to remove it.
I had a bit of a brain fart previously in regards to how I thought I could normalize the L2-norm amplitude in regards to the signal length. A normalized L2-norm is obviously just the RMS-amplitude, hence we can just use the RMS instead of the L2-norm so that any differences in window length have no effect. I tested this and it generally returns the same results (or fixes issues where the previous incorrect normalization actually mattered, i.e., when noise and signal were not the same length).

calum-chamberlain commented 3 years ago

Cool - if you are happy I think we can go ahead and merge this.

flixha commented 3 years ago

Yes, I'm happy with this! Thanks a lot for reviewing, and I'm happy to have it merged now.

darren-tpk commented 3 years ago

Hi @flixha, I have been using relative_magnitudes as well and I think what you've suggested here is great. I was looking through your commits in your PR and I have a question about the output derived from this part of your commit:

        rel_mag = math.log10(amplitude_ratio) + math.log10(
            math.sqrt( (1 + 1 / snr_y**2) / (1 + 1 / snr_x**2) ) * cc)

After following the conversation on #455, I was wondering if this new rel_mag output you are suggesting is still weighted by cc, since you are opting to remove the weight_by_correlations argument. That is to say, if we want to calculate the magnitude of an event of interest, do we compute it using new_mag = reference_mag + np.sum(rel_mags) / np.sum(cc_values) or new_mag = reference_mag + np.sum(rel_mags) / len(rel_mags) after the new magnitude bias correction?

Perhaps the function description could use a weighted / non-weighted specification when describing the dictionary being returned? I was using the weight_by_correlations argument earlier, but I am now convinced that this bias correction is important, so I wanted to clarify this.

Also, on a side note, if new_mag = reference_mag + np.sum(rel_mags) / np.sum(cc_values) is the way to go, I think we would have to be careful because len(rel_mags) and len(cc_values) might not be equal. The output CC dict from relative_magnitude() includes seed ids that fall below min_cc, whereas output relative magnitude dictionary does not. i.e. what we want is cc_values = [cc[1] for cc in ccs.items() if (cc[1] > min_cc)], based on your example in #455.

flixha commented 3 years ago

Hi @darren-tpk! When reading the previous code, I understood that weight_by_correlation was a bit of a misleading name: Rather than weighting the magnitude-changes depending on the CCC of the traces, it actually was there to turn on the correction for a measurement bias that comes with a low CCC (which is described in Schaff & Richards). The previous code used one of the introductory formulas in Schaff & Richards, rather than the final one that they come up with to fully correct for the measurement bias when calculating relative magnitudes from noisy traces. To avoid this confusion, I renamed the variable weight_by_correlation to correct_mag_bias´

There was never a weighting implemented in the sense of new_mag = reference_mag + np.sum(rel_mags) / np.sum(cc_values) (compare previous code in function: https://github.com/eqcorrscan/EQcorrscan/blob/168eba4141078e9b7e8e6917a7a385e0e56ff50e/eqcorrscan/utils/mag_calc.py#L636 So new_mag = reference_mag + np.sum(rel_mags) / len(rel_mags) is the way to calculate it now and before. You can of course try a weighting like you thought was included (with the returned CC-values) in addition to the magnitude-bias-correction (which is quite important if you have noisy traces which lead to CC~< 0.95 or so.

eqcorrscan / EQcorrscan