eqcorrscan / EQcorrscan

Earthquake detection and analysis in Python.
https://eqcorrscan.readthedocs.io/en/latest/
Other
167 stars 86 forks source link

WIP: Weighted correlations #508

Open calum-chamberlain opened 2 years ago

calum-chamberlain commented 2 years ago

What does this PR do?

Adds the ability to weight cross-correlations. Weights are currently expected to be in tr.stats.extra.weight for all stream correlation functions. This PR should be considering in tandem with #506

Why was it initiated? Any relevant Issues?

445

PR Checklist

flixha commented 2 years ago

@calum-chamberlain From the current commits, it reads to me that you are right now intending to give one weight to each template, rather than weight each trace in the template during the correlation - is that right? Or were you also intending to let traces in each template be weighted, e.g. like fast_matched_filter can do? Maybe it's too early to ask because it's still work in progress; I did a few tests with trace-weights in FMF and I think it would be promising to define trace weights based on the SNR of each trace in the template.

calum-chamberlain commented 2 years ago

I definitely intend to weight individual channels of individual templates in the same way that fast_matched_filter can. This is supported by these commits (I think!).

The array correlation functions that take multiple templates and one "stream" work on single-channel data, so only one weight per template is passed. The weights are extracted from the templates in the _get_array_dicts function here and are in the same shape as the pads (one for each trace of each template).

There are a lot of small changes to the correlation code to support this, and I might keep this PR separate from another PR that would actually open up the interface to make templates with weights.

flixha commented 1 year ago

Right now it's possible to set trace weights greater 1, and then one can run into an AssertionError when the detect_val is greater than the number of channels in https://github.com/eqcorrscan/EQcorrscan/blob/f5b8aa54959d584c269a22ce970686557abc6cb8/eqcorrscan/core/match_filter/detection.py#L97

To solve that, here are some options:

  1. do not allow weights to be larger than 1 / smaller than -1
  2. allow detect_val to become greater than number of channels.
  3. for each template, normalize weights according to the number of nonzero weights (i.e., weights for real channels, excluding nan-channels etc.).

All options have an effect on the detection results with threshold_type='absolute' and 'av_chan_corr' that may be a bit hard to follow / interpret. With MAD-threshold the results should be independent of the normalization, except for very small weights below the stability threshold that matters in some correlation backends. I prefer option 3 because I think that the absolute detection values are still somewhat interpretable compared compared to the other options.

Code to normalize weights according to number of nonzero weights in eqcorrscan.utils.correlate._get_array_dicts:


    # Normalize weights for each template so that CC sum stays smaller than
    # number of channels (else: AssertionError in Detection).
    weights = np.array([template_weights
                        for seed_id, template_weights in weight_dict.items()])
    weights = weights * (np.count_nonzero(weights, axis=0, keepdims=True) /
                         weights.sum(axis=0, keepdims=True))
    # weights = np.float32(weights)  # Required for FFTW backend
    Logger.info('Setting weights from trace-stats, minimum weight: %s, '
                'maximum weight %s, total sum of weights: %s',
                np.min(weights), np.max(weights), np.sum(weights))
    # Update weights in weight-dict with normalized values:
    for seed_id, normalized_template_weights in zip(seed_ids, weights):
        weight_dict.update({seed_id: normalized_template_weights})```
calum-chamberlain commented 1 year ago

Good point @flixha - I agree that interpreting the weighted absolute and av_chan correlation results will be less intuitive. For that reason I prefer to enforce that the user set weights between +/-1 rather than us normalize them for them. I can imagine the case where someone thinks they know what they are doing and sets a threshold and weights carefully which we then normalize resulting in unexpected results.

If we do enforce that then we should check the weights as soon as possible to avoid wasting people's time (e.g. raise an error as soon as possible, and maybe suggest some normalized weights that they can paste into their code).

calum-chamberlain commented 1 month ago

TODO: