Open calum-chamberlain opened 2 years ago
@calum-chamberlain From the current commits, it reads to me that you are right now intending to give one weight to each template, rather than weight each trace in the template during the correlation - is that right? Or were you also intending to let traces in each template be weighted, e.g. like fast_matched_filter
can do?
Maybe it's too early to ask because it's still work in progress; I did a few tests with trace-weights in FMF and I think it would be promising to define trace weights based on the SNR of each trace in the template.
I definitely intend to weight individual channels of individual templates in the same way that fast_matched_filter can. This is supported by these commits (I think!).
The array correlation functions that take multiple templates and one "stream" work on single-channel data, so only one weight per template is passed. The weights are extracted from the templates in the _get_array_dicts function here and are in the same shape as the pads (one for each trace of each template).
There are a lot of small changes to the correlation code to support this, and I might keep this PR separate from another PR that would actually open up the interface to make templates with weights.
Right now it's possible to set trace weights greater 1, and then one can run into an AssertionError
when the detect_val
is greater than the number of channels in https://github.com/eqcorrscan/EQcorrscan/blob/f5b8aa54959d584c269a22ce970686557abc6cb8/eqcorrscan/core/match_filter/detection.py#L97
To solve that, here are some options:
detect_val
to become greater than number of channels.All options have an effect on the detection results with threshold_type='absolute'
and 'av_chan_corr'
that may be a bit hard to follow / interpret. With MAD
-threshold the results should be independent of the normalization, except for very small weights below the stability threshold that matters in some correlation backends. I prefer option 3 because I think that the absolute detection values are still somewhat interpretable compared compared to the other options.
eqcorrscan.utils.correlate._get_array_dicts
:
# Normalize weights for each template so that CC sum stays smaller than
# number of channels (else: AssertionError in Detection).
weights = np.array([template_weights
for seed_id, template_weights in weight_dict.items()])
weights = weights * (np.count_nonzero(weights, axis=0, keepdims=True) /
weights.sum(axis=0, keepdims=True))
# weights = np.float32(weights) # Required for FFTW backend
Logger.info('Setting weights from trace-stats, minimum weight: %s, '
'maximum weight %s, total sum of weights: %s',
np.min(weights), np.max(weights), np.sum(weights))
# Update weights in weight-dict with normalized values:
for seed_id, normalized_template_weights in zip(seed_ids, weights):
weight_dict.update({seed_id: normalized_template_weights})```
Good point @flixha - I agree that interpreting the weighted absolute and av_chan correlation results will be less intuitive. For that reason I prefer to enforce that the user set weights between +/-1 rather than us normalize them for them. I can imagine the case where someone thinks they know what they are doing and sets a threshold and weights carefully which we then normalize resulting in unexpected results.
If we do enforce that then we should check the weights as soon as possible to avoid wasting people's time (e.g. raise an error as soon as possible, and maybe suggest some normalized weights that they can paste into their code).
TODO:
accept_weights = input("Weights outside +/-1 bound, use normalised weights? [Y/n]")
)
What does this PR do?
Adds the ability to weight cross-correlations. Weights are currently expected to be in
tr.stats.extra.weight
for all stream correlation functions. This PR should be considering in tandem with #506Why was it initiated? Any relevant Issues?
445
PR Checklist
develop
base branch selected?CHANGES.md
.- [ ] First time contributors have added your name toCONTRIBUTORS.md
.