eqcorrscan / EQcorrscan

Earthquake detection and analysis in Python.
https://eqcorrscan.readthedocs.io/en/latest/

details of how to make detections in matched_filter #332

Closed ikahbasi closed 5 years ago

ikahbasi commented 5 years ago

What do you need help with?

Some details of match_filter.match_filter

Provide an example so that we can reproduce your problem

Hi. Detection of new events in EQcorrscan happens when the sum of the shifted cross-correlations is above a threshold. Suppose we have two traces and their templates, and the correlation for one of them is -0.7 and for the other is 0.7. Is it true that the result of cccs is 0 and we can't detect it? I would like to know what happens in detection if the polarity of some of the template's traces is inverted and some are not. In particular, what happens if we merge the templates of two nearby events (for example, when stacking templates with unequal numbers of stations)? Can I multiply a template by -1 when it has a negative correlation with the master, to invert its polarity before stacking?

What help would you like?

Does it have a negative impact?

calum-chamberlain commented 5 years ago

Howdy,

Is it true that the result of cccs is 0 and we can't detect it?

Yes. -0.7 + 0.7 is 0. See discussion in Warren-Smith et al., Shelly and Hardebeck and others of how correlations can be used to get at focal mechanisms by assuming negative correlations correspond to reversed polarity.

I would like to know what happens in detection if the polarity of some of the template's traces is inverted and some are not.

See above papers, but I would assume that you would end up with the case you suggest - however, you may find that you get "significant" correlation sums later in the wavetrain. A thought experiment: Say you have your two channel template and you flipped one of the two channels. When you correlate that template with itself it should give you +1.0 on the non-flipped trace and -1.0 on the flipped trace, which would sum to 0.0. However, if you shift the template by a few samples, maybe both traces will give a positive (but weaker) correlation, say 0.3 and 0.3, summing to 0.6, which might be above your threshold. This relates to ideas of "cycle-skipping" discussed in earlier papers by David Shelly. Give it a go and see what happens. Sometimes this kind of thing will happen, but not necessarily all the time (for all templates and possible detections).
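The thought experiment above can be tried on synthetic data. This is a minimal sketch in plain Python, not EQcorrscan's correlation code: the two "channels" are hypothetical Gaussian-windowed sinusoids with periods of 20 and 40 samples, chosen so that a 20-sample shift is a full cycle on one channel and a half cycle on the other. The names `wavelet`, `pearson` and `sliding_correlation` are made up for this illustration.

```python
import math


def wavelet(n, period, centre=50, width=15.0):
    """Gaussian-windowed sinusoid standing in for one channel of an event."""
    return [
        math.exp(-0.5 * ((k - centre) / width) ** 2)
        * math.sin(2 * math.pi * (k - centre) / period)
        for k in range(n)
    ]


def pearson(a, b):
    """Normalised (zero-mean) correlation coefficient of two equal-length lists."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    num = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    den = math.sqrt(
        sum((x - mean_a) ** 2 for x in a) * sum((y - mean_b) ** 2 for y in b)
    )
    return num / den if den > 0 else 0.0


def sliding_correlation(template, data):
    """Correlate the template against every window of the data."""
    n = len(template)
    return [pearson(template, data[i:i + n]) for i in range(len(data) - n + 1)]


PAD = 50  # zero samples before and after the event in the continuous data
chan_a = wavelet(100, period=20)
chan_b = wavelet(100, period=40)
data_a = [0.0] * PAD + chan_a + [0.0] * PAD
data_b = [0.0] * PAD + chan_b + [0.0] * PAD

# Two-channel template of the same event, with the second channel flipped.
template_a = chan_a
template_b = [-x for x in chan_b]

ccc_a = sliding_correlation(template_a, data_a)
ccc_b = sliding_correlation(template_b, data_b)
cccsum = [a + b for a, b in zip(ccc_a, ccc_b)]

aligned = PAD        # template exactly on top of the event
skipped = PAD + 20   # one cycle late on channel A, half a cycle late on channel B

print(f"aligned: {ccc_a[aligned]:+.2f} {ccc_b[aligned]:+.2f} sum {cccsum[aligned]:+.2f}")
print(f"skipped: {ccc_a[skipped]:+.2f} {ccc_b[skipped]:+.2f} sum {cccsum[skipped]:+.2f}")
```

At the aligned sample the two channels give +1 and -1 and cancel, while at the shifted sample both channels correlate positively but more weakly, so the sum is no longer zero. Real waveforms are far less tidy than these wavelets, so whether a cycle-skipped sum exceeds a threshold will vary from template to template.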

In particular, what happens if we merge the templates of two nearby events (for example, when stacking templates with unequal numbers of stations)? Can I multiply a template by -1 when it has a negative correlation with the master, to invert its polarity before stacking?

Yes you can. There is nothing to stop you! Whether it is a good idea or not is another question entirely! I would not advise this though. Matched-filters are very limited. They are very good at detecting events that are similar in location, focal mechanism and other source properties. This can be thought of as both a limitation and a strength. If you want to detect similar events then this is great; however, if you want the most complete earthquake catalogue available, then matched-filters are likely not for you (although iterating and building larger template databases might help, see Frank and Abercrombie).

A couple of points:

  1. Have you looked at the clustering functions in EQcorrscan? These can be helpful for grouping similar waveforms prior to stacking if you want to reduce your template database. Note that these have been sped up quite a lot for 0.4.0 because I was using them to cluster 540,000 waveforms to search for repeating earthquakes. If you were to do this, I would advise clustering by hypocentral distance first (using catalog_cluster), then by correlation using cluster. This should result in groups of waveforms that are suitable for stacking. The API is a little clunky though.
  2. Why do you want to stack waveforms from these events? Do you have too many templates? Are you running into memory limitations? In general I would use the largest template database you can to capture the greatest range of possible sources. Hopefully with 0.4.0 you will have seen a drop in memory consumption during correlations, which should allow you to run more templates.
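The two-stage grouping suggested in point 1 (distance first, then correlation) can be sketched on toy data. This is a plain-Python illustration under stated assumptions, not EQcorrscan's `catalog_cluster` or `cluster` functions: it uses simple single-linkage grouping, zero-lag correlation of one waveform per event, Cartesian coordinates in km, and hypothetical helper names (`single_linkage`, `two_stage_cluster`).

```python
import math


def wavelet(n, period, phase=0.0, centre=50, width=15.0):
    """Toy single-channel waveform: a Gaussian-windowed sinusoid."""
    return [
        math.exp(-0.5 * ((k - centre) / width) ** 2)
        * math.sin(2 * math.pi * (k - centre) / period + phase)
        for k in range(n)
    ]


def pearson(a, b):
    """Zero-lag normalised correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def single_linkage(n_items, distance, threshold):
    """Group indices 0..n_items-1; pairs within the threshold share a group."""
    parent = list(range(n_items))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(n_items):
        for j in range(i + 1, n_items):
            if distance(i, j) <= threshold:
                parent[find(i)] = find(j)

    groups = {}
    for i in range(n_items):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


def two_stage_cluster(locations, waveforms, d_thresh_km, corr_thresh):
    """Cluster by inter-event distance, then split each group by correlation."""
    spatial = single_linkage(
        len(locations),
        lambda i, j: math.dist(locations[i], locations[j]),
        d_thresh_km,
    )
    final = []
    for members in spatial:
        sub = single_linkage(
            len(members),
            lambda a, b: 1.0 - pearson(waveforms[members[a]], waveforms[members[b]]),
            1.0 - corr_thresh,
        )
        final.extend([members[k] for k in group] for group in sub)
    return sorted(final)


# Hypothetical events: (x, y, z) in km and one waveform each.
locations = [(0, 0, 5), (1, 0, 5), (0, 2, 6), (40, 40, 10)]
waveforms = [
    wavelet(100, period=20),             # event 0
    wavelet(100, period=20, phase=0.2),  # event 1: close to and similar to event 0
    wavelet(100, period=7),              # event 2: close by, but a different waveform
    wavelet(100, period=20),             # event 3: similar waveform, far away
]

groups = two_stage_cluster(locations, waveforms, d_thresh_km=5.0, corr_thresh=0.8)
print(groups)
```

Here events 0, 1 and 2 fall within 5 km of one another, but only events 0 and 1 survive the correlation stage as a group, so only those two would be candidates for stacking.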

Apologies for the long response. These are not really "EQcorrscan" questions, but interesting and worthwhile nonetheless! Also, most of what I have said is my opinion and I haven't provided any demonstrations of these thoughts. You should experiment yourself and see if what I have said is true!

ikahbasi commented 5 years ago

Thanks for your very helpful comment.

See discussion in Warren-Smith et al., Shelly and Hardebeck

I'm going to read your suggestion.

Have you looked at the clustering functions in EQcorrscan?

Yes. I'm using "clustering.space_cluster()" to cluster the catalog and then stack templates. I set d_thresh between 3 and 5 km for testing, and with 5 km it gave me about 200 groups from 1100 events. However, I'm not sure whether this radius is good for detection with the sum of cross-correlations and for covering the area. Then I used "stacking.align_traces()" to align the traces by correlation to make a good stack. But I haven't clustered the template waveforms by correlation yet.

Why do you want to stack waveforms from these events? Do you have too many templates?

Yes, I have about 1100 events. It takes too much RAM and a long time to process. I would like to optimize my tests.

Hopefully with 0.4.0 you will have seen a drop in memory consumption during correlations, which should allow you to run more templates.

That's very good. I have not installed v0.4 yet, but I'm going to install it soon. It's very interesting for me.

Apologies for the long response.

This is your generosity.

These are not really "EQcorrscan" questions...

Yes, I'm sorry. I think maybe it is somebody else's problem. I can ask my questions via e-mail if that's okay.

ikahbasi commented 5 years ago

When you correlate that template with itself it should give you +1.0 on the non-flipped trace and -1.0 on the flipped trace, which would sum to 0.0.

Is there any reason that you don't use the sum of absolute cross-correlations in match_filter? If we used this, we would be able to detect events with any polarity and wouldn't lose anything, even with some of the template's traces flipped...

calum-chamberlain commented 5 years ago

Like I said, matched-filters are good at finding similar events - events within 5km of one another are not necessarily similar, and if you are going to stack you really should group by correlation (because that is the statistic you are using for detection - think about whether your stacked template would actually detect the events you are putting into the stack...). It's worth having a think about the factors that affect waveform shape to justify your clustering.
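One way to act on the "would the stack detect its own members" check is to correlate the stack against each waveform that went into it. The sketch below is a toy under stated assumptions, not EQcorrscan's stacking code: it uses single-channel Gaussian-windowed sinusoids, a plain linear stack and zero-lag correlation, and the names `wavelet`, `pearson` and `linear_stack` are hypothetical.

```python
import math


def wavelet(n, period, phase=0.0, centre=50, width=15.0):
    """Toy single-channel waveform: a Gaussian-windowed sinusoid."""
    return [
        math.exp(-0.5 * ((k - centre) / width) ** 2)
        * math.sin(2 * math.pi * (k - centre) / period + phase)
        for k in range(n)
    ]


def pearson(a, b):
    """Zero-lag normalised correlation of two equal-length lists."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    num = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    den = math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))
    return num / den if den > 0 else 0.0


def linear_stack(waveforms):
    """Sample-by-sample mean of equal-length waveforms."""
    return [sum(samples) / len(waveforms) for samples in zip(*waveforms)]


# Two similar events and one dissimilar event grouped together by mistake.
members = {
    "event_1": wavelet(100, period=20),
    "event_2": wavelet(100, period=20, phase=0.3),
    "event_3": wavelet(100, period=7),
}

stack = linear_stack(list(members.values()))

# How well does the stacked template match each event it was built from?
self_detection = {name: pearson(stack, w) for name, w in members.items()}
for name, cc in self_detection.items():
    print(f"{name}: correlation with stack = {cc:.2f}")
```

In this toy case the dissimilar event correlates noticeably worse with the stack than the two similar events do, and its presence also lowers their correlations below 1. A member that correlates poorly with the stack is a hint that it belongs in a different group.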

1,100 events isn't so many (although it depends on how much memory you have available) - I would definitely recommend installing 0.4.0, it is much more memory efficient in the correlation stages. The install should be simple, and most of the API has not changed, so you shouldn't run into many instances where you need to change code. It is also worth thinking about where you are running into memory issues: is it in the pre-processing of the data (e.g. resampling and filtering), or in the correlation stages? If it is in the pre-processing stages you might want to process your data and save the processed waveforms on disk, and use the matched-filter functions API (rather than the object-oriented API), as long as you are careful with your processing parameters. You can also turn off parallel processing to save memory. 0.4.0 also now uses a more efficient FFT for resampling, so hopefully processing your data should be a little faster.

Happy to have this question up here for now, I think it will benefit other people, but it is worth being clear that what I am saying is not always fact, and my suggestions should be tested!

Is there any reason that you don't use sum of absolute cross-correlation in match_filter?

Yes: we take advantage of the fact that noise should stack destructively when correlations are centred on zero. The statistics of MAD thresholding only work for nearly Gaussian distributions (the sum of the absolute values of the correlations is not Gaussian). Furthermore, matched-filters are designed to find matches. Summing the absolute values would not be a matched-filter. I would focus on thinking about what events you are trying to detect and whether the clustering you are doing makes sense - dissimilar events should not be stacked together to make a new template - this is very likely to reduce SNR, make some weird waveforms and generally do a poor job of discriminating between signal and noise.
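The point about noise stacking destructively can be illustrated with simulated numbers. This is a rough sketch under loud assumptions: per-channel correlation noise is modelled as independent zero-mean Gaussian values with a standard deviation of 0.1 (real correlation traces are not that simple), and the threshold is taken as a multiple of the median absolute value of the summed trace, which is an assumed stand-in for a MAD-type threshold rather than a statement of EQcorrscan's exact formula.

```python
import random
import statistics

random.seed(0)

N_CHANNELS = 10
N_SAMPLES = 5000
NOISE_STD = 0.1  # assumed spread of single-channel correlation noise

# Simulated noise-only correlation traces, one list per channel.
channels = [
    [random.gauss(0.0, NOISE_STD) for _ in range(N_SAMPLES)]
    for _ in range(N_CHANNELS)
]

# Signed sum (what a matched filter stacks) versus sum of absolute values.
signed_sum = [sum(column) for column in zip(*channels)]
absolute_sum = [sum(abs(v) for v in column) for column in zip(*channels)]


def mad_style_threshold(trace, multiplier=8.0):
    """Assumed threshold: multiplier times the median absolute value."""
    return multiplier * statistics.median([abs(v) for v in trace])


median_signed = statistics.median(signed_sum)
median_absolute = statistics.median(absolute_sum)
threshold_signed = mad_style_threshold(signed_sum)
threshold_absolute = mad_style_threshold(absolute_sum)

print(f"signed sum:   median {median_signed:+.3f}, threshold {threshold_signed:.2f}")
print(f"absolute sum: median {median_absolute:+.3f}, threshold {threshold_absolute:.2f}")
```

With the signed sum the noise stays centred near zero, so the threshold reflects only the spread of the noise. With the absolute sum every channel adds a positive offset, so the noise floor and the resulting threshold both rise with the number of channels, and the one-sided distribution no longer looks Gaussian.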

ikahbasi commented 5 years ago

Thanks for your good description.

I would definitely recommend installing 0.4.0, it is much more memory efficient in the correlation stages.

Yes! I installed and this is very efficient and amazing. I love this.

Yes: we take advantage of the fact that noise should stack destructively when correlations are centred on zero. The statistics of MAD thresholding only work for nearly Gaussian distributions (the sum of the absolute values of the correlations is not Gaussian). Furthermore, matched-filters are designed to find matches. Summing the absolute values would not be a matched-filter. I would focus on thinking about what events you are trying to detect and whether the clustering you are doing makes sense - dissimilar events should not be stacked together to make a new template - this is very likely to reduce SNR, make some weird waveforms and generally do a poor job of discriminating between signal and noise.

I get it. Thanks.