Closed flixha closed 2 years ago
I would be cautious of replacing the NaNs with real values as it may provide clusters where there are no clusters. From a skeptical perspective I would rather replace NaNs with ones (with one being the maximum distance possible I think). This would reduce the ability to cluster if clustering is not based on the minimum distance between events.
I think if we are changing this, then we should make it an option for the user to set, and also let them set how the clusters are made (e.g. with the method kwarg which I have been meaning to expose anyway). I would keep the default to put NaNs in the matrix, but raise a useful error message telling the user to consider how they want to replace NaNs and the linkage method - we could also require user input at that stage if NaNs are found.
I'm reticent to allow this to make clusters where there are no links without explicitly telling the user that something funky might happen (and people don't read warnings unless something breaks, so we should make it break, or require user input to make sure that the user thinks about what they are doing).
Thanks for the quick feedback. I have now made some changes so that:
dist_mat
clustering()
to control the method, metric, and sort_order of linkage as you suggestedIt's probably reasonable to replace nans with ones to avoid clustering where there is no real clustering. A lot is taken care of by excluding events where no traces match with any other events. Now that we expose the linkage methods and metrics, I feel it's good if the user can control whether to replace with e.g., 1, mean, min etc., so that this can be adjusted properly to the method and metric that the user uses for linkage-calculation.
Let me know if you have other suggestions for this and if you feel this handling is useful and makes sense like this.
Some tests are failing due the an obspy-numpy conflict, easiest may be to wait for the obspy-update. All tests pass locally with master
-obspy.
That looks great @flixha, thank you!
Good point on the obspy numpy issues - if you are happy to wait for this to be merged until the obspy release then that might be simplest, otherwise I can faff around with pinning versions until then,
@calum-chamberlain thanks! No worries, let's wait for Obspy; no need to do the extra work when the solution is around the corner.
@flixha this is now running obspy 1.3.0, and it looks like there is a test fail in comparing clusters. when you have the chance would you be able to have a look at that test?
@flixha this is now running obspy 1.3.0, and it looks like there is a test fail in comparing clusters. when you have the chance would you be able to have a look at that test?
Hi Calum, of course; sorry for being a bit slow to answer. I'll check this tomorrow..
HI @calum-chamberlain , finally found out what caused the problem. That took longer than expected... Anyways, the problem was that because of the hacky way in which I remove a single trace from each stream, the order of the streams actually mattered. And that order depends on the order in which the files appear in glob. Now I order the template streams and then it worked on all the systems I tested. Waiting for tests on here now.
Thanks for having a look; Yes, this is good to go!
dist_mat
can contain NaNs in the following case: Template 1: traces STA1.HHZ, STA2.HHZ Template 2: trace STA1.HHZ Template 3: trace STA2.HHZ Here template 1 & 2 and 1 & 3 can be correlated, but 2 & 3 cannot be correlated because no traces match between them. But template 2 and 3 can be compared indirectly via template 1. When the distance-matrix contains NaNs,squareform
in L287 always causes an error.What does this PR do?
Replaces NaNs in dist_mat with the average of the column / row so that NaN-links between single event pairs do not make
cluster()
fail.Why was it initiated? Any relevant Issues?
Clustering did not work as expected when station setups changed too much (e.g., multiple temporary deployments where only some events have overlapping station coverage from permanent stations).
PR Checklist
develop
base branch selected?CHANGES.md
.