JuliaDynamics / Attractors.jl

Find attractors of dynamical systems, their basins, and continue them across parameters. Study global stability (a.k.a. non-local, or resilience). Also tipping points functionality.

Add new grouping method: AttractorsViaPairwiseComparison #97

Closed KalelR closed 9 months ago

KalelR commented 9 months ago

As we had discussed in Issue #84, this PR adds a simpler grouping method to "clusterize" features, particularly useful of course when using AttractorsViaFeaturizing.

The idea is that clustering algorithms aren't really necessary for grouping features. The systems are deterministic, and we can usually integrate them long enough for the features to converge. If the user chooses good features, and knows for how long to integrate the system, then the features in feature space will be very well behaved, with attractors corresponding to small clouds well separated from each other. There is therefore no need for clustering algorithms: attractors can be found directly by simply grouping together features that are within some epsilon of each other, and separating those farther away.

The algorithm is thus very simple. Basically (see the sketch after this list):

  1. Assign feature 1 as the first attractor -> clusters_idxs = [1].
  2. For each subsequent feature:
    1. Compute the distance from the feature to each previously found attractor -> dist(feature, clusters_idxs).
    2. Find the minimum of these distances (i.e. find the cluster that is closest to the current feature). If this minimum distance is above the threshold, consider the feature a new attractor; if it is below, assign the feature to that closest cluster.
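
For concreteness, here is a minimal sketch of the procedure in Julia (the function name and the Euclidean metric are illustrative assumptions, not the actual implementation in this PR):

```julia
# A minimal sketch of the grouping idea above. The function name and the
# Euclidean metric are illustrative assumptions, not the actual package code.
using LinearAlgebra: norm

function group_by_pairwise_comparison(features, threshold)
    clusters = [features[1]]             # the first feature seeds the first attractor
    labels = ones(Int, length(features))
    for i in 2:length(features)
        # distance from the current feature to every attractor found so far
        dists = [norm(features[i] - c) for c in clusters]
        dmin, j = findmin(dists)
        if dmin > threshold
            push!(clusters, features[i]) # too far from all clusters: new attractor
            labels[i] = length(clusters)
        else
            labels[i] = j                # close enough: assign to the nearest cluster
        end
    end
    return labels
end
```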

This is nicer than simple histograms mainly because it allows for more general features, beyond just vectors of numbers. They can be, for instance, matrices, which allows the features to be the entire attractors. The distance metric just needs to take in feature_A and feature_B and return a number. Plus, given that the features are well behaved, the distance_threshold doesn't really need to be fine-tuned much. This is way more reliable, understandable, and easier to use than DBSCAN. Since I implemented this I never went back, and to be honest I don't see any future scenario in which I would.

One part left to do, if needed, is to implement support for computing distances using these parameters:


par_weight::Real = 0, plength::Int = 1, spp::Int = 1
codecov-commenter commented 9 months ago

Codecov Report

Merging #97 (9f90d1e) into main (8171746) will increase coverage by 0.35%. The diff coverage is 96.96%.

@@            Coverage Diff             @@
##             main      #97      +/-   ##
==========================================
+ Coverage   84.27%   84.62%   +0.35%     
==========================================
  Files          20       21       +1     
  Lines        1151     1184      +33     
==========================================
+ Hits          970     1002      +32     
- Misses        181      182       +1     
| Files | Coverage Δ |
|---|---|
| src/mapping/grouping/all_grouping_configs.jl | 50.00% <ø> (ø) |
| src/mapping/grouping/pairwise_comparison.jl | 96.96% <96.96%> (ø) |


Datseris commented 9 months ago

What do you mean that you need to add support for

par_weight::Real = 0, plength::Int = 1, spp::Int = 1

?

Is this for the continuation? If so, plength, spp come from the continuation functions, in which sense do you need to add support for them?

par_weight is something we can, and should, safely ignore. I can't conceive a scientific argument why anyone should ever use par_weight. We had it in the first clustering-continuation because it is in the MCBB paper. Whether it makes sense... it doesn't to me.

Datseris commented 9 months ago

it is not clear to me how one would choose the distance threshold. I guess with the scaling to 0-1 turned on, a good value could be 0.1 or something like that?

Datseris commented 9 months ago

Is this method significantly faster than the clustering method? You should do formal benchmarks and post the results here. Compute a list of features, and then benchmark the group_features function with both algorithms.

KalelR commented 9 months ago

What do you mean that you need to add support for

par_weight::Real = 0, plength::Int = 1, spp::Int = 1

?

Is this for the continuation? If so, plength, spp come from the continuation functions, in which sense do you need to add support for them?

par_weight is something we can, and should, safely ignore. I can't conceive a scientific argument why anyone should ever use par_weight. We had it in the first clustering-continuation because it is in the MCBB paper. Whether it makes sense... it doesn't to me.

As I understand it, they included this parameter distance in the computation of distances for the clustering. The idea was to "ensure that similar asymptotic states with strongly different parameter values are distinguished from each other", which I never really understood to be honest haha - isn't the whole point of continuation to track an attractor across different parameters? But anyway, I thought that if people wanted this in the continuation method via clustering, they might also want it in the continuation method using this PairwiseComparison - the latter doesn't cluster per se, but it still computes distances. What do you think?

TLDR: This can be easily implemented in the future if anyone wishes. For now, I don't think it's needed, but wanted to check with you.

KalelR commented 9 months ago

it is not clear to me how one would choose the distance threshold. I guess with the scaling to 0-1 turned on, a good value could be 0.1 or something like that?

The choice is a balance between two points:

  1. You want it to be large enough to account for variations in the features from the same attractor - if it's too small, the algorithm will find duplicate attractors (which, to be fair, is easy to deal with, and all the other algorithms also have this problem).
  2. You want it to be small enough not to group together features from distinct attractors. This requires some knowledge of how spread out your features are. If it's too big, the algorithm will miss some attractors, as it groups 2+ distinct attractors together. This is harder to deal with.

So what I've been doing is to ensure that the deviations in the features are super small, which can be achieved with a good choice of featurizer, T and Ttr. Then I set distance_threshold to a small value, like 1e-2 or 1e-3, even for non-rescaled features. I reduce it until no more unique attractors are found, and then just check that there are no duplicate attractors.
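
A hypothetical version of this tuning loop, reusing the sketch function from my earlier comment (`features` stands for the precomputed list of features):

```julia
# Hypothetical threshold scan: decrease distance_threshold until the
# number of found attractors saturates, then check for duplicates.
for eps in (1e-1, 1e-2, 1e-3, 1e-4)
    labels = group_by_pairwise_comparison(features, eps)
    println("threshold = $eps -> $(length(unique(labels))) attractors found")
end
```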

KalelR commented 9 months ago

Is this method significantly faster than the clustering method? You should do formal benchmarks and post the results here. Compute a list of features, and then benchmark the group_features function with both algorithms.

Not sure if it's significantly faster, but I don't think the speed really matters, to be honest. It's so much simpler, more reliable and easier to use than the clustering methods that I wouldn't mind even if it were slower. But as it is, I think at worst it's comparable in speed. DBSCAN at least requires computing the full distance matrix, which is of order ~N^2. This method, at least the way I implemented it, computes ~N*number_attractors distances, which is generally far fewer. Memory-wise it's much better too, as it only stores ~number_attractors distances at each iteration.
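
To make the scaling concrete, a back-of-the-envelope count (illustrative numbers, not a benchmark):

```julia
# Rough count of distance computations (illustrative numbers only)
N, k = 10_000, 5                       # N features, k attractors found
dbscan_distances   = N * (N - 1) ÷ 2   # full pairwise distance matrix: ~5.0e7
pairwise_distances = N * k             # this method: ~5.0e4, storing only
                                       # ~k distances at any iteration
```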

Datseris commented 9 months ago

It's so much simpler, more reliable and easier to use than the clustering methods that I wouldn't mind even if it were slower.

I don't agree with this statement. I agree that it is simpler, but I can't see a way in which it is more reliable. You need to pick this distance number; in the clustering you don't pick anything, it is done automatically. Besides, if the localization of points in clusters is so good, there is no way DBSCAN would fail, and this method targets exactly such cases. It would be great if you could provide a concrete MWE where this method outperforms the Clustering one.

Not only that, but the quality of your clustering depends on the order of the initial conditions. Hence, each time you call basins_fractions you will get different results unless you start the simulation with the same seed. It's because the "center" of a cluster in this new algorithm is the very first point, and the very first point is random.

DBSCAN, as well as all other grouping algorithms we have so far, are invariant to the sequence of the initial conditions.

Note that this doesn't stop us from merging this in, but I am just pointing out that I personally wouldn't use this one instead of the Clustering version. If I knew the system so well that I knew what the features would do, this means that I know the attractors. Then, I would just use AttractorsViaProximity which is the least ambiguous out of all 4 methods.

Datseris commented 9 months ago

The idea was to "ensure that similar asymptotic states with strongly different parameter values are distinguished from each other", which I never really understood to be honest haha - isn't the whole point of continuation to track an attractor across different parameters?

Exactly. Hence, I would argue, this whole par_weight business is a bad choice. My experience working on "Framework for global stability" leads me to this conclusion. I say we should ignore this par_weight completely. Perhaps we should even hide it from the Clustering continuation in the future documentation as well. This wouldn't be a breaking change, but we simply would no longer support it.

Datseris commented 9 months ago

Not sure if it's significantly faster, but I don't think the speed really matters to be honest

Sure but please do a formal benchmark and report the results. It shouldn't take too much effort and performance matters in this context. A lot. For me, it is the prime reason to change algorithms. The clustering is also done 100s of times to optimize the clustering radius.

Datseris commented 9 months ago

You should document tips for choosing the distance threshold by summarizing what you wrote here.

KalelR commented 9 months ago

I don't agree with this statement. I agree that it is simpler, but I can't see a way in which it is more reliable. You need to pick this distance number; in the clustering you don't pick anything, it is done automatically. Besides, if the localization of points in clusters is so good, there is no way DBSCAN would fail, and this method targets exactly such cases. It would be great if you could provide a concrete MWE where this method outperforms the Clustering one.

DBSCAN also requires a distance parameter, which is much less intuitive (it leads to the black-box behavior we discussed so many times). The only reason the clustering method as a whole doesn't require this parameter is that we brute-force search for an "optimal" value. This is very debatable. In the use cases I've tried (a system of 2 coupled 2d oscillators, and 100d networks) this led to problems very often. Sometimes the "optimal" parameter was way too small, and the algorithm found lots of duplicates. Sometimes it was way too big, and it missed some attractors. Sometimes it worked great for one system parameter and then terribly for another with similar dynamics.

This simple comparison came about as a way to simplify the procedure and give me some clarity about when the algorithm really is working. This is for cases when I don't know the attractors of the system, but I can probe around, changing parameters and the featurizer until I'm confident the correct attractors were found (no duplicates, no over-grouping).

I agree that in the well-behaved cases both it and DBSCAN should work, and I admit I oversold this algorithm, probably out of frustration with so much time wasted on DBSCAN haha. But the point is also that in well-behaved cases using DBSCAN is like killing a fly with a cannon, and an unintuitive cannon at that. Maybe we should think of some procedures to validate the clustering better.

If the worry is having an automatic procedure, we could think of a search algorithm here as well. Something like reducing distance_threshold until no more unique attractors are found and no duplicates appear.

Not only that, but the quality of your clustering depends on the order of the initial conditions. Hence, each time you call basins_fractions you will get different results unless you start the simulation with the same seed. It's because the "center" of a cluster in this new algorithm is the very first point, and the very first point is random.

Yeah, excellent point. There are no guarantees, but in the use case I have in mind, where the data isn't "noisy", the algorithm will be invariant. When the data is noisy, though, DBSCAN should be used - or the features should be improved.

Note that this doesn't stop us from merging this in, but I am just pointing out that I personally wouldn't use this one instead of the Clustering version. If I knew the system so well that I knew what the features would do, this means that I know the attractors. Then, I would just use AttractorsViaProximity which is the least ambiguous out of all 4 methods.

The whole point of course is that you don't know the attractors, but you can play around enough to ensure that the features are good (this should always be done anyway, with or without DBSCAN) and that the clustering is working. Then the question is with which algorithm this is more easily and reliably done, and I've had way too many problems with DBSCAN personally.

By the way, one point I haven't thought much about, just mentioning it here, is that the attractors themselves could be used as features. Then one would separate attractors by some distance metric in state space directly. This is maybe the simplest / most naive attractor-finding algorithm one could think of haha, but could it work, if the metric is good enough?

Datseris commented 9 months ago

that the attractors themselves could be used as features. Then one would separate attractors by some distance metric in state space directly

You would need to find the attractors first. The only method we have that finds attractors is the recurrences one. And if you use that method, you need no other method to group attractors, as the recurrences method anyway maps individual i.c.s to attractors correctly.


DBSCAN also requires a distance parameter, which is much less intuitive (it leads to the black-box behavior we discussed so many times). The only reason the clustering method as a whole doesn't require this parameter is that we brute-force search for an "optimal" value. This is very debatable. In the use cases I've tried (a system of 2 coupled 2d oscillators, and 100d networks) this led to problems very often. Sometimes the "optimal" parameter was way too small, and the algorithm found lots of duplicates. Sometimes it was way too big, and it missed some attractors. Sometimes it worked great for one system parameter and then terribly for another with similar dynamics.

This simple comparison came about as a way to simplify the procedure and give me some clarity about when the algorithm really is working. This is for cases when I don't know the attractors of the system, but I can probe around, changing parameters and the featurizer until I'm confident the correct attractors were found (no duplicates, no over-grouping).

That's fine, but then you should somehow summarize this discussion in the docstring of the new method for when it should be preferred over the Clustering method.


If the worry is having an automatic procedure, we could think of a search algorithm here as well. Something like reducing distance_threshold until no more unique attractors are found and no duplicates appear.

Given the amount of papers on optimizing the clustering of DBSCAN, and the fact that to this day no clearly more optimal way has been presented, this could take a lot of time. You should open an issue for this after we merge this PR in.

KalelR commented 9 months ago

I benchmarked the two methods using two sets of features: one that is just 5 randomly generated features repeated N times, and another extracted from the Hénon map (as in the tests). The distance threshold was the same for both, and both find the correct clusters. As expected from the difference in scaling, the pairwise comparison is faster and uses less memory than DBSCAN. For some reason, in the Hénon case the number of allocations in DBSCAN is smaller, but the memory usage is bigger; I don't know why.

It is somewhat unfair to compare them, though, as the pairwise comparison is so much simpler than DBSCAN.

[benchmark results plot]

Code is here https://gist.github.com/KalelR/1c3a66a157a8e347dcd6d6ad28f633cc, also in our FrameworkGlobalStability repo.

Datseris commented 9 months ago

Great, thanks, you should state the difference in performance scaling in the docstring.

Please also address the following comment in the docstring:

DBSCAN also requires a distance parameter, which is much less intuitive (it leads to the black-box behavior we discussed so many times). The only reason the clustering method as a whole doesn't require this parameter is that we brute-force search for an "optimal" value. This is very debatable. In the use cases I've tried (a system of 2 coupled 2d oscillators, and 100d networks) this led to problems very often. Sometimes the "optimal" parameter was way too small, and the algorithm found lots of duplicates. Sometimes it was way too big, and it missed some attractors. Sometimes it worked great for one system parameter and then terribly for another with similar dynamics.

This simple comparison came about as a way to simplify the procedure and give me some clarity about when the algorithm really is working. This is for cases when I don't know the attractors of the system, but I can probe around, changing parameters and the featurizer until I'm confident the correct attractors were found (no duplicates, no over-grouping).

(That's fine, but then you should somehow summarize this discussion in the docstring of the new method for when it should be preferred over the Clustering method.)


Lastly, please add an example to the examples list that utilizes this method. It would be fantastic if you used a real-world example, i.e., something from your own work, as you said that this method helped you a lot there.


Then, I can merge!

(and don't forget to increment minor version and add changelog entry)

KalelR commented 9 months ago

that the attractors themselves could be used as features. Then one would separate attractors by some distance metric in state space directly

You would need to find the attractors first. The only method we have that finds attractors is the recurrences one. And if you use that method, you need no other method to group attractors, as the recurrences method anyway maps individual i.c.s to attractors correctly.

I misspoke - I meant that the trajectories themselves can be used as features. That is, the featurizer function f(A, t) can simply return A, and the algorithm will be fine with this, as long as the distance metric takes in two trajectories A and B and returns some value.
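
A hedged sketch of what I mean (the identity featurizer is the point here; the directed Hausdorff-type metric below is just one illustrative choice of trajectory distance):

```julia
using LinearAlgebra: norm

# Identity featurizer: the "feature" is the whole trajectory A.
featurizer(A, t) = A

# One illustrative trajectory-to-trajectory metric; any function taking
# two trajectories and returning a number would do.
trajectory_distance(A, B) = maximum(minimum(norm(a - b) for b in B) for a in A)
```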

KalelR commented 9 months ago

Lastly, please add an example to the examples list that utilizes this method. It would be fantastic if you used a real-world example, i.e., something from your own work, as you said that this method helped you a lot there.

This would be quite nice; I'll think of a good example where it helped. But since the work is still unpublished, I'd rather wait a month or so before putting this online with all the methodology and parameters correctly chosen, just to be sure. Is that alright?

Datseris commented 9 months ago

We don't have to rescale to [0, 1]. We can use StateSpaceSets.standardize instead, which rescales to 0 mean and 1 standard deviation.
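
For example, a sketch assuming the features are collected into a StateSpaceSet (standardize is exported by StateSpaceSets.jl):

```julia
using StateSpaceSets

features = StateSpaceSet(rand(1000, 3)) # hypothetical 3-dimensional features
# Rescale each feature dimension to zero mean and unit standard deviation,
# instead of rescaling to [0, 1].
features_std = standardize(features)
```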