Identifying attractors via the unsupervised method of `AttractorsViaFeaturizing` is rather slow. The slowdown becomes noticeable once the number of initial conditions exceeds ~1000.
Clustering the features with DBSCAN is itself fast. As far as I can tell, the slowness comes mainly from the silhouette method used to determine the optimal clustering radius. This is done in the function `optimal_radius_dbscan_silhouette`, which is mostly adapted from the source code of the bSTAB paper. It chooses the radius that maximizes the mean silhouette across clusters (instead of the maximum silhouette, as in the bSTAB paper). To be on the safe side, we currently try 200 radius values. One improvement would be to reduce this number and make smarter guesses. Another lies in the iterative procedure: currently we evaluate the silhouette at every candidate radius and only then take the maximum; there may be a better way.
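To illustrate the "smarter guesses" idea: if the mean-silhouette score is roughly unimodal in the radius (an assumption, not something the current code verifies), a coarse-to-fine search needs far fewer evaluations than a flat 200-point grid. The sketch below is in Python for illustration only (the actual code is Julia), and `score` stands in for the expensive DBSCAN + mean-silhouette evaluation:

```python
def coarse_to_fine_radius(score, lo, hi, coarse=20, fine=20):
    """Two-pass radius search: a coarse grid over [lo, hi], then a fine
    grid around the best coarse point. ~coarse + fine evaluations of
    `score` instead of e.g. 200. `score(r)` plays the role of the mean
    silhouette obtained by clustering with DBSCAN at radius r."""
    step = (hi - lo) / (coarse - 1)
    radii = [lo + k * step for k in range(coarse)]
    best = max(range(coarse), key=lambda k: score(radii[k]))
    # refine between the neighbors of the best coarse radius
    a = radii[max(best - 1, 0)]
    b = radii[min(best + 1, coarse - 1)]
    fstep = (b - a) / (fine - 1)
    fine_radii = [a + k * fstep for k in range(fine)]
    return max(fine_radii, key=score)
```

With the defaults this uses 40 score evaluations instead of 200, at the cost of possibly missing the global maximum if the score is multimodal in the radius.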
Additionally, this part of the code can be parallelized. That should be easy, since every iteration is independent of the others.
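Since each radius is scored independently, the loop is embarrassingly parallel; in Julia this would be a `Threads.@threads` loop over the candidate radii. A minimal Python sketch of the same idea (with a toy placeholder for the DBSCAN + mean-silhouette score):

```python
from concurrent.futures import ThreadPoolExecutor

def silhouette_at(r):
    # Placeholder for the expensive step: cluster the features with
    # DBSCAN at radius r and compute the mean silhouette. A toy
    # unimodal function stands in here for illustration.
    return -(r - 0.3) ** 2

def best_radius_parallel(radii, workers=4):
    """Score every candidate radius concurrently, then reduce.
    Safe because each evaluation is independent of the others."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        scores = list(pool.map(silhouette_at, radii))
    return max(zip(scores, radii))[1]
```

The only serial step left is the final `max` reduction, so the speedup should scale with the number of workers as long as the per-radius clustering dominates.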
This improvement would be useful, but I don't think it is pressing, since the precision of the algorithm is already quite good. I can work on it when I have more time, but not right now.