Closed: neonntt closed this issue 2 years ago
Hello neonntt, I can't reproduce the issue. Changing the metric in the provided demo notebook works for me. For example, you can replace the ST_DBSCAN call in the fourth cell of the demo notebook with the following:
st_dbscan = ST_DBSCAN(eps1 = 0.05, eps2 = 10, min_samples = 5, metric = 'mahalanobis')
Regarding the second question: do you mean you want to apply a weighted Euclidean distance?
Cheers, Eren
Eren, thank you so much for your reply. I will try changing the metric as you described. It must be some issue with my data.
Regarding the second question, I would like to try a weighted distance with both the Euclidean and the Mahalanobis metric. Let's say we have another parameter in the data, for example speed, and we would like the speed value to be given a higher weight than the others when calculating the distance. Can you please advise how this can be implemented? Thanks again,
Sure, you can adapt the fit method. Something like the following should work:
def fit(self, X):
    """
    Apply the ST DBSCAN algorithm

    Parameters
    ----------
    X : 2D numpy array with
        The first element of the array should be the time
        attribute as float. The following positions in the array are
        treated as spatial coordinates. The structure should look like this [[time_step1, x, y], [time_step2, x, y]..]
        For example 2D dataset:
        array([[0,0.45,0.43],
        [0,0.54,0.34],...])

    Returns
    -------
    self
    """
    # check if input is correct
    X = check_array(X)

    if not self.eps1 > 0.0 or not self.eps2 > 0.0 or not self.min_samples > 0.0:
        raise ValueError('eps1, eps2, minPts must be positive')

    n, m = X.shape

    # Compute squared form Euclidean Distance Matrix for 'time' attribute and the spatial attributes
    time_dist = pdist(X[:, 0].reshape(n, 1), metric=self.metric)

    # --------
    # Line changed here:
    # np.array of weights, one per column after the time column
    weights = np.array([0.5, 1, 0.2, 0.3])  # weights for the features
    # Note: 'wminkowski' was removed in SciPy 1.8; on newer versions use
    # metric='minkowski' with the w argument instead.
    euc_dist = pdist(X[:, 1:], 'wminkowski', p=2, w=weights)
    # afterwards the same code snippets
    # --------

    # filter the euc_dist matrix using the time_dist
    dist = np.where(time_dist <= self.eps2, euc_dist, 2 * self.eps1)

    db = DBSCAN(eps=self.eps1,
                min_samples=self.min_samples,
                metric='precomputed')
    db.fit(squareform(dist))

    self.labels = db.labels_

    return self
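A side note on the weighted line above: a weighted Euclidean distance can also be obtained without a special metric, by scaling each feature column by the square root of its weight and then using the plain Euclidean metric. The sketch below checks that equivalence with NumPy only. The helper names, the sample features (x, y, speed, heading) and the weights are illustrative assumptions and are not part of ST_DBSCAN. It uses the convention d = sqrt(sum_i w_i * (x_i - y_i)^2); other conventions multiply the differences by w_i before squaring, so check which one your SciPy version applies.

```python
import numpy as np

def weighted_euclidean(x, y, w):
    """Return sqrt(sum_i w_i * (x_i - y_i)**2) for two 1D points."""
    diff = np.asarray(x, dtype=float) - np.asarray(y, dtype=float)
    return float(np.sqrt(np.sum(np.asarray(w, dtype=float) * diff ** 2)))

def scale_columns(X, w):
    """Scale each column by sqrt(w_i) so that the plain Euclidean
    distance on the result equals the weighted distance on X."""
    return np.asarray(X, dtype=float) * np.sqrt(np.asarray(w, dtype=float))

# Hypothetical non-time features: x, y, speed, heading.
# Speed gets the largest weight in this made-up example.
X = np.array([[0.45, 0.43, 12.0, 0.1],
              [0.54, 0.34, 15.0, 0.3]])
weights = np.array([0.5, 0.5, 2.0, 0.3])

d_weighted = weighted_euclidean(X[0], X[1], weights)

X_scaled = scale_columns(X, weights)
d_scaled = float(np.linalg.norm(X_scaled[0] - X_scaled[1]))

print(d_weighted, d_scaled)  # the two values should agree
```

If this holds for your data, one option is to scale the non-time columns once up front and pass the result to the unmodified fit with metric='euclidean', leaving the time column untouched.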
Cheers, Eren
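The snippet above covers the weighted Euclidean case. For Mahalanobis, note that simply rescaling the feature columns has no effect, because the covariance matrix rescales with them and the two cancel out. The sketch below illustrates this with NumPy, and then shows one possible (assumed, not taken from ST_DBSCAN or SciPy) way to define a weighted variant: the covariance is estimated on the unscaled data and the weights are applied to the difference vector only. All function names here are hypothetical.

```python
import numpy as np

def mahalanobis_pair(X, i, j):
    """Mahalanobis distance between rows i and j, using the covariance of X."""
    VI = np.linalg.inv(np.cov(X, rowvar=False))  # inverse covariance
    diff = X[i] - X[j]
    return float(np.sqrt(diff @ VI @ diff))

def weighted_mahalanobis_pair(X, i, j, w):
    """Assumed weighted variant: covariance from the unscaled data,
    weights applied to the difference vector as sqrt(w)."""
    VI = np.linalg.inv(np.cov(X, rowvar=False))
    diff = np.sqrt(w) * (X[i] - X[j])
    return float(np.sqrt(diff @ VI @ diff))

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 3))        # made-up data with 3 features
w = np.array([1.0, 1.0, 4.0])       # emphasise the third feature

d_plain = mahalanobis_pair(X, 0, 1)
d_rescaled = mahalanobis_pair(X * np.sqrt(w), 0, 1)  # rescaling cancels out
d_weighted = weighted_mahalanobis_pair(X, 0, 1, w)

print(d_plain, d_rescaled, d_weighted)
```

A pairwise matrix built this way could be fed to DBSCAN with metric='precomputed', in the same way the fit method above does for the Euclidean case.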
Thanks a ton, Eren...will try it and reach out to you in case I need more help. regards
No problem, just reopen the issue in that case.
Cheers, Eren
Hi! Thanks so much for this implementation. I wanted some guidance on how to use a distance metric other than the default Euclidean. I have data with multiple features and wanted to use another metric, such as Mahalanobis. Would the implementation be as follows?
st_dbscan = ST_DBSCAN(eps1 = 0.4, eps2 = 5, min_samples = 5, metric = 'mahalanobis')
I did try the above, but got a "Singular matrix" error. However, when I checked the correlation between the features, it seemed to be OK.
Also, in case I want to use a different weight for each of the features when calculating the distance, how should I go about it? I would be grateful if you could please help out.
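One possible cause, offered as an assumption since the data is not shown: the Mahalanobis metric needs the inverse of the feature covariance matrix, and that matrix is not invertible when one feature is constant or is a linear combination of the others. Pairwise correlations can look moderate in that situation. The sketch below uses made-up data and a hypothetical helper, covariance_is_singular, to show such a case with NumPy.

```python
import numpy as np

def covariance_is_singular(X, rel_tol=1e-10):
    """Return True if the feature covariance matrix is numerically
    non-invertible (smallest eigenvalue negligible next to the largest)."""
    cov = np.atleast_2d(np.cov(X, rowvar=False))
    eigvals = np.linalg.eigvalsh(cov)  # sorted in ascending order
    return bool(eigvals[0] <= rel_tol * eigvals[-1])

rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = rng.normal(size=200)

# Three independent features: the covariance matrix is invertible.
X_ok = np.column_stack([a, b, rng.normal(size=200)])

# Third feature is the sum of the first two: the covariance matrix is singular,
# even though no single pair of features is almost perfectly correlated.
X_bad = np.column_stack([a, b, a + b])

corr = np.corrcoef(X_bad, rowvar=False)
max_abs_corr = float(np.max(np.abs(corr - np.eye(3))))  # largest off-diagonal value

print(covariance_is_singular(X_ok), covariance_is_singular(X_bad), max_abs_corr)
```

If the check flags your data, dropping the redundant or constant column before clustering is one way to avoid the error.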
Thanks