SalilJain / pyspark_dbscan

An "Efficient" Implementation of DBSCAN on PySpark
27 stars 10 forks

Problem with functionality of dbscan ! #6

Open clevilll opened 3 years ago

clevilll commented 3 years ago

I'm trying to use your PySpark DBSCAN implementation, but dbscan.py doesn't seem to work: it returns None when I print the result.

df_clusters = dbscan.process(spark, df, .2, 10, distance.euclidean, 2, "checkpoint")
print(df_clusters)
#None

I have provided a Colab notebook for quick debugging.

SalilJain commented 2 years ago

@clevilll See if this resolves your issue: https://colab.research.google.com/drive/1lG69SZQt6SV0E2CeS5_-nZTsmDohRXyR?usp=sharing

Leo-Sun-BMSTU commented 2 years ago

Hi! I have the same problem. Something is wrong with the RDD: I got AttributeError: 'str' object has no attribute 'add'. Have you solved it?