lkashfi opened this issue 3 years ago
@lkashfi did you add the proper config?

spark = SparkSession \
    .builder \
    .appName("DBSCAN") \
    .config("spark.jars.packages", "graphframes:graphframes:0.7.0-spark2.3-s_2.11") \
    .config('spark.driver.host', '127.0.0.1') \
    .getOrCreate()
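The package coordinate has to match a published artifact exactly, including the Scala suffix written as `s_2.11` (with the underscore), not `s2.11`. A minimal sketch of a hypothetical helper, `graphframes_coordinate` (not part of pyspark_dbscan or GraphFrames), that assembles the string from its parts so the `-s_` separator cannot be lost. The `<version>-spark<X.Y>-s_<scala>` layout is taken from the coordinate quoted in this thread; whether a given combination is actually published still has to be checked on the spark-packages listing.

```python
# Hypothetical helper (not part of pyspark_dbscan or graphframes): builds a
# GraphFrames Maven coordinate in the "<version>-spark<X.Y>-s_<scala>" layout
# used by the coordinate quoted in this thread.
def graphframes_coordinate(gf_version: str, spark_version: str, scala_version: str) -> str:
    # Keep only "major.minor" of each version, e.g. "2.3.4" -> "2.3".
    spark_mm = ".".join(spark_version.split(".")[:2])
    scala_mm = ".".join(scala_version.split(".")[:2])
    return f"graphframes:graphframes:{gf_version}-spark{spark_mm}-s_{scala_mm}"


print(graphframes_coordinate("0.7.0", "2.3.4", "2.11.12"))
# graphframes:graphframes:0.7.0-spark2.3-s_2.11
```

The result would be passed as the value of `.config("spark.jars.packages", ...)` in the builder chain.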
Yes, I did:

spark = SparkSession \
    .builder \
    .appName("DBSCAN") \
    .config("spark.jars.packages", "graphframes:graphframes:0.7.0-spark2.3-s_2.11") \
    .config('spark.driver.host', '127.0.0.1') \
    .getOrCreate()
Yes, I did. I don't know what causes this error.
Are you sure that the jar package is loaded?
(Reply by email to lkashfi's comment of Tue, Jul 20, 2021 at 12:13 PM: https://github.com/SalilJain/pyspark_dbscan/issues/5#issuecomment-883518898)
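One way to check whether the jar package is loaded is to read back the configuration of the SparkContext that is actually running. `spark.jars.packages` is only read when the JVM starts, so if a session already existed in the notebook, `getOrCreate()` reuses it and the setting does not reach the running context. A minimal sketch of a hypothetical helper, `requests_package`, that inspects that read-back value for the GraphFrames artifact. It only shows that the package was requested at startup, not that dependency resolution succeeded.

```python
# Hypothetical helper: checks whether a comma-separated "spark.jars.packages"
# value requests a given Maven group:artifact (any version).
def requests_package(packages_conf: str, group: str, artifact: str) -> bool:
    for coord in packages_conf.split(","):
        parts = coord.strip().split(":")
        # A Maven coordinate here is expected as group:artifact:version.
        if len(parts) == 3 and parts[0] == group and parts[1] == artifact:
            return True
    return False


# With a live session, the value of the running context can be read with:
#   conf_value = spark.sparkContext.getConf().get("spark.jars.packages", "")
conf_value = "graphframes:graphframes:0.7.0-spark2.3-s_2.11"
print(requests_package(conf_value, "graphframes", "graphframes"))  # True
```

If this returns `False` in a notebook where the builder did set the package, restarting the runtime (so that no earlier session exists) before running the builder is the usual remedy.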
Dear SalilJain, please look at my code and tell me what I am doing wrong. Dataset: 3D_spatial_network.txt
My code in Google Colab: https://colab.research.google.com/drive/1J0dv0fddc56LwamRCo-v_W2ifQ1wFzdn?usp=sharing
I look forward to your answer.
@lkashfi I couldn't tell from your code whether GraphFrames was being loaded.
@lkashfi did you try to run it on your local machine?
@SalilJain Hello, I tried to run this code on a GPU in Google Colab and on my laptop.
on your laptop @lkashfi
@SalilJain Yes. First I tried to run it on my laptop (8 cores, Java 11), but an error occurred. So I tried running the code in Google Colab with Java 8 (the T4 GPU runtime in Google Colab has about 40 cores).
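On the laptop, the Java version may matter if that machine runs a Spark 2.x release: Spark 2.x is built and tested against Java 8, and Java 11 support only arrived in Spark 3.0. A minimal sketch of a hypothetical helper, `java_major_version`, that reads the major version from `java -version` output so this can be checked before starting Spark. It is an illustration, not part of pyspark_dbscan.

```python
import re


# Hypothetical helper: extracts the Java major version from `java -version`
# output. Legacy versions use "1.<major>" (e.g. 1.8.0_292 -> 8); Java 9 and
# later report the major version directly (e.g. 11.0.11 -> 11).
def java_major_version(version_output: str) -> int:
    match = re.search(r'version "(\d+)(?:\.(\d+))?', version_output)
    if match is None:
        raise ValueError("no Java version found in the given output")
    first, second = match.group(1), match.group(2)
    if first == "1" and second is not None:
        return int(second)
    return int(first)


print(java_major_version('openjdk version "1.8.0_292"'))  # 8
print(java_major_version('openjdk version "11.0.11" 2021-04-20'))  # 11
```

`java -version` writes to stderr, so the text can be captured with `subprocess.run(["java", "-version"], capture_output=True, text=True).stderr`.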
Dear @SalilJain, hi. Please tell me what format the input data to this DBSCAN algorithm should have.
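A possible shape for the input, inferred from the call used later in this thread, `dbscan.process(spark, df3, 1, 3, distance.euclidean, 2, "checkpoint")`, where `2` appears to be the number of dimensions: one row per point, with an integer id and the point's coordinates as a list of floats. The column layout is an assumption, not confirmed by the thread; check `dbscan.py` for the column names it actually reads. A minimal sketch of a hypothetical helper, `to_dbscan_rows`, that builds and validates such rows.

```python
from typing import Iterable, List, Sequence, Tuple


# Hypothetical helper: turns raw points into (id, [x1, ..., x_dim]) rows.
# ASSUMPTION: the DataFrame passed to dbscan.process has an integer id column
# and a column holding each point as a list of `dim` floats.
def to_dbscan_rows(points: Iterable[Sequence[float]], dim: int) -> List[Tuple[int, List[float]]]:
    rows = []
    for i, point in enumerate(points):
        coords = [float(c) for c in point]
        if len(coords) != dim:
            raise ValueError(f"point {i} has {len(coords)} coordinates, expected {dim}")
        rows.append((i, coords))
    return rows


rows = to_dbscan_rows([(0, 1), (2.5, 3)], dim=2)
print(rows)  # [(0, [0.0, 1.0]), (1, [2.5, 3.0])]

# In PySpark these rows could become a DataFrame (column names are an assumption):
#   df = spark.createDataFrame(rows, ["id", "value"])
```

For PCA output stored as an ML vector column, `pyspark.ml.functions.vector_to_array` (Spark 3.0 and later) converts it to an array column without collecting the data to the driver.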
@lkashfi Were you able to run your notebook using the PySpark-based DBSCAN provided by @SalilJain?
@clevilll I couldn't find where you are importing the code.
@lkashfi see if this resolves the issue: https://colab.research.google.com/drive/1lG69SZQt6SV0E2CeS5_-nZTsmDohRXyR?usp=sharing
Hello. In a PySpark environment, I applied PCA to a dataset, which produced a DataFrame with an ID column and a column of two-dimensional points. I then applied this DBSCAN implementation to it by calling the dbscan.py file from the same PySpark environment. An error occurs when executing the following lines:
df3 = data.repartition(8)
df3_clusters = dbscan.process(spark, df3, 1, 3, distance.euclidean, 2, "checkpoint")
The error is as follows:
Py4JJavaError: An error occurred while calling o329.loadClass.
See the screenshot of the error below:
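A `Py4JJavaError` raised from `loadClass` most likely comes from the GraphFrames Python wrapper, which loads the JVM class `org.graphframes.GraphFramePythonAPI` through the context class loader when a `GraphFrame` is built; that fails when the GraphFrames jar is not on the driver's classpath. Two causes worth ruling out are that `spark.jars.packages` was set after a session already existed (it is only read when the JVM starts), or that the coordinate was built for a different Spark or Scala version than the one running (for example a `-spark2.3-s_2.11` build on a Spark 3.x / Scala 2.12 runtime). A minimal sketch of a hypothetical helper, `coordinate_matches_runtime`, for the second check. At runtime its inputs would be `spark.version` and, through PySpark's internal `_jvm` gateway, `spark.sparkContext._jvm.scala.util.Properties.versionString()`.

```python
import re


# Hypothetical helper: checks whether a GraphFrames coordinate such as
# "graphframes:graphframes:0.7.0-spark2.3-s_2.11" was built for the Spark
# and Scala versions that are actually running.
def coordinate_matches_runtime(coordinate: str, spark_version: str, scala_version_string: str) -> bool:
    built_for = re.search(r"-spark(\d+\.\d+)-s_(\d+\.\d+)$", coordinate)
    # scala.util.Properties.versionString() looks like "version 2.12.15".
    scala = re.search(r"(\d+\.\d+)", scala_version_string)
    if built_for is None or scala is None:
        return False
    spark_mm = ".".join(spark_version.split(".")[:2])
    return built_for.group(1) == spark_mm and built_for.group(2) == scala.group(1)


coord = "graphframes:graphframes:0.7.0-spark2.3-s_2.11"
print(coordinate_matches_runtime(coord, "2.3.4", "version 2.11.12"))  # True
print(coordinate_matches_runtime(coord, "3.1.2", "version 2.12.10"))  # False
```

A `False` result does not prove the mismatch is the cause, but a Scala binary mismatch in particular is a common reason for GraphFrames classes failing to load.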