Py4JJavaError: An error occurred while calling o329.loadClass.

lkashfi commented 3 years ago

Hello. In a PySpark environment, I applied PCA to a dataset, which produced a DataFrame containing an ID column and a column of two-dimensional points (ordered pairs). I then applied this DBSCAN implementation to it by importing the dbscan.py file in the same PySpark session. An error occurs when executing the following lines:

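    import dbscan  # dbscan.py from SalilJain/pyspark_dbscan
    from scipy.spatial import distance  # assumed source of distance.euclidean
    df3 = data.repartition(8)
    df3_clusters = dbscan.process(spark, df3, 1, 3, distance.euclidean, 2, "checkpoint")  # raises Py4JJavaError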
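The positional arguments in this call are not named anywhere in the thread. Assuming they are, in order, the neighborhood radius epsilon (1), the minimum number of points min_pts (3), the distance function, the number of dimensions (2) and a checkpoint directory (an assumption; check the signature in dbscan.py), the call asks for classic DBSCAN on 2-D points. The sketch below is a plain single-machine DBSCAN in Python, not the repository's distributed, GraphFrames-based implementation. The function name `dbscan` and its parameters are hypothetical and only illustrate what epsilon, min_pts and the distance function control.

```python
import math
from typing import Callable, Dict, Sequence

Point = Sequence[float]


def dbscan(points: Dict[int, Point], epsilon: float, min_pts: int,
           dist: Callable[[Point, Point], float] = math.dist) -> Dict[int, int]:
    """Map each point id to a cluster number (0, 1, ...); noise points get -1.

    Hypothetical single-machine reference DBSCAN with an O(n^2) neighbor
    search, for illustration only. A point is a core point when at least
    min_pts points (itself included) lie within distance epsilon of it.
    """
    ids = list(points)
    # Precompute every epsilon-neighborhood; each one contains the point itself.
    neighbors = {i: [j for j in ids if dist(points[i], points[j]) <= epsilon]
                 for i in ids}
    labels: Dict[int, int] = {}
    cluster = 0
    for i in ids:
        if i in labels or len(neighbors[i]) < min_pts:
            continue  # already clustered, or not a core point
        # Grow a new cluster outward from this unclustered core point.
        labels[i] = cluster
        frontier = list(neighbors[i])
        while frontier:
            j = frontier.pop()
            if j in labels:
                continue
            labels[j] = cluster
            if len(neighbors[j]) >= min_pts:
                frontier.extend(neighbors[j])  # only core points keep expanding
        cluster += 1
    # Anything never reached from a core point is noise.
    return {i: labels.get(i, -1) for i in ids}


if __name__ == "__main__":
    pts = {1: (0.0, 0.0), 2: (0.5, 0.0), 3: (0.0, 0.5), 4: (10.0, 10.0)}
    print(dbscan(pts, epsilon=1, min_pts=3))  # {1: 0, 2: 0, 3: 0, 4: -1}
```

Here epsilon=1 and min_pts=3 mirror the values in the original call; `scipy.spatial.distance.euclidean` could be passed as `dist` in place of `math.dist`.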

The error is shown in the attached screenshot (dbscanError).

SalilJain commented 3 years ago

@lkashfi Did you add the proper config?

    from pyspark.sql import SparkSession

    spark = SparkSession \
        .builder \
        .appName("DBSCAN") \
        .config("spark.jars.packages", "graphframes:graphframes:0.7.0-spark2.3-s_2.11") \
        .config('spark.driver.host', '127.0.0.1') \
        .getOrCreate()

lkashfi commented 3 years ago

Yes, I did:

    spark = SparkSession \
        .builder \
        .appName("DBSCAN") \
        .config("spark.jars.packages", "graphframes:graphframes:0.7.0-spark2.3-s_2.11") \
        .config('spark.driver.host', '127.0.0.1') \
        .getOrCreate()
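The `loadClass` call in the error title appears to be how the graphframes Python wrapper looks up its JVM-side entry point (`org.graphframes.GraphFramePythonAPI`), so this error typically means the graphframes jar is not on the driver's classpath. `spark.jars.packages` takes a comma-separated list of Maven coordinates `group:artifact:version`, and the graphframes version string encodes the graphframes release, the Spark major.minor version and the Scala binary version (`0.7.0-spark2.3-s_2.11` is graphframes 0.7.0 for Spark 2.3 and Scala 2.11). A typo such as `s2.11` instead of `s_2.11` does not follow that naming pattern. The sketch below pulls those fields out of a `spark.jars.packages` value so they can be compared with the running Spark; `parse_graphframes_packages` and `GraphFramesCoord` are hypothetical names, and the version pattern is assumed from the coordinate used in this thread.

```python
import re
from typing import NamedTuple, Optional


class GraphFramesCoord(NamedTuple):
    graphframes: str  # graphframes release, e.g. "0.7.0"
    spark: str        # Spark major.minor the jar was built for, e.g. "2.3"
    scala: str        # Scala binary version, e.g. "2.11"


# Assumed version pattern, e.g. "0.7.0-spark2.3-s_2.11".
_VERSION_RE = re.compile(
    r"^(?P<gf>[\d.]+)-spark(?P<spark>\d+\.\d+)-s_(?P<scala>\d+\.\d+)$")


def parse_graphframes_packages(packages: str) -> Optional[GraphFramesCoord]:
    """Find and parse the graphframes entry of a spark.jars.packages value.

    Returns None when no graphframes coordinate is listed. Raises ValueError
    when one is listed but its version does not match the pattern above.
    """
    for coord in packages.split(","):
        parts = coord.strip().split(":")
        if len(parts) == 3 and parts[:2] == ["graphframes", "graphframes"]:
            m = _VERSION_RE.match(parts[2])
            if m is None:
                raise ValueError(f"malformed graphframes version: {parts[2]!r}")
            return GraphFramesCoord(m["gf"], m["spark"], m["scala"])
    return None
```

In a running session, `spark.sparkContext.getConf().get("spark.jars.packages")` returns the configured value and `spark.version` the Spark version. Note also that `spark.jars.packages` is generally only applied when the JVM is launched: if a SparkSession already exists in the notebook, `getOrCreate()` returns it and the package may never be fetched, so restarting the runtime before building the session is a cheap thing to rule out.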

lkashfi commented 3 years ago

Yes, I did. I do not know what is causing this error.

SalilJain commented 3 years ago

Are you sure that the jar package is loaded?

(Email reply to lkashfi's comment of Tue, Jul 20, 2021, 12:13 PM.)

lkashfi commented 3 years ago

Dear SalilJain, please look at my code and tell me what I am doing wrong. Dataset: 3D_spatial_network.txt

My code in Google Colab: https://colab.research.google.com/drive/1J0dv0fddc56LwamRCo-v_W2ifQ1wFzdn?usp=sharing

lkashfi commented 3 years ago

I am looking forward to your answer.

SalilJain commented 3 years ago

@lkashfi From your code, I couldn't tell whether graphframes was being loaded.

SalilJain commented 3 years ago

@lkashfi Did you try running it on your local machine?

lkashfi commented 3 years ago

@SalilJain Hello. I tried to run this code either on a GPU in Google Colab or on my laptop.

SalilJain commented 3 years ago

On your laptop, @lkashfi?

lkashfi commented 3 years ago

@SalilJain Yes. I first tried to run it on my laptop, which has 8 cores, but got an error (I have Java 11). So I tried running the code in Google Colab with Java 8 (the T4 GPU runtime in Google Colab has about 40 cores).
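The Java versions mentioned here matter. Spark 2.x runs on Java 8 only (Java 11 support arrived with Spark 3.0), and Spark 3.x no longer supports Scala 2.11, so a `spark2.3-s_2.11` graphframes jar is a poor fit for a Spark 3 session whatever the Java version. The sketch below encodes just these rules of thumb, plus a check that the jar's `sparkX.Y` tag matches the running Spark. It is not an exhaustive compatibility matrix, and `compat_problems` is a hypothetical helper.

```python
from typing import List


def compat_problems(spark_version: str, java_major: int,
                    gf_spark: str, gf_scala: str) -> List[str]:
    """Return warnings; an empty list means none of the rules below fired.

    Hypothetical rules of thumb only:
      * Spark 2.x runs on Java 8 only (Java 11 support starts with Spark 3.0).
      * Spark 3.x dropped Scala 2.11, so *-s_2.11 jars do not fit it.
      * graphframes jars are built per Spark major.minor, so the "sparkX.Y"
        tag of the jar should match the running Spark.
    """
    major, minor = (int(x) for x in spark_version.split(".")[:2])
    problems = []
    if major == 2 and java_major != 8:
        problems.append(f"Spark {spark_version} needs Java 8, found Java {java_major}")
    if major >= 3 and gf_scala == "2.11":
        problems.append(f"Spark {spark_version} has no Scala 2.11 build; jar is s_{gf_scala}")
    if gf_spark != f"{major}.{minor}":
        problems.append(f"graphframes jar targets Spark {gf_spark}, running Spark {major}.{minor}")
    return problems
```

In a live session, `spark.version` supplies the first argument; the Java major version is what `java -version` reports on the machine running the driver, and the last two arguments come from the graphframes coordinate in `spark.jars.packages`.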

lkashfi commented 3 years ago

Dear @SalilJain, hi. What format should the input data for this DBSCAN algorithm have?
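The thread never spells out the input format. From the original post, `dbscan.process` receives a DataFrame with an ID column plus a column holding each point's coordinates, with the dimension passed separately (2 there). Assuming (check dbscan.py to confirm) that the coordinate column is a plain array of floats, each row should look like `(id, [x, y])`. If the PCA step was `pyspark.ml.feature.PCA`, its output column holds ML `Vector` objects instead, which would need converting first. The sketch below shows the row shape in plain Python; `to_dbscan_rows` is a hypothetical helper.

```python
from typing import Iterable, List, Sequence, Tuple


def to_dbscan_rows(pca_rows: Iterable[Tuple[int, Sequence[float]]],
                   dim: int = 2) -> List[Tuple[int, List[float]]]:
    """Turn (id, point) pairs into (id, [float, ...]) rows of a fixed dimension.

    Hypothetical helper: assumes dbscan.process expects an integer id plus a
    plain list of floats whose length equals its `dim` argument.
    """
    rows = []
    for row_id, point in pca_rows:
        # Accepts any iterable of numbers, e.g. tuples, lists or NumPy arrays.
        coords = [float(c) for c in point]
        if len(coords) != dim:
            raise ValueError(f"row {row_id}: expected {dim} coordinates, got {len(coords)}")
        rows.append((int(row_id), coords))
    return rows
```

In PySpark itself, `pyspark.ml.functions.vector_to_array` (Spark 3.0+) converts a Vector column into an array column, for example `df.select("id", vector_to_array("pca_features").alias("value"))`, where the column names are placeholders.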

clevilll commented 3 years ago

@lkashfi Were you able to run your notebook using the PySpark-based DBSCAN provided by @SalilJain?

SalilJain commented 3 years ago

@clevilll I wasn't able to find where you import the code.

SalilJain commented 2 years ago

@lkashfi See if this resolves it: https://colab.research.google.com/drive/1lG69SZQt6SV0E2CeS5_-nZTsmDohRXyR?usp=sharing

SalilJain / pyspark_dbscan

Py4JJavaError: An error occurred while calling o329.loadClass. #5