databricks / tensorframes

[DEPRECATED] Tensorflow wrapper for DataFrames on Apache Spark
Apache License 2.0

Databricks Connect PyCharm NoClassDefFoundError #165

Open giusbi opened 5 years ago

giusbi commented 5 years ago

After following the documentation to connect Databricks to PyCharm, I am not able to run the sample in https://docs.azuredatabricks.net/user-guide/dev-tools/db-connect.html#run-examples-from-your-ide because I get an error. Note that the connection seems to work, because at the beginning it checks the cluster status and starts the cluster; the error occurs afterwards, when the Spark command is executed.

19/04/24 15:07:06 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
19/04/24 15:07:08 WARN MetricsSystem: Using default name SparkStatusTracker for source because neither spark.metrics.namespace nor spark.app.id is set.
Testing simple count
19/04/24 15:07:10 WARN HTTPClient: Setting proxy configuration for HTTP client based on env var HTTPS_PROXY=https://proxy_name
19/04/24 15:07:13 WARN SparkClientManager: Cluster 1108-095209-xxx in state PENDING, waiting for it to start running...
19/04/24 15:07:24 WARN SparkClientManager: Cluster 1108-095209-xxx in state PENDING, waiting for it to start running...
19/04/24 15:07:34 WARN SparkClientManager: Cluster 1108-095209-xxx in state PENDING, waiting for it to start running...
Traceback (most recent call last):
  File "C:/Users/my_name/PycharmProjects/Databricks/main.py", line 7, in <module>
    print(spark.range(100).count())
  File "C:\Users\my_name\AppData\Local\Continuum\anaconda3\envs\dbconnect\lib\site-packages\pyspark\sql\session.py", line 337, in range
    jdf = self._jsparkSession.range(0, int(start), int(step), int(numPartitions))
  File "C:\Users\my_name\AppData\Local\Continuum\anaconda3\envs\dbconnect\lib\site-packages\py4j\java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "C:\Users\my_name\AppData\Local\Continuum\anaconda3\envs\dbconnect\lib\site-packages\pyspark\sql\utils.py", line 63, in deco
    return f(*a, **kw)
  File "C:\Users\my_name\AppData\Local\Continuum\anaconda3\envs\dbconnect\lib\site-packages\py4j\protocol.py", line 328, in get_return_value
    format(target_id, ".", name), value)
py4j.protocol.Py4JJavaError: An error occurred while calling o20.range.
: java.lang.NoClassDefFoundError: com/trueaccord/scalapb/GeneratedMessage
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(Unknown Source)
    at java.security.SecureClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.defineClass(Unknown Source)
    at java.net.URLClassLoader.access$100(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.net.URLClassLoader$1.run(Unknown Source)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at com.databricks.service.SparkServiceRPCClientStub.com$databricks$service$SparkServiceRPCClientStub$$buildRpc(SparkServiceRPCClientStub.scala:352)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollStatuses$1.apply(SparkServiceRPCClientStub.scala:458)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollStatuses$1.apply(SparkServiceRPCClientStub.scala:457)
    at com.databricks.spark.util.Log4jUsageLogger.recordOperation(UsageLogger.scala:161)
    at com.databricks.spark.util.UsageLogging$class.recordOperation(UsageLogger.scala:286)
    at com.databricks.service.SparkServiceRPCClientStub.recordOperation(SparkServiceRPCClientStub.scala:48)
    at com.databricks.service.SparkServiceRPCClientStub.pollStatuses(SparkServiceRPCClientStub.scala:457)
    at com.databricks.service.SparkServiceRPCClientStub.com$databricks$service$SparkServiceRPCClientStub$$pollAndUpdateStatuses0(SparkServiceRPCClientStub.scala:428)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1$$anonfun$apply$mcV$sp$1.apply$mcV$sp(SparkServiceRPCClientStub.scala:409)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1$$anonfun$apply$mcV$sp$1.apply(SparkServiceRPCClientStub.scala:407)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1$$anonfun$apply$mcV$sp$1.apply(SparkServiceRPCClientStub.scala:407)
    at com.databricks.service.SparkServiceRPCClientStub.com$databricks$service$SparkServiceRPCClientStub$$withPollLock(SparkServiceRPCClientStub.scala:419)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1.apply$mcV$sp(SparkServiceRPCClientStub.scala:406)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1.apply(SparkServiceRPCClientStub.scala:404)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$pollAndUpdateStatuses$1.apply(SparkServiceRPCClientStub.scala:404)
    at com.databricks.spark.util.Log4jUsageLogger.recordOperation(UsageLogger.scala:161)
    at com.databricks.spark.util.UsageLogging$class.recordOperation(UsageLogger.scala:286)
    at com.databricks.service.SparkServiceRPCClientStub.recordOperation(SparkServiceRPCClientStub.scala:48)
    at com.databricks.service.SparkServiceRPCClientStub.pollAndUpdateStatuses(SparkServiceRPCClientStub.scala:404)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$getServerHadoopConf$1.apply(SparkServiceRPCClientStub.scala:382)
    at com.databricks.service.SparkServiceRPCClientStub$$anonfun$getServerHadoopConf$1.apply(SparkServiceRPCClientStub.scala:381)
    at com.databricks.service.SparkServiceRPCClientStub.com$databricks$service$SparkServiceRPCClientStub$$withPollLock(SparkServiceRPCClientStub.scala:419)
    at com.databricks.service.SparkServiceRPCClientStub.getServerHadoopConf(SparkServiceRPCClientStub.scala:381)
    at com.databricks.service.SparkClient$.getServerHadoopConf(SparkClient.scala:211)
    at com.databricks.spark.util.SparkClientContext$.getServerHadoopConf(SparkClientContext.scala:217)
    at org.apache.spark.SparkContext$$anonfun$hadoopConfiguration$1.apply(SparkContext.scala:316)
    at org.apache.spark.SparkContext$$anonfun$hadoopConfiguration$1.apply(SparkContext.scala:311)
    at scala.util.DynamicVariable.withValue(DynamicVariable.scala:58)
    at org.apache.spark.SparkContext.hadoopConfiguration(SparkContext.scala:310)
    at org.apache.spark.sql.internal.SharedState.<init>(SharedState.scala:66)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:145)
    at org.apache.spark.sql.SparkSession$$anonfun$sharedState$1.apply(SparkSession.scala:145)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sharedState$lzycompute(SparkSession.scala:145)
    at org.apache.spark.sql.SparkSession.sharedState(SparkSession.scala:144)
    at org.apache.spark.sql.internal.BaseSessionStateBuilder.build(BaseSessionStateBuilder.scala:291)
    at org.apache.spark.sql.SparkSession$.org$apache$spark$sql$SparkSession$$instantiateSessionState(SparkSession.scala:1175)
    at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:170)
    at org.apache.spark.sql.SparkSession$$anonfun$sessionState$2.apply(SparkSession.scala:169)
    at scala.Option.getOrElse(Option.scala:121)
    at org.apache.spark.sql.SparkSession.sessionState$lzycompute(SparkSession.scala:169)
    at org.apache.spark.sql.SparkSession.sessionState(SparkSession.scala:166)
    at org.apache.spark.sql.Dataset.<init>(Dataset.scala:193)
    at org.apache.spark.sql.SparkSession.range(SparkSession.scala:609)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Unknown Source)
Caused by: java.lang.ClassNotFoundException: com.trueaccord.scalapb.GeneratedMessage
    at java.net.URLClassLoader.findClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
    at java.lang.ClassLoader.loadClass(Unknown Source)
    ... 67 more

Process finished with exit code 1
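
The trace above ends in a ClassNotFoundException for com.trueaccord.scalapb.GeneratedMessage, which suggests that no jar on the client classpath provides that class. A minimal sketch for checking this, assuming the client jars live in a `jars` folder inside the installed `pyspark` package (an assumption about the install layout, not something the log confirms); the helper names `jars_containing` and `default_jars_dir` are hypothetical:

```python
import importlib.util
import zipfile
from pathlib import Path

# Class named in the NoClassDefFoundError above, written as a path inside a jar.
MISSING_CLASS = "com/trueaccord/scalapb/GeneratedMessage.class"


def jars_containing(jars_dir, class_entry=MISSING_CLASS):
    """Return the names of the jar files in jars_dir that contain class_entry."""
    hits = []
    for jar in sorted(Path(jars_dir).glob("*.jar")):
        try:
            with zipfile.ZipFile(jar) as zf:
                if class_entry in zf.namelist():
                    hits.append(jar.name)
        except zipfile.BadZipFile:
            # Skip files that are not valid archives.
            continue
    return hits


def default_jars_dir():
    """Guess the jars folder of the installed pyspark package (assumed layout)."""
    spec = importlib.util.find_spec("pyspark")
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0]) / "jars"


if __name__ == "__main__":
    jars_dir = default_jars_dir()
    if jars_dir is None or not jars_dir.is_dir():
        print("No pyspark jars folder found in this environment")
    else:
        found = jars_containing(jars_dir)
        print(f"{jars_dir}: {found or 'class not found in any jar'}")
```

Run inside the same environment as the failing script (here, the `dbconnect` conda environment). An empty result would be consistent with the class missing from the client jars; a non-empty result would point instead at a classpath or version conflict.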