Closed: pvchandu closed this issue 9 months ago.
@pvchandu thanks for trying the plugin and reporting the issue. I just ran through the instructions again and they work fine for me. I'm not sure exactly what is wrong, because the missing class is a Spark class, not a plugin class.
Did you pick the Databricks 7.0 ML runtime? Are you using AWS or Azure? After you ran the generate-init-script.ipynb notebook, did you do step 5 to put the init.sh script into the cluster configuration and then restart the cluster?
One thing I would suggest is to remove the init script from the cluster configuration and make sure the cluster starts up fine and you can run jobs. If that works, the problem is probably with the init script, so try regenerating it.
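(For reference, one quick sanity check before re-attaching the script: confirm from a notebook cell that the generated init script actually landed on DBFS and looks sane. This is just a sketch — the `dbfs:/databricks/init_scripts/` path is an assumption; use whatever destination your copy of generate-init-script.ipynb actually wrote to.)

```python
# Sketch: verify the generated init script exists on DBFS before attaching it
# to the cluster. The path below is an assumption -- substitute the destination
# your run of generate-init-script.ipynb wrote to.
display(dbutils.fs.ls("dbfs:/databricks/init_scripts/"))          # init.sh should be listed
print(dbutils.fs.head("dbfs:/databricks/init_scripts/init.sh"))   # inspect its contents
```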
@tgravescs, I tested this with NC6s_v3 as mentioned in the documentation, and it worked well. But when I used NC12s_v3 or NC24s_v3 to create my cluster, it did not work. By the way, I am using Azure Databricks with the 7.0 ML DBR.
We don't support nodes with multiple GPUs on Databricks right now. The plugin has a restriction that each executor gets exactly 1 GPU, and the last time I tried, Databricks did not support configuring multiple executors, each with 1 GPU, on a multi-GPU node. Normally in Apache Spark you would set spark.executor.resource.gpu.amount=1 and that would get you 1 GPU per executor, but last time I tried that did not work on Databricks. Feel free to try it and see whether anything has changed there; a sketch of the stock Spark configuration is below.
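(For reference, the stock Apache Spark 3.x configuration described above looks roughly like this. It is a sketch, not something verified to work on Databricks; the discovery-script path is an assumption — Spark ships an example script at examples/src/main/scripts/getGpusResources.sh.)

```python
# Sketch of the single-GPU-per-executor setup in stock Apache Spark 3.x.
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # one GPU per executor, so a multi-GPU node can host several executors
    .config("spark.executor.resource.gpu.amount", "1")
    # one GPU per task, so tasks don't share a GPU
    .config("spark.task.resource.gpu.amount", "1")
    # script that reports the node's GPU addresses to Spark in its JSON format
    # (assumed path; adjust to where your discovery script lives)
    .config("spark.executor.resource.gpu.discoveryScript",
            "/opt/spark/scripts/getGpusResources.sh")
    .getOrCreate()
)
```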
That makes sense now. By default, each node is one executor, and we cannot change that in Databricks even today. Are there any plans to support multiple GPUs on a single node?
We don't have any concrete plans, because on any other setup you would just split one node into multiple executors. I'll raise with others that this is a limitation on Databricks.
Thanks Thomas. This is pretty limiting in the Databricks environment, given that many users are moving to Databricks. I added the following feedback for Databricks as well.
I would appreciate it if you could collaborate with Databricks and figure this out.
Closing this as the NoClassDefFoundError was resolved and the multiple GPUs per executor request is tracked by #1486.
I am trying out the new RAPIDS accelerator for Databricks. I am running the mortgage notebook to get started, following the instructions in the documentation: https://nvidia.github.io/spark-rapids/docs/get-started/getting-started-with-rapids-accelerator-on-databricks.html.
When I run the code cell to read the data, it is failing with the following error.
Error: java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
Full Error:
Py4JJavaError                             Traceback (most recent call last)
<command-1671055577733705> in <module>
      3 # we want a few big files instead of lots of small files
      4 spark.conf.set('spark.sql.files.maxPartitionBytes', '200G')
----> 5 acq = read_acq_csv(spark, orig_acq_path)
      6 acq.repartition(12).write.parquet(tmp_acq_path, mode='overwrite')
      7 perf = read_perf_csv(spark, orig_perf_path)

<command-1671055577733703> in read_acq_csv(spark, path)
     82         .option('delimiter', '|') \
     83         .schema(_csv_acq_schema) \
---> 84         .load(path) \
     85         .withColumn('quarter', _get_quarter_from_csv_file_name())
     86

/databricks/spark/python/pyspark/sql/readwriter.py in load(self, path, format, schema, **options)
    176             self.options(**options)
    177         if isinstance(path, basestring):
--> 178             return self._df(self._jreader.load(path))
    179         elif path is not None:
    180             if type(path) != list:

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/java_gateway.py in __call__(self, *args)
   1303         answer = self.gateway_client.send_command(command)
   1304         return_value = get_return_value(
-> 1305             answer, self.gateway_client, self.target_id, self.name)
   1306
   1307         for temp_arg in temp_args:

/databricks/spark/python/pyspark/sql/utils.py in deco(*a, **kw)
    126     def deco(*a, **kw):
    127         try:
--> 128             return f(*a, **kw)
    129         except py4j.protocol.Py4JJavaError as e:
    130             converted = convert_exception(e.java_exception)

/databricks/spark/python/lib/py4j-0.10.9-src.zip/py4j/protocol.py in get_return_value(answer, gateway_client, target_id, name)
    326                 raise Py4JJavaError(
    327                     "An error occurred while calling {0}{1}{2}.\n".
--> 328                     format(target_id, ".", name), value)
    329             else:
    330                 raise Py4JError(

Py4JJavaError: An error occurred while calling o385.load.
: java.lang.NoClassDefFoundError: org/apache/spark/sql/sources/v2/ReadSupport
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at java.lang.ClassLoader.defineClass1(Native Method)
    at java.lang.ClassLoader.defineClass(ClassLoader.java:756)
    at java.security.SecureClassLoader.defineClass(SecureClassLoader.java:142)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:468)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:74)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:369)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:363)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:362)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at com.databricks.backend.daemon.driver.ClassLoaders$ReplWrappingClassLoader.loadClass(ClassLoaders.scala:65)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:405)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:348)
    at java.util.ServiceLoader$LazyIterator.nextService(ServiceLoader.java:370)
    at java.util.ServiceLoader$LazyIterator.access$700(ServiceLoader.java:323)
    at java.util.ServiceLoader$LazyIterator$2.run(ServiceLoader.java:407)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.util.ServiceLoader$LazyIterator.next(ServiceLoader.java:409)
    at java.util.ServiceLoader$1.next(ServiceLoader.java:480)
    at scala.collection.convert.Wrappers$JIteratorWrapper.next(Wrappers.scala:44)
    at scala.collection.Iterator.foreach(Iterator.scala:941)
    at scala.collection.Iterator.foreach$(Iterator.scala:941)
    at scala.collection.AbstractIterator.foreach(Iterator.scala:1429)
    at scala.collection.IterableLike.foreach(IterableLike.scala:74)
    at scala.collection.IterableLike.foreach$(IterableLike.scala:73)
    at scala.collection.AbstractIterable.foreach(Iterable.scala:56)
    at scala.collection.TraversableLike.filterImpl(TraversableLike.scala:255)
    at scala.collection.TraversableLike.filterImpl$(TraversableLike.scala:249)
    at scala.collection.AbstractTraversable.filterImpl(Traversable.scala:108)
    at scala.collection.TraversableLike.filter(TraversableLike.scala:347)
    at scala.collection.TraversableLike.filter$(TraversableLike.scala:347)
    at scala.collection.AbstractTraversable.filter(Traversable.scala:108)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSource(DataSource.scala:700)
    at org.apache.spark.sql.execution.datasources.DataSource$.lookupDataSourceV2(DataSource.scala:784)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:317)
    at org.apache.spark.sql.DataFrameReader.load(DataFrameReader.scala:251)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at py4j.reflection.MethodInvoker.invoke(MethodInvoker.java:244)
    at py4j.reflection.ReflectionEngine.invoke(ReflectionEngine.java:380)
    at py4j.Gateway.invoke(Gateway.java:295)
    at py4j.commands.AbstractCommand.invokeMethod(AbstractCommand.java:132)
    at py4j.commands.CallCommand.execute(CallCommand.java:79)
    at py4j.GatewayConnection.run(GatewayConnection.java:251)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.sources.v2.ReadSupport
    at java.net.URLClassLoader.findClass(URLClassLoader.java:382)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:418)
    at com.databricks.backend.daemon.driver.ClassLoaders$LibraryClassLoader.loadClass(ClassLoaders.scala:151)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:351)
    ... 63 more