dmmiller612 / sparktorch

Train and run PyTorch models on Apache Spark.
MIT License

Issue with spark_model fit #35

Open kvinshorts opened 1 year ago

kvinshorts commented 1 year ago

Hey, I am trying to use this library to train a binary classifier over a Spark DataFrame. During fit, I keep getting a worker node failure because no module named sparktorch is found, although I have successfully installed the sparktorch library using pip. This is the error I receive:

Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.collectAndServe.
: org.apache.spark.SparkException: Job aborted due to stage failure: Could not recover from a failed barrier ResultStage. Most recent failure reason: Stage failed because barrier task ResultTask(1, 19) finished unsuccessfully.
org.apache.spark.api.python.PythonException: Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 586, in main
    func, profiler, deserializer, serializer = read_command(pickleSer, infile)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/worker.py", line 71, in read_command
    command = serializer.loads(command.value)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/serializers.py", line 430, in loads
    return pickle.loads(obj, encoding=encoding)
ModuleNotFoundError: No module named 'sparktorch'

If anyone can help me out, please respond.
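
The traceback above is raised inside pyspark/worker.py while the task is being unpickled, so the import is failing in the Python interpreter used by the executors, not on the driver. One possible cause (an assumption, since the cluster setup is not described here) is that pip installed sparktorch only into the driver's environment. A minimal diagnostic sketch follows; the helper names module_report and executor_reports are hypothetical and not part of sparktorch, and sc is assumed to be an already running SparkContext.

```python
import importlib.util
import sys


def module_report(name: str = "sparktorch") -> dict:
    """Report whether `name` is importable by the interpreter running this call."""
    # find_spec returns None for a top-level module that cannot be located.
    spec = importlib.util.find_spec(name)
    return {
        "module": name,
        "found": spec is not None,
        "python": sys.executable,  # which interpreter was actually checked
    }


def executor_reports(sc, name: str = "sparktorch", partitions: int = 4) -> list:
    """Run module_report inside Spark tasks.

    `sc` is assumed to be an existing SparkContext. Spark decides where the
    tasks run, so this samples executors rather than guaranteeing that every
    worker node is visited.
    """
    return (
        sc.parallelize(range(partitions), partitions)
        .map(lambda _: module_report(name))
        .collect()
    )


if __name__ == "__main__":
    # Driver-side check only; no Spark session is required for this part.
    print(module_report("sparktorch"))
    # With a live SparkContext (hypothetical usage):
    #   for report in executor_reports(sc):
    #       print(report)
```

If the driver reports found as True while the executor reports show False, or show a different python path, the package is probably missing from the executors' environment and would need to be installed or shipped there as well.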