I'm developing a neural network using Pytorch in a non-databricks cluster to ensure its functionality prior migrating to a databricks cluster.
Since I'm using Pytorch, I don't need Keras or TensorFlow. I installed successfully Horovod and Sparkdl, however, when I try to run the Spark process I found (for now) three consecutive exceptions related to missing dependencies:
from sparkdl import HorovodRunner
File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/__init__.py", line 17, in <module>
from sparkdl.transformers.keras_image import KerasImageFileTransformer
File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/transformers/keras_image.py", line 16, in <module>
import keras.backend as K
File "/opt/conda/default/lib/python3.8/site-packages/keras/__init__.py", line 21, in <module>
from tensorflow.python import tf2
ModuleNotFoundError: No module named 'tensorflow'
from sparkdl import HorovodRunner
File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/__init__.py", line 17, in <module>
from sparkdl.transformers.keras_image import KerasImageFileTransformer
File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/transformers/keras_image.py", line 16, in <module>
import keras.backend as K
ModuleNotFoundError: No module named 'keras'
This one is DEPRECATED!!:
from sparkdl import HorovodRunner
File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/__init__.py", line 17, in <module>
from sparkdl.transformers.keras_image import KerasImageFileTransformer
File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/transformers/keras_image.py", line 27, in <module>
from sparkdl.transformers.tf_image import TFImageTransformer
File "/opt/conda/default/lib/python3.8/site-packages/sparkdl/transformers/tf_image.py", line 18, in <module>
import tensorframes as tfs
ModuleNotFoundError: No module named 'tensorframes'
On one hand, I don't understand why should I need these dependencies if I'm not going to use them... Shouldn't it be checked and disabled instead of forcing it to be installed?
On the other hand, if those dependencies are unavoidable, they should be included in the setup.py script to avoid having these errors and losing time, since installing Horovod packages in an ephemeral cluster takes a lot of time just to discover that you cannot run the program...
I'm sure I won't have a problem in a Databricks cluster, but I cannot use it yet and that shouldn't be a problem to test HorovodRunner functionality as stated in the warning message when running a program in a non-databricks cluster...
Hi,
I'm developing a neural network using Pytorch in a non-databricks cluster to ensure its functionality prior migrating to a databricks cluster.
Since I'm using Pytorch, I don't need Keras or TensorFlow. I installed successfully Horovod and Sparkdl, however, when I try to run the Spark process I found (for now) three consecutive exceptions related to missing dependencies:
This one is DEPRECATED!!:
On one hand, I don't understand why should I need these dependencies if I'm not going to use them... Shouldn't it be checked and disabled instead of forcing it to be installed?
On the other hand, if those dependencies are unavoidable, they should be included in the setup.py script to avoid having these errors and losing time, since installing Horovod packages in an ephemeral cluster takes a lot of time just to discover that you cannot run the program...
I'm sure I won't have a problem in a Databricks cluster, but I cannot use it yet and that shouldn't be a problem to test HorovodRunner functionality as stated in the warning message when running a program in a non-databricks cluster...
Kind regards