databricks / spark-deep-learning

Deep Learning Pipelines for Apache Spark
https://databricks.github.io/spark-deep-learning
Apache License 2.0

Error while importing sparkdl #228

Open jai-dewani opened 4 years ago

jai-dewani commented 4 years ago

I am running sparkdl on a local system. I installed sparkdl with pip, but when I try to run

import sparkdl

the following error is thrown:

Using TensorFlow backend.
Traceback (most recent call last):
  File "script.py", line 5, in <module>
    import sparkdl
  File "/home/iitn/anaconda3/lib/python3.7/site-packages/sparkdl/__init__.py", line 21, in <module>
    from sparkdl.estimators.text_estimator import TextEstimator, KafkaMockServer
  File "/home/iitn/anaconda3/lib/python3.7/site-packages/sparkdl/estimators/text_estimator.py", line 28, in <module>
    from kafka import KafkaConsumer
  File "/home/iitn/anaconda3/lib/python3.7/site-packages/kafka/__init__.py", line 23, in <module>
    from kafka.producer import KafkaProducer
  File "/home/iitn/anaconda3/lib/python3.7/site-packages/kafka/producer/__init__.py", line 4, in <module>
    from .simple import SimpleProducer
  File "/home/iitn/anaconda3/lib/python3.7/site-packages/kafka/producer/simple.py", line 54
    return '<SimpleProducer batch=%s>' % self.async

I think it has something to do with the fact that async became a reserved keyword in Python 3.7, so you can no longer use async as a variable or attribute name. The workaround is to use the kafka-python package instead of kafka, as mentioned in this issue: https://github.com/dpkp/kafka-python/issues/1566
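For context, here is a minimal sketch (not tied to the kafka package itself, just illustrating the keyword change) showing why the import dies before sparkdl ever loads: on Python 3.7+, any module that uses async as a name fails at compile time with a SyntaxError.

```python
# Minimal sketch (assumes Python >= 3.7, where "async" is a hard keyword).
# The string below mimics the offending line from kafka/producer/simple.py;
# compiling it fails before anything runs, which is why "import sparkdl"
# blows up inside the kafka package.
import sys

offending_source = (
    "class SimpleProducer:\n"
    "    def __repr__(self):\n"
    "        return '<SimpleProducer batch=%s>' % self.async\n"
)

try:
    compile(offending_source, "kafka/producer/simple.py", "exec")
    print("compiled fine (Python < 3.7: async is not yet a reserved keyword)")
except SyntaxError as exc:
    print("Python %d.%d: SyntaxError: %s"
          % (sys.version_info[0], sys.version_info[1], exc.msg))
```

In practice the swap would be something like pip uninstall kafka followed by pip install kafka-python; per the linked issue, maintained kafka-python releases no longer use async as an attribute name.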

Python version 3.7 :: Anaconda