cerndb / dist-keras

Distributed Deep Learning, with a focus on distributed training, using Keras and Apache Spark.
http://joerihermans.com/work/distributed-keras/
GNU General Public License v3.0
624 stars, 169 forks

mapPartitions #2

Closed by elenacuoco 8 years ago

elenacuoco commented 8 years ago

I got the following error while trying to run your example:

Traceback (most recent call last):
  File "/home/dist-keras/examples/single_trainer_example.py", line 78, in <module>
    dataset = labelVectorTransformer.transform(dataset).toDF().select("features_normalized", "label_index", "label")
  File "/home/dist-keras/distkeras/distributed.py", line 54, in transform
    return data.mapPartitions(self._transform)
  File "/home/spark/python/pyspark/sql/dataframe.py", line 844, in __getattr__
    "'%s' object has no attribute '%s'" % (self.__class__.__name__, name))
AttributeError: 'DataFrame' object has no attribute 'mapPartitions'

JoeriHermans commented 8 years ago

Did you modify the example in any way? I'll look into this.

Edit: I found the issue, I will write a patch later this evening. Thanks for reporting this.
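For readers hitting the same traceback: PySpark 2.x DataFrames no longer expose `mapPartitions` directly, while the underlying RDD (reachable through the `.rdd` property) still does. The sketch below shows one way to write a transformer helper that accepts either type. It is only an illustration and not necessarily the patch that was pushed to the repository; `map_partitions`, `FakeRDD`, and `FakeDataFrame` are hypothetical names, and the two stand-in classes exist only so the snippet runs without a Spark installation.

```python
def map_partitions(data, fn):
    """Apply fn to every partition of an RDD or a DataFrame (hypothetical helper)."""
    # A DataFrame exposes its underlying RDD as `.rdd`; a plain RDD has no
    # such attribute, so fall back to the object itself.
    rdd = getattr(data, "rdd", data)
    return rdd.mapPartitions(fn)


# Stand-ins for illustration only (assumptions, not PySpark classes).
class FakeRDD(object):
    def __init__(self, partitions):
        self.partitions = partitions

    def mapPartitions(self, fn):
        # fn receives an iterator over one partition and returns an iterable.
        return FakeRDD([list(fn(iter(p))) for p in self.partitions])


class FakeDataFrame(object):
    def __init__(self, rdd):
        self.rdd = rdd

    def __getattr__(self, name):
        # Mimics the error from the traceback for unknown attributes.
        raise AttributeError(
            "'DataFrame' object has no attribute '%s'" % name)


def double(rows):
    for row in rows:
        yield row * 2


df = FakeDataFrame(FakeRDD([[1, 2], [3]]))
result = map_partitions(df, double)
```

With a real DataFrame the call would look the same, and the result is an RDD, which is why the example script calls `.toDF()` afterwards.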

elenacuoco commented 8 years ago

Thanks!

JoeriHermans commented 8 years ago

Hi @elenacuoco

Can you confirm if it works for you now? I pushed the changes to the master branch.

elenacuoco commented 8 years ago

Now I get this error:

from distkeras.distributed import DPGO
ImportError: cannot import name DPGO

Maybe it is a problem with my local setup. I will check more carefully tomorrow morning.

JoeriHermans commented 8 years ago

No, it is a problem on my side: I removed DPGO in the development branch. Sorry for the inconvenience, I need to check my commits better.

I removed DPGO from the examples as well. Note that you are running the SingleTrainer, which is not a distributed algorithm. EASGD is an implementation that does train in a distributed fashion.
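To give a rough idea of what distinguishes EASGD (elastic averaging SGD) from a single trainer, here is a minimal sketch of the synchronous update rule as it is commonly described: every worker follows its own gradient while being pulled toward a shared center variable, and the center moves toward the workers. This is a toy on scalar parameters and is not the dist-keras implementation; `easgd_step`, `run`, `lr`, and `rho` are names and values assumed for illustration.

```python
def easgd_step(workers, center, grads, lr=0.1, rho=0.5):
    """One synchronous elastic-averaging step on scalar parameters (toy sketch)."""
    # Each worker takes a gradient step plus an elastic pull toward the center.
    new_workers = [x - lr * (g + rho * (x - center))
                   for x, g in zip(workers, grads)]
    # The center moves toward the workers, using their pre-update values.
    new_center = center + lr * rho * sum(x - center for x in workers)
    return new_workers, new_center


def run(targets, steps=500):
    """Each worker i minimizes 0.5 * (x - targets[i]) ** 2 on its own data."""
    workers = [0.0] * len(targets)
    center = 0.0
    for _ in range(steps):
        # Gradient of the quadratic loss for each worker.
        grads = [x - t for x, t in zip(workers, targets)]
        workers, center = easgd_step(workers, center, grads)
    return workers, center


workers, center = run([1.0, 3.0])
```

In this toy setup the center settles near the mean of the workers' targets, while each worker stays between its own target and the center.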

elenacuoco commented 8 years ago

Hi, I ran it again following the code for the single trainer, and now I get:

    model = trainer.train(trainingSet)
  File "build/bdist.linux-x86_64/egg/distkeras/distributed.py", line 147, in train
NameError: global name 'SingleTrainerWorker' is not defined

I checked your code and that function is missing from distributed.py.

JoeriHermans commented 8 years ago

Ok, it is fixed now. I accidentally removed that worker when removing DPGO.