danielenricocahall / elephas

Distributed Deep learning with Keras & Spark
MIT License

TypeError: cannot pickle 'weakref' object when running examples #16

Closed nmoran closed 1 year ago

nmoran commented 1 year ago

I have installed elephas in a clean Python environment using the pip command provided in the README, and I am running into the following error when trying to get the examples running.

For the mllib_mlp.py example, I am starting it with

spark-submit --driver-memory 1G ./mllib_mlp.py

and getting the following error:

Traceback (most recent call last):     
  File "/home/nmoran/miniconda3/envs/elephas/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/serializers.py", line 458, in dumps           
    return cloudpickle.dumps(obj, pickle_protocol)                             
  File "/home/nmoran/miniconda3/envs/elephas/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle_fast.py", line 73, in dumps 
    cp.dump(obj)                       
  File "/home/nmoran/miniconda3/envs/elephas/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle_fast.py", line 602, in dump 
    return Pickler.dump(self, obj)     
TypeError: cannot pickle 'weakref' object   
Traceback (most recent call last):     
  File "/home/nmoran/miniconda3/envs/elephas/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/serializers.py", line 458, in dumps           
  File "/home/nmoran/miniconda3/envs/elephas/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle_fast.py", line 73, in dumps 
  File "/home/nmoran/miniconda3/envs/elephas/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/cloudpickle/cloudpickle_fast.py", line 602, in dump 
TypeError: cannot pickle 'weakref' object   

During handling of the above exception, another exception occurred:            

Traceback (most recent call last):     
  File "/home/nmoran/code/elephas/elephas/examples/mllib_mlp.py", line 58, in <module>   
    spark_model.fit(lp_rdd, epochs=5, batch_size=32, verbose=0,                
  File "/home/nmoran/miniconda3/envs/elephas/lib/python3.8/site-packages/elephas/spark_model.py", line 335, in fit                                            
    self._fit(rdd=rdd, epochs=epochs, batch_size=batch_size,                   
  File "/home/nmoran/miniconda3/envs/elephas/lib/python3.8/site-packages/elephas/spark_model.py", line 220, in _fit                                           
    training_outcomes = rdd.mapPartitions(worker.train).collect()              
  File "/home/nmoran/miniconda3/envs/elephas/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/rdd.py", line 1197, in collect                
  File "/home/nmoran/miniconda3/envs/elephas/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/rdd.py", line 3505, in _jrdd                  
  File "/home/nmoran/miniconda3/envs/elephas/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/rdd.py", line 3362, in _wrap_function         
  File "/home/nmoran/miniconda3/envs/elephas/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/rdd.py", line 3345, in _prepare_for_python_RDD
  File "/home/nmoran/miniconda3/envs/elephas/lib/python3.8/site-packages/pyspark/python/lib/pyspark.zip/pyspark/serializers.py", line 468, in dumps           
_pickle.PicklingError: Could not serialize object: TypeError: cannot pickle 'weakref' object

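For context, the innermost error in the traceback can be reproduced without Spark or elephas. The sketch below uses only the standard library. `Target`, `try_pickle`, and the `state` dictionary are hypothetical names for illustration, standing in for any object graph that holds a weak reference. It is not the elephas code path, only an illustration of why a pickle-based serializer rejects such an object.

```python
import pickle
import weakref


class Target:
    """Plain object that supports weak references (hypothetical example class)."""


def try_pickle(obj):
    """Return None if obj pickles cleanly, otherwise the TypeError that was raised."""
    try:
        pickle.dumps(obj)
    except TypeError as exc:
        return exc
    return None


target = Target()
# Hypothetical stand-in for internal state that keeps a weak reference.
state = {"ref": weakref.ref(target)}

error = try_pickle(state)
print(type(error).__name__, error)
```

The exact wording of the message can differ between Python versions, but the exception type is `TypeError` in each case, which PySpark then wraps in the `PicklingError` shown above.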
danielenricocahall commented 1 year ago

Hello @nmoran, thank you for testing this! I have debugged the issue: after some testing, it looks like elephas is not compatible with the latest version of TensorFlow (2.11). For now, I have pinned the TensorFlow requirement to <= 2.10. It should now be working, so please test it out when you get the chance. The fix should be in the newest release, 3.4.2.
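As a rough illustration of the version bound mentioned above, here is a minimal sketch that checks a version string against 2.10. The helper names `parse_major_minor` and `is_supported` are hypothetical and not part of elephas. Comparing only the major and minor components (so any 2.10.x counts as supported) is an assumption; a strict pip specifier may treat patch releases differently.

```python
def parse_major_minor(version):
    """Extract (major, minor) as integers from a string such as '2.10.1'."""
    parts = version.split(".")
    return int(parts[0]), int(parts[1])


# Upper bound taken from the comment above; ignoring the patch level is an assumption.
MAX_SUPPORTED = (2, 10)


def is_supported(version):
    """Return True if the major.minor version is at or below the bound."""
    return parse_major_minor(version) <= MAX_SUPPORTED


if __name__ == "__main__":
    for candidate in ("2.9.0", "2.10.1", "2.11.0"):
        print(candidate, is_supported(candidate))
```

In an environment where TensorFlow is installed, the installed version string is available as `tensorflow.__version__` and could be passed to `is_supported`.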

danielenricocahall commented 1 year ago

Correction: the latest release is now 3.4.7. Release 3.4.2 had an installation error due to a change in setup.py, which I've resolved. I also made some other revisions related to copying the documentation to PyPI.

nmoran commented 1 year ago

Great, working now!