I was trying to train one of my own custom-built models on a GPU cluster using PySpark. With a smaller sample the training was successful, but when I run it on 30954 images I get the following error:
# This does not run for 30954 images
from pyspark.ml.evaluation import BinaryClassificationEvaluator
from pyspark.ml.tuning import CrossValidator, ParamGridBuilder

paramGrid = (
    ParamGridBuilder()
    .addGrid(estimator.kerasFitParams, [{"batch_size": 16, "verbose": 0},
                                        {"batch_size": 32, "verbose": 0}])
    .build()
)
mc = BinaryClassificationEvaluator(rawPredictionCol="prediction", labelCol="label")
cv = CrossValidator(estimator=estimator, estimatorParamMaps=paramGrid,
                    evaluator=mc, numFolds=2)
cvModel = cv.fit(train_df)
...
...
INFO:tensorflow:Froze 0 variables.
Converted 0 variables to const ops.
Traceback (most recent call last):
File "/databricks/spark/python/pyspark/broadcast.py", line 83, in dump
pickle.dump(value, f, 2)
OverflowError: cannot serialize a string larger than 4GiB
PicklingError: Could not serialize broadcast: OverflowError: cannot serialize a string larger than 4GiB
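The traceback points at `pickle.dump(value, f, 2)` in pyspark's `broadcast.py`, i.e. pickle protocol 2. Protocols below 4 store the length of a bytes/str object in a 4-byte field, which is where the 4 GiB cap comes from; protocol 4 (Python 3.4+) uses 8-byte framing and lifts it. A minimal sketch of the mechanism, using a small payload as a stand-in for the broadcast model weights:

```python
import io
import pickle

# pyspark/broadcast.py in this trace calls pickle.dump(value, f, 2).
# Protocol 2 encodes a bytes/str length in a 4-byte field, so any single
# object over 4 GiB raises
# "OverflowError: cannot serialize a string larger than 4GiB".
# Protocol 4 switched to 8-byte length framing and removed that cap.

payload = b"x" * 1024          # stand-in for serialized model weights
buf = io.BytesIO()
pickle.dump(payload, buf, protocol=2)   # fine while the payload is small
print(buf.tell() > len(payload))        # pickled form adds framing bytes
print(pickle.HIGHEST_PROTOCOL >= 4)     # True on Python 3.4+
```

This suggests the failure is not about the number of images per se, but about the size of the single object Spark tries to broadcast during fitting.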
But when running on the same large sample with transfer learning and logistic regression, it works:
# This runs for 30954 images
from pyspark.ml import Pipeline
from pyspark.ml.classification import LogisticRegression
from sparkdl import DeepImageFeaturizer

featurizer = DeepImageFeaturizer(inputCol="image", outputCol="features", modelName="InceptionV3")
lr = LogisticRegression(maxIter=20, regParam=0.05, elasticNetParam=0.3, labelCol="label")
p = Pipeline(stages=[featurizer, lr])
p_model = p.fit(train_df)
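A plausible reason the second pipeline succeeds is that fitting the `LogisticRegression` stage only needs to serialize a small coefficient vector per image row, not a full Keras graph, so nothing broadcast ever approaches 4 GiB. As a rough sanity check, one could gauge how large an object becomes when pickled with the same protocol Spark uses here (`pickled_size_bytes` is a hypothetical helper, not part of pyspark):

```python
import pickle

def pickled_size_bytes(obj, protocol=2):
    """Hypothetical helper: size of obj after pickling with the given
    protocol, to compare against the 4 GiB cap of protocol 2."""
    return len(pickle.dumps(obj, protocol=protocol))

# Stand-in for a logistic-regression coefficient vector: tiny when pickled.
weights = [0.0] * 1000
print(pickled_size_bytes(weights) < 4 * 1024**3)  # True: far under 4 GiB
```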