Closed noiseux1523 closed 6 years ago
Any news on this? I am having the same problem.
I found a workaround: save only the last layer (the logistic regression) and rebuild the pipeline afterwards.

```python
from pyspark.ml import PipelineModel
from pyspark.ml.classification import LogisticRegressionModel
from pyspark.ml.evaluation import MulticlassClassificationEvaluator
from sparkdl import DeepImageFeaturizer

# Save the last layer
p_model.stages[1].write().overwrite().save('lr')

# Reload the model
lr_test = LogisticRegressionModel.load('./lr')

# Use a featurizer to use trained features from an existing model
featurizer_test = DeepImageFeaturizer(inputCol="image", outputCol="features", modelName="InceptionV3")

# Pipeline both entities
p_test = PipelineModel(stages=[featurizer_test, lr_test])

# Test and evaluate
tested_df_test = p_test.transform(test_df)
evaluator_test = MulticlassClassificationEvaluator(metricName="accuracy")
print("Test set accuracy = " + str(evaluator_test.evaluate(tested_df_test.select("prediction", "label"))))
tested_df_test.select("label", "probability", "prediction").show(20, False)
```

And everything should work fine!
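The reason this workaround is safe: DeepImageFeaturizer carries no trained state of its own (the InceptionV3 weights are fixed and pretrained), so only the logistic regression stage actually needs to be persisted. Here is a toy sketch of that pattern in plain Python (no Spark required; the class names and arithmetic are invented for illustration):

```python
import io
import pickle


class Featurizer:
    """Stateless stage: fully determined by its configuration,
    so it can be rebuilt at load time instead of being saved."""
    def __init__(self, model_name):
        self.model_name = model_name

    def transform(self, pixels):
        return [v * 2 for v in pixels]  # stand-in for feature extraction


class TrainedModel:
    """Stateful stage: holds learned weights, so it must be persisted."""
    def __init__(self, weights):
        self.weights = weights

    def predict(self, features):
        return sum(w * f for w, f in zip(self.weights, features))


# Save only the trained stage...
buf = io.BytesIO()
pickle.dump(TrainedModel([0.5, 1.0]), buf)

# ...then reload it and rebuild the stateless stage from configuration
buf.seek(0)
model = pickle.load(buf)
featurizer = Featurizer("InceptionV3")

prediction = model.predict(featurizer.transform([1, 2]))
print(prediction)  # 0.5*2 + 1.0*4 = 5.0
```

The rebuilt pipeline behaves identically because the only stage that changed during training was the one that got saved.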
Thanks @noiseux1523, this works!
I was able to load the saved pipeline correctly (not just the last layer) in Scala:

```scala
val model = PipelineModel.load("/path/to/model")
```

as long as I had the "spark-deep-learning" dependency added in pom/sbt:

```xml
<dependency>
  <groupId>databricks</groupId>
  <artifactId>spark-deep-learning</artifactId>
  <version>1.1.0-spark2.3-s_2.11</version>
</dependency>
```

But in Python only your workaround worked.
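Presumably that is why Scala works while Python does not: Spark ML persistence records each stage's class in a metadata JSON, and `load()` resolves that name to a loader. The JVM can find the featurizer's class via the spark-deep-learning jar, but the Python side has no loader registered for it. A simplified sketch of that lookup (the class path and loader table here are illustrative, not actual Spark internals):

```python
import json

# Illustrative stage metadata, shaped like what Spark ML persistence writes
# (the class path below is hypothetical)
metadata = json.loads(
    '{"class": "com.databricks.sparkdl.DeepImageFeaturizer", "uid": "DeepImageFeaturizer_4abc"}'
)

# On load, the recorded class name is resolved against available Python loaders
known_loaders = {
    "org.apache.spark.ml.PipelineModel": "PipelineModel",
    "org.apache.spark.ml.classification.LogisticRegressionModel": "LogisticRegressionModel",
    # no entry for the featurizer -> loading the full pipeline fails in Python
}

stage_loader = known_loaders.get(metadata["class"])
print(stage_loader)  # None: no Python-side loader for the featurizer
```

This also matches the workaround above: the stage that *can* be resolved (the logistic regression) is saved and loaded normally, and the unresolvable stage is reconstructed by hand.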
I am still unable to make DeepImageFeaturizer transform images in a streaming job:

```python
p_model.transform(imageStream).select("probability", "prediction") \
    .writeStream.format("console").start().awaitTermination()
```

I am getting the following error:

```
pyspark.sql.utils.AnalysisException: 'Queries with streaming sources must be executed with writeStream.start();;
```

I see there is an open issue on the subject (#136); I haven't found a workaround yet.
Thanks for reporting this and the workaround! To avoid confusing more users, we decided to remove this functionality in the next release: https://github.com/databricks/spark-deep-learning/pull/161. It's going to require some reworking within Spark itself to provide this kind of support for ML persistence in Spark Packages.
However, the workaround will still work since it's creating a new DeepImageFeaturizer instance when loading the Pipeline.
I'll close this issue for now, but leaving the notes on the workaround will be helpful for some users, I'm sure. Thanks all!
I followed the Deep Learning Pipelines tutorial, and my question concerns the Transfer Learning section. Everything works fine, and now I want to save the trained pipeline model. The following code is working fine.
It also works fine when I evaluate.
What I am trying to do now is save this pipeline model and reload it to test some new data. I am doing it simply like this.
But I get this error.
Why do I get this error when reloading the model, when everything works fine at first? I am still a beginner in Spark, sorry if this may be obvious.
Thank you