databricks / spark-deep-learning

Deep Learning Pipelines for Apache Spark
https://databricks.github.io/spark-deep-learning
Apache License 2.0
1.99k stars, 494 forks

How to map human-readable labels to images, especially when saving the model for use in CoreML? #159

Open heng2j opened 5 years ago

heng2j commented 5 years ago

Hi everyone,

I am very new to spark-deep-learning and just tried out the transfer learning tutorial.

When I tried to use strings as the labels for my classes, I got an error saying that the "label" column can only take an int, as defined in the ImageSchema.

tulips_df = ImageSchema.readImages(img_dir + "/tulips").withColumn("label", lit('tulips'))
daisy_df = imageIO.readImagesWithCustomFn(img_dir + "/daisy", decode_f=imageIO.PIL_decode).withColumn("label", lit('daisy'))

I am going to train a customized Inception v3 model and convert it into a CoreML model for use on iOS, so that users can classify certain objects with their phones. Are there any other steps I need to take to get human-readable labels out of an image classification model trained with sparkdl?

Also, are there any instructions for saving the trained model so it can be used somewhere else?

Greatly appreciated.

Thank you, Heng

innat commented 5 years ago

One important point when training a supervised machine learning model is that the labels need to be numeric. So in this case, each string label has to be represented by a numeric value that the model can train on.
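As a rough illustration of that idea, here is a minimal sketch in plain Python that assigns each class name an integer and keeps the reverse table so a predicted index can be turned back into a readable label. The class names come from the folders in the question; `label_to_index`, `index_to_label`, `encode`, and `decode` are hypothetical helper names, not part of sparkdl.

```python
# Hypothetical class names, taken from the flower folders in the question.
class_names = ["daisy", "tulips"]

# Sort so the mapping is stable across runs, then number the classes from 0.
label_to_index = {name: idx for idx, name in enumerate(sorted(class_names))}
index_to_label = {idx: name for name, idx in label_to_index.items()}


def encode(label):
    """Return the integer the model trains on for a string label."""
    return label_to_index[label]


def decode(index):
    """Return the human-readable label for a predicted class index."""
    return index_to_label[int(index)]


print(encode("tulips"))
print(decode(0))
```

In the tutorial code, something like `lit(label_to_index["tulips"])` could then stand in for `lit('tulips')`. Spark ML also ships `StringIndexer` and `IndexToString` in `pyspark.ml.feature` for the same purpose, although their index order may differ from the alphabetical order assumed here.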

For saving, see the approaches for saving an ML model and for saving a deep neural model. All of those methods are applicable in PySpark too.
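However the model itself is saved, the integer-to-label table is not stored in the numeric label column, so it may be worth writing it out next to the model. A minimal sketch, assuming a small JSON file is acceptable for this: `save_labels`, `load_labels`, and the `labels.json` file name are hypothetical, and the table shown is only an example.

```python
import json
import os
import tempfile

# Hypothetical index-to-label table; use whatever mapping was used for training.
index_to_label = {0: "daisy", 1: "tulips"}


def save_labels(mapping, path):
    """Write the labels as a JSON list ordered by class index."""
    ordered = [mapping[i] for i in sorted(mapping)]
    with open(path, "w", encoding="utf-8") as f:
        json.dump(ordered, f)


def load_labels(path):
    """Read the list back; the position in the list is the class index."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# A temporary directory stands in for wherever the model is stored.
labels_path = os.path.join(tempfile.mkdtemp(), "labels.json")
save_labels(index_to_label, labels_path)
labels = load_labels(labels_path)
print(labels[1])
```

Whatever consumes the exported model (a conversion script or the iOS app) can then look up `labels[predicted_index]` to show a readable class name.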