Open jeffsaremi opened 5 years ago
hey @jeffsaremi, see below some answers to your questions, let me know if it makes sense.
Does `mleap_prediction` actually use the saved `spark_prediction`?
No, spark_prediction is used strictly for the pipeline serialization.
Can these two be different: the `test_data` passed to create `spark_prediction` and the `test_data` passed in the call to `mleap_pipeline.transform()`?
You can transform any dataset with the deserialized pipeline, mleap_pipeline.
In the call to `serializeToBundle()`, can I just pass one single record as the `test_data`?
The transformed data frame is mostly used to extract data types and some metadata, so you could try with just one record and see how it goes.
What is the significance of the predicted data, and what does MLeap do with that data?
The transformed data frame is used to extract data types and other metadata required for execution so that they can be serialized.
Can I serialize a model without passing any predicted data?
No, see above.
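The one-record suggestion above could be tried along these lines. This is only a sketch: it assumes the PySpark MLeap bindings, where importing `mleap.pyspark.spark_support` attaches `serializeToBundle` to Spark transformers. The helper name `serialize_with_sample` and the bundle path in the usage comment are hypothetical, and whether a single row is enough for every pipeline stage has not been verified here.

```python
def serialize_with_sample(fitted_pipeline, data, bundle_path, sample_size=1):
    """Serialize a fitted Spark pipeline to an MLeap bundle using a tiny sample.

    The transformed sample is handed to serializeToBundle so that MLeap can
    read column types and metadata from it; the rows are not needed later.
    """
    # limit() keeps the schema of `data` but caps the number of rows.
    sample = data.limit(sample_size)
    transformed = fitted_pipeline.transform(sample)
    # Available once mleap.pyspark.spark_support has been imported.
    fitted_pipeline.serializeToBundle(bundle_path, transformed)
    return transformed


# Usage (requires Spark with the MLeap jars on the classpath):
#   import mleap.pyspark
#   from mleap.pyspark.spark_support import SimpleSparkSerializer
#   model = pipeline.fit(training_data)
#   serialize_with_sample(model, test_data, "jar:file:/tmp/pipeline.zip")
```

The data passed to `mleap_pipeline.transform()` after deserialization can be any dataset with compatible columns; it does not have to be the sample used here.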
I have looked into the MLeap code and it seems that the transformed DataFrame is only used to get the schema (`StructType`) from it. I propose replacing the requirement of passing a DataFrame with just the schema `StructType`.
WDYT? @ancasarb
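Until something like that exists, the schema can be obtained without scoring real data by transforming an empty DataFrame. This is a rough sketch, not MLeap's API: the helper name `transformed_schema` is hypothetical, and it assumes every stage in the pipeline can transform an empty input. The resulting (empty) DataFrame would still have to be passed to `serializeToBundle` today.

```python
def transformed_schema(spark, fitted_pipeline, input_schema):
    """Return the StructType a fitted pipeline produces for a given input schema.

    No real rows are scored: the pipeline is applied to an empty DataFrame
    that only carries the input column names and types.
    """
    # An empty list plus an explicit StructType yields a zero-row DataFrame.
    empty_input = spark.createDataFrame([], input_schema)
    # DataFrame.schema is a pyspark.sql.types.StructType.
    return fitted_pipeline.transform(empty_input).schema


# Usage (requires a running SparkSession named `spark`):
#   schema = transformed_schema(spark, model, test_data.schema)
#   print(schema.simpleString())
```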
I don't understand the connection between the prediction results (from a call to `model.transform()`) and serialization of a model (created from a call to `pipeline.fit()`).
Is this prediction set saved and used later when I deserialize my model as an MLeap pipeline? (see the code below)
Does `mleap_prediction` actually use the saved `spark_prediction`?
Can these two be different: the `test_data` passed to create `spark_prediction` and the `test_data` passed in the call to `mleap_pipeline.transform()`?
In the call to `serializeToBundle()`, can I just pass one single record as the `test_data`?
What is the significance of the predicted data, and what does MLeap do with that data?
Can I serialize a model without passing any predicted data?
thanks