AutoViML / Auto_ViML

Automatically Build Multiple ML Models with a Single Line of Code. Created by Ram Seshadri. Collaborators Welcome. Permission Granted upon Request.
Apache License 2.0

How to reuse the trained model? #10

Closed manugarri closed 4 years ago

manugarri commented 4 years ago

I find that the Auto_ViML main function works well when you have both the train and test datasets at the same time. That is fine for Kaggle, but not for real-world operations, where inference happens after the model has been trained.

I see that the output of the main function is a trained model, and the train and test datasets with the required features (this is not even true btw, the testm and trainm dont have the same output).

However, the trained model is not a pipeline, but a simple model (logisticregression in a vanilla run on the titanic dataset.).

Would it be possible to actually export a pipeline that can perform inference in a dataset with the same features as the original training one?
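
For reference, the kind of export being asked about might look like the minimal sketch below. It uses plain scikit-learn and joblib, not anything from Auto_ViML, and the data, column names, and file name are all made up for illustration.

```python
import os
import tempfile

import joblib
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Hypothetical Titanic-like training data (values invented for this sketch).
train = pd.DataFrame({
    "Age": [22.0, 38.0, None, 35.0, 54.0, 2.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08],
    "Sex": ["male", "female", "female", "female", "male", "male"],
    "Survived": [0, 1, 1, 1, 0, 0],
})
numeric, categorical = ["Age", "Fare"], ["Sex"]

# Preprocessing lives inside the pipeline, so it is re-applied at inference.
preprocess = ColumnTransformer([
    ("num", Pipeline([
        ("impute", SimpleImputer(strategy="median")),
        ("scale", StandardScaler()),
    ]), numeric),
    ("cat", OneHotEncoder(handle_unknown="ignore"), categorical),
])
pipeline = Pipeline([
    ("preprocess", preprocess),
    ("model", LogisticRegression()),
])
pipeline.fit(train[numeric + categorical], train["Survived"])

# Persist the fitted pipeline (transformations and model together).
path = os.path.join(tempfile.mkdtemp(), "titanic_pipeline.joblib")
joblib.dump(pipeline, path)

# Later, possibly in another process: load it and score raw, unseen rows
# that have the same columns as the original training features.
loaded = joblib.load(path)
new_rows = pd.DataFrame({
    "Age": [29.0, None],
    "Fare": [8.05, 30.0],
    "Sex": ["male", "female"],
})
predictions = loaded.predict(new_rows)
```

The point of the design is that the saved object accepts data in its original, untransformed form, so no test set is needed at training time.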

AutoViML commented 4 years ago

Thanks for trying out Auto_ViML. I appreciate your suggestions. Let me address your questions:

  1. "(this is not even true btw, the testm and trainm dont have the same output)." -> The trainm and testm outputs contain the set of features that have been transformed by Auto_ViML. Their columns differ slightly because testm has "predictions" columns while trainm doesn't. Otherwise they have exactly the same features, as defined by the "features" output.

  2. "However, the trained model is not a pipeline, but a simple model (logisticregression in a vanilla run on the titanic dataset.)." -> The output is a model trained on your train data set, which you can use to predict on your testm data set right away using the features selected in the output variable called "feats" or "features". I don't know why you are getting a vanilla model. Please test this again or send me a Colab link so I can test your assertion.

  3. "Would it be possible to actually export a pipeline that can perform inference in a dataset with the same features as the original training one?" -> I am working on a pipeline, but since the set of transformations that Auto_ViML performs is not found as-is in any library, it is taking a little longer than expected. Watch the GitHub repository for an announcement soon.
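
Points 1 and 2 can be checked with a few lines of pandas. The sketch below does not call Auto_ViML. It builds hand-made stand-ins for the outputs described above (a model, a features list, trainm, and testm), and every column name, including the name of the predictions column, is an assumption made for illustration.

```python
import pandas as pd
from sklearn.linear_model import LogisticRegression

# Stand-ins for the outputs described above (hypothetical values,
# NOT produced by Auto_ViML).
features = ["Age", "Fare", "Sex_male"]
trainm = pd.DataFrame({
    "Age": [22.0, 38.0, 26.0, 35.0, 54.0, 2.0],
    "Fare": [7.25, 71.28, 7.92, 53.10, 51.86, 21.08],
    "Sex_male": [1, 0, 0, 0, 1, 1],
})
target = pd.Series([0, 1, 1, 1, 0, 0], name="Survived")
testm = pd.DataFrame({
    "Age": [29.0, 41.0, 8.0],
    "Fare": [8.05, 30.0, 15.5],
    "Sex_male": [1, 0, 1],
})
model = LogisticRegression(max_iter=1000).fit(trainm[features], target)

# Point 2: predict using only the columns named in the features list.
# The predictions column name is an assumption for this sketch.
testm["Survived_predictions"] = model.predict(testm[features])

# Point 1: compare the columns of the two transformed frames.
only_in_testm = sorted(set(testm.columns) - set(trainm.columns))
only_in_trainm = sorted(set(trainm.columns) - set(testm.columns))

# Every selected feature should be present in both frames.
missing = [c for c in features
           if c not in trainm.columns or c not in testm.columns]

print("only in testm:", only_in_testm)
print("only in trainm:", only_in_trainm)
print("features missing from either frame:", missing)
```

Note that this only works for data that has already been transformed into the selected features. Scoring raw data that arrives later still requires re-applying the same transformations, which is what the pipeline in point 3 is meant to provide.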