Closed dataders closed 5 years ago
We have two flavors of pre-processing in AutoML, one is model independent featurization and another is model specific preprocessing. The flag "preprocess=False" disables the model independent featurization but doesn't affect the model dependent pre-processing.
Ah, I see now. I incorrectly assumed that the fitted_model returned by
best_run, fitted_model = remote_run.get_output()
would be a plain sklearn model object, rather than a model and its preprocessing wrapped together in a Pipeline.
Is there a way to do all the necessary pre-processing outside of the experiment and create an AutoML experiment that just iterates through algorithms?
My other intention is to make force plots using shap, which it seems already constitutes the majority of the model_explainer functionality.
Let me find out if there is a way to use AutoML pre-processing outside of AutoML, feed the featurized data into AutoML, and train over the featurized data using preprocess=False.
For the second query, you could try setting model_explainability to True for model explanations.
That would be helpful, thanks. Doing all the necessary pre-processing before running AutoML was our plan all along, but from looking at the source code that seems unlikely to be possible.
If it is indeed not possible, my ask is this:
A way to parse out the actual model from the fitted_model object (which is actually a Pipeline object).
best_run, fitted_model = local_run.get_output()
The reason is that I want to create SHapley Additive exPlanation plots using the shap package. The package cannot currently interpret the output that AutoML provides.
The irony is that the azureml-sdk actually imports shap and uses it to provide explanations via the model_explainer argument. My ask is to be able to extend the provided explanations to make plots (see below).
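The request above, pulling the final estimator out of a fitted Pipeline, can be sketched with plain scikit-learn. This is only an illustration under assumptions: the Pipeline here is built by hand as a stand-in for whatever get_output() returns, DecisionTreeClassifier stands in for the real model, and the step names "scaler" and "model" are hypothetical. It shows how to split a Pipeline into its preprocessing steps and its last step, which is the shape of input a tool like shap would typically need (a bare model plus already-transformed features).

```python
import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MaxAbsScaler
from sklearn.tree import DecisionTreeClassifier

# Hand-built stand-in for the object returned by get_output();
# assumed here to behave like an ordinary sklearn Pipeline.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 4))
y = (X[:, 0] + X[:, 1] > 0).astype(int)

fitted_model = Pipeline([
    ("scaler", MaxAbsScaler()),                       # hypothetical step name
    ("model", DecisionTreeClassifier(random_state=0)),  # hypothetical step name
]).fit(X, y)

# The last (name, estimator) pair in .steps is the model itself.
final_estimator = fitted_model.steps[-1][1]

# Apply every earlier step in order to get the features the model actually sees.
X_transformed = X
for _name, step in fitted_model.steps[:-1]:
    X_transformed = step.transform(X_transformed)

# Sanity check: the bare model on transformed data should agree with the Pipeline.
same_predictions = np.array_equal(
    final_estimator.predict(X_transformed), fitted_model.predict(X)
)
print(type(final_estimator).__name__, same_predictions)
```

With final_estimator and X_transformed in hand, an explainer can be pointed at the model directly instead of at the wrapping Pipeline. Whether the object AutoML returns exposes .steps in exactly this way is an assumption that should be checked against the installed SDK version.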
Hi ggupta2005, you did not respond to this one:
Let me find out if there is a way to use AutoML pre-processing outside of AutoML, feed the featurized data into AutoML, and train over the featurized data using preprocess=False.
@laurentiuamitroaie, I spoke with the featurization team from AutoML, and I believe this is something they are considering. I'll ask the person I spoke with to respond here.
Why does the pipeline still include a pre-processing step when preprocess = False and MaxAbsScaler is blacklisted as an algorithm? My understanding is that LightGBM wouldn't even benefit from scaling of numeric features.
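The point about tree models not benefiting from feature scaling can be checked with a small experiment. The sketch below is an assumption-laden stand-in: it uses scikit-learn's DecisionTreeClassifier rather than LightGBM, and synthetic integer-valued data rather than the data from this issue. It fits the same tree on raw and on MaxAbsScaler-scaled features and compares the predictions; it says nothing about why AutoML inserts the scaler.

```python
import numpy as np
from sklearn.preprocessing import MaxAbsScaler
from sklearn.tree import DecisionTreeClassifier

rng = np.random.default_rng(0)
# Synthetic data; integer-valued features keep split points well separated
# after scaling, so floating-point rounding cannot change the chosen splits.
X = rng.integers(0, 100, size=(200, 4)).astype(float)
y = (X[:, 0] > X[:, 1]).astype(int)

# MaxAbsScaler divides each column by its maximum absolute value,
# which preserves the ordering of values within every feature.
X_scaled = MaxAbsScaler().fit_transform(X)

raw_tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X, y)
scaled_tree = DecisionTreeClassifier(max_depth=4, random_state=0).fit(X_scaled, y)

# Tree splits depend only on the ordering of feature values, so both trees
# should partition the training samples identically.
predictions_match = np.array_equal(
    raw_tree.predict(X), scaled_tree.predict(X_scaled)
)
print(predictions_match)
```

Because a per-feature rescaling by a positive constant does not change which samples fall on each side of a threshold, the scaler is at best a no-op for this kind of model, which is what makes its appearance in the pipeline name surprising.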
Steps to Reproduce
When I set preprocess = False and blacklist the preprocessor MaxAbsScaler, the output of the experiment submission shows MaxAbsScaler LightGBM as the name of the pipeline.
Two lines of the log file stand out to me:
More info
Params
Experiment Output
Log