SeldonIO / alibi

Algorithms for explaining machine learning models
https://docs.seldon.io/projects/alibi/en/stable/

Reverse feature transformation when explaining models #667

Open MattiaGallegati opened 2 years ago

MattiaGallegati commented 2 years ago

Hello everyone, I'm quite new to XAI and Seldon Alibi. I hope my question is not too naive: if I train an explainer on a training dataset that received some transformations (like scaling the age column between 0 and 1), is it possible to get explanations in terms of the original data rather than the transformed data when invoking the explainer? In my age example I want to see the real values for the age column (not 0-1), so in a sense I have to revert the transformation. Is this possible for every transformation?

Thank you.

mauicv commented 2 years ago

Hey @MattiaGallegati, thanks for opening the issue. Can you be more specific: what kind of model are you using, and what kind of explainer?

Broadly speaking, it depends on whether the explainer is black box or white box. If the method doesn't need access to the internals of the model, you can wrap the transformation inside the model prediction function. For instance, if you are using AnchorTabular for a model trained on normalized numerical data, then passing:

predict_fn = lambda x: model.predict(scaler.transform(x))

to the explainer, together with the untransformed dataset, means the explanation will be given in terms of the untransformed data. This is done here, for example.
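For concreteness, here is a rough self-contained sketch of that black-box pattern; model, scaler, X_train and feature_names are placeholders for objects from your own code, not anything Alibi provides:

from alibi.explainers import AnchorTabular

# Wrap the scaling inside the prediction function so the explainer only ever
# deals with untransformed data (model, scaler, X_train, feature_names are placeholders).
predict_fn = lambda x: model.predict(scaler.transform(x))

explainer = AnchorTabular(predict_fn, feature_names)
explainer.fit(X_train)                       # fit on the raw, unscaled training data
explanation = explainer.explain(X_train[0])  # anchor conditions come back in original units
print(explanation.anchor)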

On the other hand, if the explainer method requires access to the model then the model has to be passed directly. In this case, you should pass the model along with the transformed data to the explainer. Later, if the transformation is invertible, you can map the explanation back. For example, here we pass the model (a TensorFlow neural network) and fit the explainer using the transformed data:

from alibi.explainers import CounterfactualProto

explainer = CounterfactualProto(
    model,                           # The model to explain
    shape=(1,) + X_train.shape[1:],  # shape of the model input
    ae_model=ae,                     # The autoencoder
    enc_model=ae.encoder             # The encoder
)

explainer.fit(scaler.transform(X_train)) # Fit the explainer with scaled data

When we then explain an instance, we first need to transform it like so:

x_norm = scaler.transform(x)              # scale the instance the same way as the training data
result_proto = explainer.explain(x_norm)  # explain in the transformed space

We can then reverse-transform the explanation using:

proto_cf = result_proto.data['cf']['X']        # counterfactual in the scaled feature space
proto_cf = scaler.inverse_transform(proto_cf)  # map it back to the original units
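To tie this back to the age example, a tiny illustrative follow-on (delta is just a name introduced here, not part of the Alibi API) comparing the counterfactual with the original instance x:

delta = proto_cf - x  # suggested feature changes in original units, e.g. years of age rather than a 0-1 value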

So as you can see, it's quite dependent on the model and explainer. If you share more details, I can give you more insight.

MattiaGallegati commented 2 years ago

Hello @mauicv, thank you for the detailed response!! Unfortunately I can't be more specific in this particular case. I'm trying to figure out a "standardized" way to build an XAI pipeline that gives back explanations on non-transformed data, independently of the algorithm chosen or the processing steps taken. In my particular use case I have a complex pipeline (orchestrated via Airflow/Argo or another orchestrator) that preprocesses the data and applies a lot of transformations to the input: scaling, encoding, other transformations, and also generation of new features. The "fit" of the explainer will be on heavily preprocessed data. The thing here is that if I want to go back, I will have to revert a whole sequence of transformations. Do you have any suggestion on that? Maybe I should train the explainer on "non-processed" data?

mauicv commented 2 years ago

Hey @MattiaGallegati,

Why can you not revert the sequence of transformations?

mauicv commented 2 years ago

Fitting the explainer on non-processed data requires that the model has been trained on the non-processed data. If that's not the case, then you have to be able to invert the pipeline if you want interpretable explanations.
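If every preprocessing step is invertible, one practical option is to bundle the steps so the whole sequence can be undone in a single call. A minimal sketch with scikit-learn (illustrative only, not Alibi-specific; the step names and toy data are made up):

import numpy as np
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy preprocessing pipeline in which every step implements inverse_transform.
preproc = Pipeline([
    ("standardize", StandardScaler()),  # zero mean, unit variance
    ("rescale", MinMaxScaler()),        # squash into [0, 1]
])

X_train = np.array([[25.0, 50000.0], [40.0, 80000.0], [60.0, 120000.0]])
X_proc = preproc.fit_transform(X_train)     # what the model and the explainer see
X_back = preproc.inverse_transform(X_proc)  # undoes the steps in reverse order

assert np.allclose(X_back, X_train)  # explanations on X_proc can be mapped back to original units

Steps that aren't invertible (e.g. newly generated features) break this chain, which is why the model would then have to be trained on non-processed data instead.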