SeldonIO / alibi

Algorithms for explaining machine learning models
https://docs.seldon.io/projects/alibi/en/stable/

Support more diverse input types for methods working on tabular data #516

Open jklaise opened 3 years ago

jklaise commented 3 years ago

Current status

Currently, tabular data in methods such as AnchorTabular and CounterfactualProto is expected to be in one of a restricted set of formats, e.g. homogeneous numerical np.ndarrays with categorical variables integer-encoded.

Problem

If a user model is not trained on a data representation that is one of the above, then Alibi tabular explainers cannot be used out-of-the-box, which is undesirable (as found out by @FarrandTom).

For concreteness, denote by X an input data point that is non-compliant with the Alibi API, e.g. it could be an np.ndarray with unsupported column types, for example array([49.5, 'Male'], dtype=object) representing a numerical feature and a string-encoded categorical variable.

Further, denote by Z an input data point that is compliant with the Alibi API, e.g. array([49.5, 0. ]) representing the same numerical feature and the same but integer-encoded categorical variable.

A client may have a model M that's trained on non-compliant data, i.e. it would be of type Callable[[X], np.ndarray], whereas Alibi expects a model M_hat (prediction function) of type Callable[[Z], np.ndarray]. How can we go from a non-compliant model to a compliant one?

The key is being able to map back and forth between X and Z. Let f: X -> Z be such an invertible mapping; for the example above it would be something like:

def f(X: np.ndarray, **kwargs) -> np.ndarray:  # use **kwargs for any other information needed to do the conversion
    Z_num = extract_numeric(X, **kwargs)  # extract columns like 49.5, Z_num is now a homogeneous array of numbers
    Z_cat = extract_cat(X, **kwargs)  # take columns like 'Male' and convert to 0, Z_cat is now a homogeneous array of numbers
    Z = combine(Z_num, Z_cat, **kwargs)  # concatenate columns in the right order
    return Z

def f_inverse(Z: np.ndarray, **kwargs) -> np.ndarray:
    ... # do similar operations as above, mapping Z back to X
    return X

With this extra information we can define an Alibi-compliant model in terms of the client model M and the inverse mapping f_inverse as follows: M_hat = M(f_inverse(Z)), or in Python:

def M_hat(Z: np.ndarray) -> np.ndarray:
    X = f_inverse(Z)
    pred = M(X)
    return pred
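
As a concrete illustration (a minimal sketch only; category_map and the helper bodies here are ad hoc, not part of the alibi API), the mappings for the two-column example above could look like this:

import numpy as np

category_map = {1: ['Female', 'Male']}  # column index -> category labels (illustrative)

def f(X: np.ndarray) -> np.ndarray:
    """Map model-compliant data X (object dtype) to alibi-compliant data Z (float)."""
    Z = X.copy()
    for col, categories in category_map.items():
        Z[:, col] = [categories.index(v) for v in X[:, col]]
    return Z.astype(float)

def f_inverse(Z: np.ndarray) -> np.ndarray:
    """Map alibi-compliant data Z back to the representation the client model M expects."""
    X = Z.astype(object)
    for col, categories in category_map.items():
        X[:, col] = [categories[int(v)] for v in Z[:, col]]
    return X

def M_hat(Z: np.ndarray) -> np.ndarray:
    return M(f_inverse(Z))  # M is the client model trained on X-type data

f(np.array([[49.5, 'Male']], dtype=object))  # array([[49.5, 1.]])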

What we can do

What about deployment?

In deployment we may have the following situation where the inference graph consists of a transformer mapping Alibi-non-compliant data X to compliant data Z, which is then passed into an Alibi-compliant model:

[Diagram: transformer maps non-compliant data X to compliant data Z, which feeds the Alibi-compliant model]

How could we add an explainer to this inference graph?

Point directly to the model component

If we know the model component is Alibi-compliant, we could point the explainer to that instead of the whole inference graph (which is non-compliant):

[Diagram: the explainer points directly at the Alibi-compliant model component rather than the whole inference graph]

However, note that in this scenario the explainer expects the compliant data type Z whilst the inference graph operates on the original data type X. To obtain Z from X we would need to leverage the existing transformer so we could extend the inference graph like this (conceptually, implementation details may vary):

[Diagram: the inference graph extended with an Explainer-Transformer component that maps X to Z before the explainer]

The only job of the Explainer-Transformer component is to call an existing transformer that is known (by the user) to transform non-compliant data X into compliant data Z.

Non-compliant models within an inference graph

Not all inference graphs contain a model node that would be Alibi-compliant, so in the general case the above would not work and it would be necessary to either:

Tagging a few people who may be interested in the discussion: @FarrandTom @cliveseldon @axsaucedo @SachinVarghese @arnaudvl .

ukclivecox commented 3 years ago

How could we add an explainer to this inference graph?

Point directly to the model component

This capability to point the explainer to start at a particular node in an inference pipeline is envisioned for SCV2.

Non-compliant models within an inference graph

Not all inference graphs contain a model node that would be Alibi-compliant, so in the general case the above would not work and it would be necessary to either:

* Extend Alibi compliant data types to support a wide variety of use cases / inference graphs

For the various transformation functions in this section and the previous one, I would see these probably being functions in Alibi if possible, i.e. callables passed to the init of the explainer.

We could investigate the explainer custom resource in SCV2 being an inference graph in itself, to allow images to be used which would run in separate containers, with pre, post and explainer sections: pre has the compliant transformer, post has the compliant post-transformer, and explainer has the core explainer?

jklaise commented 3 years ago

@cliveseldon if transformation functions for common use cases are built into alibi, then we also have the choice of not having a very general API that accepts custom callables for custom transformation and inverse transformation (although we may want to support this for genericity). Rather, we could dispatch to pre-defined alibi transformations given some information about the data.

This brings me to the data aspect. In order to do this, we will need more information about the data from the user. A good example is actually #487 which is not a tabular use case, but the principle is the same. An "image" may have a channel dimension in different places or it may not have one at all (e.g. grayscale image with no explicit channel axis). These should all be valid inputs to an image predictor that's supported by AnchorImage, but the user has to tell us what the data is (e.g. in this case via image_shape and the proposed channel_axis kwargs).
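
For illustration, a dispatch-style helper on the image side could look roughly like the sketch below (channel_axis here refers to the proposed kwarg from #487, not an existing alibi argument):

import numpy as np
from typing import Optional

def to_channels_last(X: np.ndarray, channel_axis: Optional[int] = None) -> np.ndarray:
    """Normalize a batch of images to (N, H, W, C) given user-supplied metadata."""
    if channel_axis is None:          # e.g. grayscale images with no explicit channel axis
        return X[..., np.newaxis]     # add a trailing channel dimension
    return np.moveaxis(X, channel_axis, -1)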

If we go with the model of "ask user about data and pick an appropriate transformation function" then we will need to carefully design what it is we need to know about the data and in what format (e.g. for the AnchorImage example it's already at least 2 kwargs, for AnchorTabular it may be more - how do you express the concept of "a numpy array with strings in categorical columns"?).
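
To make that question concrete, a purely hypothetical data description the user might have to supply could look something like this (none of these keys exist in the current alibi API):

data_description = {
    'container': 'np.ndarray',
    'dtype': 'object',
    'columns': {
        0: {'kind': 'numerical'},
        1: {'kind': 'categorical', 'encoding': 'string', 'categories': ['Female', 'Male']},
    },
}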

If we go with the other model of "allow custom transformation and inverse transformation functions" (this is similar to preprocessing_fn in alibi-detect; we would provide implementations of common ones), then it's a lot more general and we don't tie ourselves into a fixed API. The trade-off is having these extra callables "floating around" (but maybe that's not such a big issue given that we already have a similar design in alibi-detect).

sakoush commented 3 years ago

I feel in general that we would need to start thinking about how a user can inject predict_fn and pre/post transformations into explainers, as opposed to trying to have everything done in alibi. There are a few advantages of this approach in my mind:

Can tempo help with enabling the user to supply these transformation boxes?

On the other hand, if in deployment these transformations are separate pods, there is going to be extra overhead because of crossing process boundaries, and for simple transformations this might be significant.

So we probably need to find the right balance, and personally I think we have to provide some support for both cases.

jklaise commented 3 years ago

@sakoush possibly a versatile option would be to allow alibi explainers to take in custom callables for those transformations. That way it's still the user's responsibility to define and pass these, but the application layer doesn't have to worry about how to inject them as it would be done by alibi. You also then don't have the issue of these pieces of code ending up having to communicate over the network, which I agree would likely cause big slowdowns.

jklaise commented 2 years ago

Wrt supporting heterogeneous numpy arrays, this is actually a bit of an oxymoron since numpy arrays are by definition containers of homogeneous data: https://numpy.org/doc/stable/reference/arrays.ndarray.html. This is readily apparent when creating arrays containing both e.g. strings and integers, where the dtype defaults to object. Another example is that data consisting of strings would default to a dtype that is roughly "the largest string entry in the array".
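
A quick illustration (the exact string width may vary across NumPy versions):

import numpy as np

np.array([49.5, 'Male'], dtype=object).dtype  # dtype('O'): object is the only way to keep mixed types
np.array(['Male', 'Female']).dtype            # dtype('<U6'): fixed-width string sized by the longest entry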

Given this, it seems that use cases requiring heterogeneous numpy arrays may be using the wrong tool for the job and likely shouldn't be seen as good practice, because you can't do much with a heterogeneous array without extra metadata and further transformations into something homogeneous. A pandas dataframe would be much better suited for such heterogeneous data.
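
For example, the same record as a dataframe keeps per-column dtypes without any extra metadata:

import pandas as pd

df = pd.DataFrame({'Age': [49.5], 'Gender': ['Male']})
df.dtypes  # Age: float64, Gender: object - per-column types preserved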

jklaise commented 2 years ago

A fruitful first exploration would be to take some lessons from libraries like sklearn-pandas: https://github.com/scikit-learn-contrib/sklearn-pandas.

thorsteen commented 2 years ago

Will we one day see this issue solved by the Seldon dev team or what can we do to develop this feature?

I really need to have the flexibility to use models in an inference graph like "input-transformers --> model --> explainer -->" or "input-transformers --> model ---> --> explainer".

And the Seldon Deploy UI is actually put together in such a way that it looks like this can be done with Seldon Deployments.

jklaise commented 2 years ago

@thorsteen thanks for following up. Just to clarify, I think there are several things going on with your use case. The one related to wiring explainers to inference graphs with transformers is more of a question about the Seldon Deploy side of things. That being said, I understand that part of the issue is on the Alibi side, specifically with the data type you're using for your models? It would be great to get more context on exactly the issues you're facing on the Alibi side so we can resolve them first.

A couple of questions from my side:

  1. Have you been able to configure an Alibi explainer for your model and data type at all so far?
  2. What is the exact data format that the model you want to explain expects? You've mentioned mixed-type data, but does this mean heterogeneous numpy arrays (e.g. np.array([49.5, 'some_string'], dtype=object)) or something else (e.g. pandas dataframes)?
  3. Going back to the full pipeline, it would be useful to understand the exact data formats feeding into and coming out of every component, to assess which (if any) parts of the pipeline would be alibi-compatible, and also whether you're looking to explain single components (e.g. just the model) or parts of the whole pipeline (e.g. transformer + model). This will inform whether we can make changes to your pipeline to make it alibi-compatible as a stop-gap, or what extensions we need to make to alibi itself to allow a broader range of input data types (this could likely be implemented as custom user-passed callables to map from the user data type to the alibi-compatible data type and back).

thorsteen commented 2 years ago

I see there are different use cases for this, but I'm hoping that solving this issue will solve my issues and make explainers more flexible with regard to data types.

What I, @RafalSkolasinski and @FarrandTom discovered is that one is actually not able to deploy an explainer if it's getting non-compliant data. Locally, I can with an ndarray with object dtype if I ordinal encode my input data and then use OHE in a preprocessor (so I guess the ndarray dtype is actually float64) and run something like

predict_fn = lambda x: clf.predict(preprocessor.transform(x))
explainer = AnchorTabular(predict_fn, feature_names, categorical_names=category_map, ohe=True) 

like in your example but this cannot be deployed to production.

It would be interesting to have an explainer on the input-transformer side of things, but my issue is that if I do not post only numerical / compliant explainer data to the Seldon Deployment, one cannot use an explainer for the model. As mentioned, I would like to be able to use explainers in deployments where there is an input-transformer. Right now this is not covered for production use cases in Seldon Deploy or other deployments, I would guess.

I have heard that with the Seldon V2 API, one might be able to detach the explainer from the model - is this also your understanding?

This might help solve the issue of the predict_fn because then one could do the flow "request --> input transformer --> explainer" instead of the current flow "request --> explainer --> model --> explainer" in Seldon Deploy. But this would still require Alibi explainers to handle some mixed data type / ndarray object, or a preprocessor.

jklaise commented 2 years ago

@thorsteen thanks for the context, I think I understand what's going on here but wanted to double check in the following.

Your local example with the preprocessor, which is working, would imply that x here is actually the alibi-compliant data and the preprocessor transforms it to alibi-non-compliant data, but one that works with clf.predict, i.e. I'm assuming something like this:

x = np.array([[49.5, 0, 1, 0]], dtype=float)  # already pre-processed and alibi-compliant
x2 = preprocessor.transform(x)  # x2 = np.array([[49.5, 'some_string']], dtype=object) - "inverse" transform, alibi-non-compliant but model-compliant
output = clf.predict(x2)  # model output as probabilities or class labels

Now what I assume you want to see in production is the original model that takes in the non-compliant x2 input which would lead to the following deployment (and failure of explainer):

/predict: request (x2) -> model(x2) -> output  # all good
/explain: request (x2) -> explainer(x2) -> model(x2) # not good as x2 is alibi-non-compliant

The general pattern to make your model alibi-compliant is to follow these docs and wrap your predict function as follows:

def predictor(x: np.ndarray) -> np.ndarray:
    x2 = transform_input(x)
    output = model(x2) # or call the model-specific prediction method
    return output

explainer = SomeExplainer(predictor, **kwargs)

Now this would work locally as you have already tested with transform_input=preprocessor.transform. But I'm guessing the blocker is that you can't deploy this wrapped model and call it a day because you would still like to deploy the original model that takes in the non-compliant x2-type data?

In that case we need a way to make the explainer(x2) call work, and this is where the discussion about supporting other data types comes in. One option we've been considering is to extend the alibi explainer interface as follows (pseudocode):

class SomeExplainer:
    def __init__(self, model: Callable[[Any], np.ndarray], ..., input_transform: Optional[Callable], input_transform_inverse: Optional[Callable]) -> None:
        ...
    def explain(self, X: Any) -> Explanation:
        # transform input to alibi-compliant
        X = self.input_transform(X)
        # explanation generation...
        explanation = ...
        # map explanation features in the alibi-compliant space back to the model-compliant space for interpretability
        explanation = self.input_transform_inverse(explanation)
        return explanation

This would extend alibi explainers to handle models taking any type of input data as a user can specify the conversion from their data to alibi compliant data via input_transform and back via input_transform_inverse.
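
To make the intended usage concrete (hypothetical, since SomeExplainer, input_transform and input_transform_inverse are only proposed above; f and f_inverse are the illustrative mappings from the original post):

explainer = SomeExplainer(
    model=clf.predict,                  # the model works on X-type (model-compliant) data
    input_transform=f,                  # X -> Z (alibi-compliant)
    input_transform_inverse=f_inverse,  # Z -> X, to present the explanation in the original feature space
)
explanation = explainer.explain(np.array([[49.5, 'Male']], dtype=object))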

Of course the sticking point might be that this would all work locally but if you want to separate input_transform and input_transform_inverse into separate components in the Seldon graph then we would need to re-evaluate how to wire this up properly (I think it could be done similarly to how the reset_predictor method is used to wire up the explainer to the production model when deployed).

An alternative would be to explore on the Seldon side decoupling of the explainer and black-box-model so that the deployed explainer could point to a wrapped model with the input_transform already built-in (so taking compliant x data) whilst the original model would still take in the non-compliant x2 data. This would be akin to how white-box explainers have a separate copy of the model @RafalSkolasinski @axsaucedo .

Would be great to hear your thoughts or if I've misunderstood anything.

thorsteen commented 2 years ago

Very nice mockup @jklaise, I think you understood it correctly and I like your solution. But I think it would be simpler to decouple the explainer, wrap the model inside, and maybe route the request / event in a similar way to the alibi-detectors. With alibi-detectors we don't have the same problem because they are passed the request from the input-transformer. I like to have several decoupled services with a separate input-transformer, model, detector and explainer for flexibility in the inference graph. In such a deployment, I guess one could make a universal inference graph which would look like this

                                --> explainer -->
request --> input-transformer   --> model     -->   response
                                --> detector  -->

Locally, I would then just fit the explainer with the compliant / transformed data, like I do with the model and detectors, and then put this transformation in the input-transformer for the Seldon deployment / production.

jklaise commented 2 years ago

@thorsteen thanks, I think fundamentally we're trying to address two things here:

  1. Make it easier for pure-alibi users to work with custom data types. My comment on adding custom input_transform and input_transform_inverse transforms straight into the alibi interface would help with this and even facilitate deployment in simple use cases.
  2. Make it easier for production users, specifically those wanting to decouple/separate transformation steps out as components of the Seldon graph. The approach of 1. is then not as relevant.

What you're essentially describing is my 3rd diagram from the original post:

[Diagram: the third diagram from the original post, with an Explainer-Transformer component reusing the input transformer]

Conceptually, both /predict and /explain calls would take custom/non-compliant data X but for both endpoints this is routed through the input transformer so we end up with standardized/compliant data Z.

One thing that is missing from this picture, however, is that the explanation would be in terms of the Z data, so not necessarily interpretable wrt the original input X. Consider a simple case where X has an "Age" variable set to 49. The input transform would likely standardize/normalize it, so you would get "Age~[-1, 1]" in the Z-space. Your explanation will also be in this transformed space and therefore hard to interpret. To make it interpretable we may need to add an "inverse transform" step that can undo the input transform (hence why I proposed one for alibi called input_transform_inverse). Coming back to deployment, this would need to be another step that is called after the explanation has been computed.
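
A tiny illustration of the point, using sklearn's StandardScaler as a stand-in for the input transformer:

import numpy as np
from sklearn.preprocessing import StandardScaler

ages = np.array([[23.], [49.], [67.]])
scaler = StandardScaler().fit(ages)

z_age = scaler.transform([[49.]])        # roughly array([[0.15]]) - hard to interpret as an age
x_age = scaler.inverse_transform(z_age)  # array([[49.]]) - back in the original units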

Tagging @cliveseldon @axsaucedo @RafalSkolasinski @SachinVarghese