Handle cases where data provided to predict is different to reference data

ascillitoe commented 2 years ago

In some circumstances, the data provided to a detector's predict method may inadvertently have a different dtype from the previously provided reference data. This can lead to unclear errors, for example:

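import numpy as np
from alibi_detect.cd import MMDDrift
x_ref = np.random.normal(size=(100, 5)).astype(np.float32)
cd = MMDDrift(x_ref)  # TensorFlow backend by default
cd.predict(x_ref.astype(np.float64))  # float64 data vs float32 reference data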

leads to:

tensorflow.python.framework.errors_impl.InvalidArgumentError: cannot compute AddV2 as input #1(zero-based) was expected to be a float tensor but is a double tensor [Op:AddV2]

To avoid unclear errors, we could handle this in a number of ways:

Attempt to cast data provided to predict to the same dtype as self.x_ref. This could potentially have unintended consequences.
Raise a descriptive warning (or error) if the dtypes don't match.
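To illustrate why the first option could have unintended consequences, here is a minimal NumPy-only sketch. `cast_like_ref` is a hypothetical helper written for this example, not part of alibi-detect. It shows that casting to the reference dtype silently discards information whenever that dtype is narrower than the incoming data's.

```python
import numpy as np


def cast_like_ref(x: np.ndarray, x_ref: np.ndarray) -> np.ndarray:
    """Hypothetical helper: cast `x` to the dtype of the reference data `x_ref`.

    This corresponds to the casting option above and is shown only to illustrate its pitfalls.
    """
    return np.asarray(x).astype(x_ref.dtype)


x_ref_float = np.zeros((2, 3), dtype=np.float32)
x_ref_int = np.zeros((2, 3), dtype=np.int64)

# float64 -> float32: differences below float32 precision are silently rounded away
x = np.array([[1.0 + 1e-10, 2.0, 3.0]], dtype=np.float64)
print(cast_like_ref(x, x_ref_float)[0, 0] == 1.0)  # True: the 1e-10 offset is lost

# float -> int: the fractional part is silently truncated towards zero
x = np.array([[1.7, -1.7, 2.5]])
print(cast_like_ref(x, x_ref_int))  # [[ 1 -1  2]]
```

Neither cast raises or warns, which is why an explicit check on the dtype is arguably safer than casting on the user's behalf.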

Opening this for discussion. @jklaise @arnaudvl @mauicv

jklaise commented 2 years ago

I would say we check the dtype in the predict method and, if it doesn't match the reference dtype, raise a custom error. I agree that doing the casting ourselves is potentially dangerous.
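A minimal sketch of such a check, assuming NumPy array inputs. `DtypeMismatchError` and `check_dtype` are hypothetical names, not existing alibi-detect API. Subclassing `TypeError` is also an assumption, chosen so that existing `except TypeError` handlers would still catch the error.

```python
import numpy as np


class DtypeMismatchError(TypeError):
    """Hypothetical custom exception for a dtype mismatch between `x` and the reference data."""


def check_dtype(x: np.ndarray, ref_dtype: np.dtype) -> None:
    """Raise a readable error if `x` does not have the reference dtype.

    No casting is attempted: the caller decides how to fix their data.
    """
    x_dtype = np.asarray(x).dtype
    if x_dtype != ref_dtype:
        raise DtypeMismatchError(
            f"Data passed to `predict` has dtype {x_dtype}, but the reference data "
            f"has dtype {np.dtype(ref_dtype)}. Cast the data to {np.dtype(ref_dtype)} "
            "before calling `predict`."
        )


x_ref = np.random.normal(size=(100, 5)).astype(np.float32)
check_dtype(x_ref, x_ref.dtype)  # matching dtypes: passes silently

try:
    check_dtype(x_ref.astype(np.float64), x_ref.dtype)
except DtypeMismatchError as err:
    # Data passed to `predict` has dtype float64, but the reference data has dtype float32. ...
    print(err)
```

Calling a check like this at the top of `predict`, before any backend operation runs, would replace the opaque `InvalidArgumentError` with a message that names both dtypes.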

ascillitoe commented 2 years ago

Agreed. It might even be best to just raise a warning, as many detectors do still work if dtypes don't match (e.g. those without a backend). I think @arnaudvl is in favour of avoiding any explicit casting too.

jklaise commented 2 years ago

I think we should be pretty strict, at least for detectors with a backend, since we know they will fail. In those cases it's better to quickly raise a readable custom exception than to rely on logging: a logged warning would still be followed by the unreadable error, which is not good for downstream applications, especially ones that don't necessarily configure logging.
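One way to combine the two positions is sketched below: raise for detectors with a backend, and only warn for detectors that may still work with mismatched dtypes. This is a minimal illustration under stated assumptions. `DtypeMismatchError`, `validate_dtype`, `ToyDetector` and its `has_backend` flag are all hypothetical, and the "score" is a placeholder rather than a real drift test.

```python
import warnings

import numpy as np


class DtypeMismatchError(TypeError):
    """Hypothetical custom exception raised when `predict` and reference data dtypes differ."""


def validate_dtype(x: np.ndarray, ref_dtype: np.dtype, strict: bool) -> None:
    """Compare the dtype of `x` with the reference dtype.

    strict=True  -> raise immediately (detectors with a backend, which are known to fail).
    strict=False -> emit a warning only (e.g. detectors without a backend, which may still work).
    """
    x_dtype = np.asarray(x).dtype
    if x_dtype == ref_dtype:
        return
    msg = (f"Data passed to `predict` has dtype {x_dtype}, "
           f"but the reference data has dtype {np.dtype(ref_dtype)}.")
    if strict:
        raise DtypeMismatchError(msg)
    warnings.warn(msg, UserWarning)


class ToyDetector:
    """Stand-in for a detector; `has_backend` is a hypothetical attribute."""

    def __init__(self, x_ref: np.ndarray, has_backend: bool):
        self.x_ref = x_ref
        self.has_backend = has_backend

    def predict(self, x: np.ndarray) -> float:
        # Fail fast with a readable error before any backend op can raise an opaque one.
        validate_dtype(x, self.x_ref.dtype, strict=self.has_backend)
        # Placeholder score (absolute difference in means), not a real drift test.
        return float(np.abs(np.mean(x) - np.mean(self.x_ref)))


x_ref = np.random.normal(size=(100, 5)).astype(np.float32)
x = x_ref.astype(np.float64)

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    score = ToyDetector(x_ref, has_backend=False).predict(x)  # warns, still returns a score
print(len(caught))  # 1

try:
    ToyDetector(x_ref, has_backend=True).predict(x)
except DtypeMismatchError as err:
    print(f"Raised: {err}")
```

Using an exception rather than logging in the strict case matches the argument above: the caller gets the readable message whether or not logging is configured.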

SeldonIO/alibi-detect #448