sakoush opened this issue 3 years ago
The easiest solution would be to introduce another argument with a default value, e.g. `layout='nhwc'`; this seems relevant as "image metadata" for explainers (and detectors?) that need this info @arnaudvl @ascillitoe.
Annoyingly, it seems the latest release of `scikit-image==0.18.3` doesn't support any layout other than channel-last; however, in the upcoming version 0.19 this is added and can be controlled via an argument. Fixing this in the meantime would require a bit of custom reshaping.
It seems like the 0.19 release of `scikit-image` should not be too far away now (https://github.com/scikit-image/scikit-image/issues/5169), so it's tempting to wait and upgrade to avoid writing array conversions ourselves only to remove them later. Otherwise we would have to reimplement the logic of this decorator to patch the `skimage` segmentation functions that can only handle channel-last: https://github.com/scikit-image/scikit-image/blob/8a13f15b7629a3be643fe0eb45f6ede55ba69c84/skimage/_shared/utils.py#L219
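For the record, the interim conversion itself would be simple; a minimal sketch (assuming `slic` and a single channel-first image without a batch dimension, so the helper name and scope are illustrative only):

```python
import numpy as np
from skimage.segmentation import slic

def segment_channel_first(image: np.ndarray, **kwargs) -> np.ndarray:
    """Call a channel-last-only segmentation function on a chw image.

    Stop-gap for scikit-image < 0.19, where slic only handles
    channel-last input.
    """
    image_hwc = np.moveaxis(image, 0, -1)  # (c, h, w) -> (h, w, c)
    return slic(image_hwc, **kwargs)       # superpixel labels of shape (h, w)
```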
There are cases, e.g. sklearn MNIST, that do not even have a channel dimension. Basically, for grayscale images the data is flattened to 2D: `(samples, features)`.
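For instance, with `sklearn`'s bundled digits dataset:

```python
from sklearn.datasets import load_digits

X, y = load_digits(return_X_y=True)
print(X.shape)  # (1797, 64): 8x8 grayscale digits, flattened, no channel axis
```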
Yup, the standard response is to wrap the predictor so it's of the form that `alibi` expects (i.e. in this case adding a dummy channel dimension).

In this particular case it should be easy to support this internally (e.g. check `image_shape`; if it's 2D, raise a warning and construct the wrapped predictor internally).
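For illustration, such a wrapper might look as follows (a minimal sketch; `wrap_flat_predictor` and the MNIST shapes are assumptions, not part of the `alibi` API):

```python
import numpy as np

def wrap_flat_predictor(predictor, image_shape=(28, 28, 1)):
    """Adapt a predictor trained on flattened inputs (n, features) so it
    accepts the (n, h, w, c) batches that AnchorImage would feed it."""
    def wrapped(x: np.ndarray) -> np.ndarray:
        assert x.shape[1:] == tuple(image_shape), 'unexpected image shape'
        # flatten (n, h, w, c) back to (n, h*w*c) before calling the model
        return predictor(x.reshape(x.shape[0], -1))
    return wrapped

# e.g. for a classifier `clf` trained on flattened 28x28 MNIST digits:
# explainer = AnchorImage(wrap_flat_predictor(clf.predict_proba),
#                         image_shape=(28, 28, 1))
```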
This is actually the same type of issue as #516: attempting to push the complexity of wrapping non-compliant predictors internally, as opposed to keeping it a responsibility of the user.
Since `scikit-image` 0.19 has been released (#551), this is unblocked. A fix would then need to bump the minimum required version of `scikit-image` to 0.19.
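For reference, the new argument in 0.19 looks roughly like this (a minimal sketch using `slic`; the random image is just a stand-in):

```python
import numpy as np
from skimage.segmentation import slic  # requires scikit-image >= 0.19

rng = np.random.default_rng(0)
image_chw = rng.random((3, 64, 64))  # channel-first image

# the channel position is now configurable instead of channel-last only
segments = slic(image_chw, n_segments=10, channel_axis=0)
print(segments.shape)  # (64, 64) superpixel labels
```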
I've thought a little about this and there are two main choices, both requiring an extra argument:

1. `image_layout: Literal['nhwc', 'nchw'] = 'nhwc'`
   - defaults to `nhwc`, which is the `tensorflow` default image shape; also supports the other popular layout `nchw`, often used with `torch`
2. `channel_axis: int = -1`
   - for the `nchw` format `channel_axis` would have to be `1` (since `0` is for batch, and using strings such as `'first'` and `'last'` might be confusing?)

In favour of `image_layout`: `channel_axis` would not be extensible to support more unusual layouts (e.g. height and width axes swapped around). In favour of `channel_axis`: it is the way multi-channel images are treated in `scikit-image` from version 0.19: https://github.com/scikit-image/scikit-image/releases/tag/v0.19.0

I think `image_layout` has a slight edge due to being more general and extensible, but it would be great to hear another opinion @RobertSamoilescu @ascillitoe.
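To make the comparison concrete, handling `image_layout` internally could amount to a single transpose; a rough sketch (the `to_nhwc` helper is hypothetical, not part of the current API):

```python
import numpy as np

def to_nhwc(x: np.ndarray, image_layout: str = 'nhwc') -> np.ndarray:
    """Permute a batch from an arbitrary layout string to nhwc.

    Hypothetical helper; note it extends naturally to 'weird' layouts
    such as 'nwhc', which a single channel_axis integer cannot express.
    """
    assert sorted(image_layout) == sorted('nhwc'), 'layout must permute nhwc'
    perm = [image_layout.index(dim) for dim in 'nhwc']
    return np.transpose(x, perm)

batch_nchw = np.zeros((8, 3, 32, 32))  # torch-style batch
print(to_nhwc(batch_nchw, image_layout='nchw').shape)  # (8, 32, 32, 3)
```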
EDIT: it's also possible we stick with conventions like `skimage` and go the `channel_axis` route, see more discussion here: https://scikit-image.org/docs/stable/user_guide/numpy_images.html#coordinate-conventions
EDIT2: in the future, supporting volumetric images like `skimage` does may require even more conventions, which could be more easily managed by `image_layout` instead.
EDIT3: including the batch dimension `n` in `image_layout` may be slightly controversial, as it's not part of the image layout and (for the time being) `AnchorImage` operates only on single images. On the other hand, the implicit assumption then is that the underlying `predictor` has batch as the leading dimension (but this is an assumption we've made throughout the library).
Mmmn, a tricky one. I tend to agree that `image_layout` is nicer, since it is more flexible and could be extended in the future. I also agree that `channel_axis` might be confusing due to the question of whether the batch dimension `n` is included in the integer count.
Maybe `image_layout` could also aid scenarios like `(samples, features)` that @sakoush mentioned. If numeric values were given, e.g. `image_layout='n,h32,w32'` (we'd need to think about formatting rules), this would help us avoid having to infer anything when reshaping the flattened feature array. Or is this getting too complicated now?!
@ascillitoe I would avoid putting in extra functionality, especially something that would be a new custom format. We already have an `image_shape` argument, which is also a standard concept in ML.
Yeh that's fair enough. Even without a need for the added flexibility of `image_layout` right now, I agree that it is easier to interpret, and it might be useful for volumetric images etc. in the future. I don't have a super strong opinion though.
PyTorch example for `AnchorImage`.

In `AnchorImage` the channel dimension is assumed to be the last dimension, as defined here. This is not necessarily true; for example, in an MNIST handwritten-digits pytorch model the shape of the data is `(1, 28*28)`.

Not sure if this is applicable to other explainers as well? If so, the fix needs to be applied to all relevant explainers.

Let's have this as a configurable parameter at object init?
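For example, the init could look like this (a hypothetical sketch; `image_layout` does not exist in the current `AnchorImage` API, and the dummy predictor just stands in for a real model):

```python
import numpy as np
from alibi.explainers import AnchorImage

def predictor(x: np.ndarray) -> np.ndarray:
    # stand-in for a pytorch model expecting (n, 1, 28, 28) input
    return np.ones((x.shape[0], 10)) / 10

explainer = AnchorImage(
    predictor,
    image_shape=(1, 28, 28),  # nchw shape of the pytorch model input
    image_layout='nchw',      # proposed argument; default would be 'nhwc'
)
```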