deel-ai / xplique

👋 Xplique is a Neural Networks Explainability Toolbox
https://deel-ai.github.io/xplique

Interpreting Semantic Segmentations #104

Closed varungupta31 closed 2 years ago

varungupta31 commented 2 years ago

Hey, Is it possible to run GradCam visuals on a semantic segmentation model using this library? If yes, kindly help me out with how I can proceed.

Specifically, I have trained an FCN segmentation model with a VGG-19 backbone in Keras, and I wish to interpret its segmentation behaviour using GradCAM / GradCAM++ etc. (my model has only two categories, background and foreground).

Any help is appreciated, Thank you.

fel-thomas commented 2 years ago

Hello @varungupta31, ;)

thank you for taking the time to write an issue. This is indeed a problem currently being addressed by the team. I have personally tested explaining a semantic segmentation model and it works; you only have to do one thing: add a flatten layer at the end of your model.

Then you can just pass in Y a one-hot vector of the output we want to study.
For example: if your output is a 4x4x2 matrix (width = 4, height = 4, and 2 channels: foreground and background), then by passing as Y the matrix with a 1 at coordinates [0,0,0] you should get an explanation. Don't forget to also flatten Y so that it matches the flattened output. The meaning of the explanation is then: what are the pixels in my original image that pushed me to classify the top-left pixel in class 0. Of course you can pass a Y with several 1's (e.g., a 4x4 matrix filled with 1's for channel 0). The interpretation would then be something like: what is the evidence that led me to segment class 0 in the whole image.
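The two targets described above can be sketched with numpy (a minimal sketch of the target construction only; the shapes follow the 4x4x2 example, and the actual explainer call, e.g. `GradCAM(model).explain(inputs, targets)`, is left out since the exact API usage may vary):

```python
import numpy as np

# Output of the segmentation head: 4x4 spatial map with 2 channels
# (foreground and background, as in the example above).
H, W, C = 4, 4, 2

# Single-pixel target: a 1 at coordinates [0, 0, 0], i.e.
# "which input pixels pushed the top-left pixel toward class 0?"
y_single = np.zeros((H, W, C))
y_single[0, 0, 0] = 1.0
y_single = y_single.flatten()  # match the Flatten layer added to the model

# Whole-class target: ones over all of channel 0, i.e.
# "what evidence led me to segment class 0 anywhere in the image?"
y_class = np.zeros((H, W, C))
y_class[:, :, 0] = 1.0
y_class = y_class.flatten()

print(y_single.shape, int(y_single.sum()))  # (32,) 1
print(int(y_class.sum()))                   # 16
```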

We also plan to release a notebook and a wrapper to help with this particular case before the end of the year (we also plan to make bbox working).

varungupta31 commented 2 years ago

@fel-thomas Thank you so much for taking time to guide me with this.

Some of the explanations you provided were a bit unclear to me, for instance:

Don't forget to also flatten the Y after.

However, before your help, I was not really sure what the 2 channels in my segmentation output meant (how one channel represents the background and the other the foreground). Then I revisited my prediction code and was able to verify that the model was basically picking max(channel 1, channel 2) per pixel and then deciding whether the pixel belonged to the foreground or background. With this understanding, I was finally able to run GradCAM! At the moment, I'm running it on a few extreme pixels, plus the center pixel, and averaging the heatmaps.
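The per-pixel argmax and the heatmap-averaging idea above can be sketched as follows (the `explain_pixel` helper is hypothetical: a real run would build a one-hot flattened Y for that pixel and call the explainer, here it just returns a dummy heatmap so the averaging logic is runnable):

```python
import numpy as np

H, W, C = 4, 4, 2

# The predicted mask is just the per-pixel argmax over the two channels.
logits = np.random.rand(H, W, C)
mask = logits.argmax(axis=-1)  # 0 or 1 per pixel

def explain_pixel(row, col, channel):
    """Stand-in for one GradCAM run targeting a single output pixel.
    (Hypothetical helper: returns a dummy one-hot heatmap instead of
    an actual attribution map.)"""
    heatmap = np.zeros((H, W))
    heatmap[row, col] = 1.0
    return heatmap

# The four corner ("extreme") pixels plus the centre, all targeting class 0,
# averaged into a single explanation.
targets = [(0, 0), (0, W - 1), (H - 1, 0), (H - 1, W - 1), (H // 2, W // 2)]
avg_heatmap = np.mean([explain_pixel(r, c, 0) for r, c in targets], axis=0)
print(avg_heatmap.shape)  # (4, 4)
```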

We also plan to release a notebook and a wrapper to help with this particular case before the end of the year (we also plan to make bbox working).

Please do, it would indeed be very helpful, as I have realized not much work is available for segmentation interpretability, especially in a plug-and-play format like yours.

P.S. - I know Mohit Vaishnav and he recommended that I check out this amazing repo (thanks to him as well!) :)