Apply model explainability tools to the images output by similarity search

metazool commented 4 months ago

Exploration of model explainability techniques using the prediction capabilities of the CEFAS model, in complement to using it as a source of embeddings.

E.g. we take the images returned by a similarity search over the embeddings, make predictions on them with the original model, and look at the visual features that influenced those predictions.

SHAP / LIME are the ones I'm familiar with, but there's a whole toolbox in the Captum API. Suggestions of approaches that worked well during development of AMI-system would be appreciated, @albags!

Do heatmaps of features in similar subsets look properly coherent?
Can this be reproduced using the CEFAS reference data, if there's a reserved test set available?
Are the prediction capabilities of the model of any immediate use to the researchers working with the FlowCam data?
In the best case, can we show the model's view of "functional traits" in a way that looks familiar and meaningful to the researchers?
Can we flush out other factors, like image dimensions or object size, that may be giving a falsely positive impression of the embedding search results, to gauge whether it's truly worth putting effort into self-supervised clustering of the embeddings?
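
One rough way to probe that last question, as a minimal sketch: correlate pairwise embedding similarity with how different the image areas are. The helper names (`cosine`, `pearson`, `size_confound`) and the plain-list inputs are assumptions for illustration, not the project's actual data structures.

```python
import math
from itertools import combinations


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def size_confound(embeddings, sizes):
    """Correlate pairwise similarity with pairwise log-area difference.

    embeddings: list of vectors; sizes: list of (width, height) per image.
    A strongly negative value suggests similarity is tracking image size.
    """
    sims, area_gaps = [], []
    for i, j in combinations(range(len(embeddings)), 2):
        sims.append(cosine(embeddings[i], embeddings[j]))
        area_i = sizes[i][0] * sizes[i][1]
        area_j = sizes[j][0] * sizes[j][1]
        area_gaps.append(abs(math.log(area_i / area_j)))
    return pearson(sims, area_gaps)
```

A value near zero would be reassuring; a value near -1 would suggest the search is mostly grouping images by size.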

metazool commented 4 months ago

A quick note on this, as I may not make time this week to finish the branch (to the extent that it's worth finishing).

Initial output was a lot more inconclusive than I'd hoped for. Could be a range of reasons, including:

the plankton-cefas ResNet model is undercooked (is there any info about how it was trained?)
its classification mode is a poor fit for our data (unsurprisingly)
we're missing a normalisation step for the input and that's throwing things off (are there worked examples?)
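
On the normalisation point, a minimal sketch of the per-channel standardisation that torchvision's ImageNet-pretrained models expect. Whether the plankton-cefas model was trained with these statistics is an assumption that would need checking against its training code; `normalise_pixel` and `normalise_image` are hypothetical helpers.

```python
# Channel statistics used by torchvision's ImageNet-pretrained models.
# ASSUMPTION: the plankton-cefas ResNet may have been trained with different
# (or no) normalisation; check its training code before relying on these.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalise_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Scale an 8-bit RGB pixel to [0, 1], then standardise each channel."""
    return tuple((value / 255.0 - m) / s for value, m, s in zip(rgb, mean, std))


def normalise_image(image, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Apply normalise_pixel to an image given as rows of (r, g, b) tuples."""
    return [[normalise_pixel(px, mean, std) for px in row] for row in image]
```

In a torchvision pipeline the equivalent would be `transforms.Normalize(mean, std)` applied after `transforms.ToTensor()`.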

It's worth running the same attempted interpretations over a CEFAS plankton test set before drawing any conclusions. Beyond that, this doesn't seem worth pursuing much further: using the scivision model for classification was never the intention, and this was only meant to throw light on how and why it seems to work pretty well for feature extraction.

It's also worth going back a step to extract and compare embeddings using different networks: a generic ImageNet-type ResNet50 that's never specifically looked at plankton, and a default network as a sense check.
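
A rough way to make that comparison concrete, assuming embeddings for the same images have already been extracted from each network: compare the top-k neighbour sets per image. `top_k` and `neighbour_overlap` are hypothetical helpers, and the embeddings are plain lists of vectors.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


def top_k(embeddings, query, k):
    """Indices of the k images most similar to `query` (excluding itself)."""
    scored = [
        (cosine(embeddings[query], emb), idx)
        for idx, emb in enumerate(embeddings)
        if idx != query
    ]
    scored.sort(reverse=True)
    return {idx for _, idx in scored[:k]}


def neighbour_overlap(emb_a, emb_b, k):
    """Mean Jaccard overlap of top-k neighbour sets across two embedding spaces.

    emb_a and emb_b must describe the same images in the same order.
    """
    overlaps = []
    for query in range(len(emb_a)):
        a, b = top_k(emb_a, query, k), top_k(emb_b, query, k)
        overlaps.append(len(a & b) / len(a | b))
    return sum(overlaps) / len(overlaps)
```

If the plankton-specific model and a generic ResNet50 return largely the same neighbours, that would suggest the similarity search is leaning on generic visual features rather than anything learned from plankton.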

Short video dataviz of the occlusion output. Most of the other methods I tried were even more garbled; we should expect to see much more consistency here.
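
For reference, the occlusion idea as a framework-free sketch: mask one patch at a time and record how much the model's score drops. `score_fn` is a hypothetical stand-in for the model's predicted class score, and the toy `corner_score` model is only there to make the example checkable. With a real PyTorch model, Captum's `captum.attr.Occlusion` offers the same technique.

```python
def occlusion_map(image, score_fn, patch=2, baseline=0.0):
    """Occlusion sensitivity for a 2D greyscale image (list of rows).

    Each non-overlapping patch is replaced with `baseline`; the resulting
    drop in score_fn(image) is written to every pixel of that patch.
    """
    height, width = len(image), len(image[0])
    reference = score_fn(image)
    heatmap = [[0.0] * width for _ in range(height)]
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            occluded = [row[:] for row in image]  # copy so the input is untouched
            rows = range(top, min(top + patch, height))
            cols = range(left, min(left + patch, width))
            for r in rows:
                for c in cols:
                    occluded[r][c] = baseline
            drop = reference - score_fn(occluded)
            for r in rows:
                for c in cols:
                    heatmap[r][c] = drop
    return heatmap


# Toy "model" (hypothetical): the score is the summed brightness of the
# top-left 2x2 corner, so only that patch should register in the heatmap.
def corner_score(img):
    return sum(img[r][c] for r in range(2) for c in range(2))


heatmap = occlusion_map([[1.0] * 4 for _ in range(4)], corner_score)
```

A coherent model should produce heatmaps concentrated on the organism rather than scattered over the background, which is the consistency we'd hope to see across a set of similar images.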

metazool commented 2 months ago

I was on the point of closing https://github.com/NERC-CEH/plankton_ml/pull/7 as:

The results were unhelpfully inconclusive (image size, model maturity, other?)
Subsequent refactoring would now involve an overhaul of the code for work we don't particularly need

It's a useful line in the sand though. Low-priority but still actionable?

metazool commented 2 months ago

Closed this along with #7 - see comments there

NERC-CEH / plankton_ml, issue #6