SeldonIO / alibi

Algorithms for explaining machine learning models
https://docs.seldon.io/projects/alibi/en/stable/

Are there any quantitative evaluation methods that can be used to evaluate the performance of the Anchors algorithm on Images? #637

Open krishnakripaj opened 2 years ago

krishnakripaj commented 2 years ago

What quantitative evaluation metrics can be used to evaluate the performance, accuracy, etc. of the Anchors algorithm? I am looking to use this for an image detection task.

mauicv commented 2 years ago

Hey @krishnakripaj,

Anchors are themselves built around a couple of performance/accuracy metrics: precision and coverage.

The precision of an anchor is the proportion of instances contained within the anchor that obtain the same classification. In other words, if you sample from the anchor, it's the probability that the sampled instance gets the same classification as the original instance you're explaining. Anchors are generated so as to meet a minimum precision (the threshold), which is passed as the threshold argument to the explain method of the AnchorImage explainer.
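To make the definition of precision concrete, here is a minimal sketch of estimating it by sampling. This is not alibi's implementation: `estimate_precision`, `toy_predict`, the list-of-super-pixels representation and the masking scheme (non-anchor super-pixels are set to 0 with probability `p_mask`) are all assumptions made for illustration.

```python
import random

def estimate_precision(predict, instance, anchor, n_samples=1000, p_mask=0.5, seed=0):
    """Fraction of perturbed samples that keep the original prediction.

    `instance` is a list of super-pixel values and `anchor` is a set of
    indices that stay fixed. Every other super-pixel is masked (set to 0)
    with probability `p_mask`. All names here are hypothetical.
    """
    rng = random.Random(seed)
    target = predict(instance)
    hits = 0
    for _ in range(n_samples):
        sample = [
            v if (i in anchor or rng.random() >= p_mask) else 0
            for i, v in enumerate(instance)
        ]
        hits += predict(sample) == target
    return hits / n_samples

# Toy classifier: predicts 1 only when the first two super-pixels are "on".
def toy_predict(x):
    return int(x[0] > 0 and x[1] > 0)

instance = [1, 1, 1, 1]
# Fixing both relevant super-pixels: every sample keeps the prediction.
full_anchor_precision = estimate_precision(toy_predict, instance, anchor={0, 1})
# Fixing only one of them: roughly half of the samples keep the prediction.
partial_anchor_precision = estimate_precision(toy_predict, instance, anchor={0})
```

With a precision threshold of, say, 0.95, only the first anchor in this toy setup would be acceptable.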

We also generate anchors to maximise their coverage. The coverage is the proportion of instances in the dataset that are contained within the anchor. In the case of images, this isn’t well defined: image anchors are made up of super-pixels generated from the instance of interest, and other data points are very unlikely to contain those super-pixels, so it’s hard to say which other instances fall within the anchor. Instead, we generate an artificial dataset from the image and compute coverage on that.
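As a rough illustration of coverage on an artificial dataset, the sketch below draws random super-pixel masks and counts how many of them contain every super-pixel in the anchor. It assumes each super-pixel is kept independently with probability `p_keep`; that sampling scheme and the name `estimate_coverage` are hypothetical and not taken from alibi.

```python
import random

def estimate_coverage(anchor, n_superpixels, n_samples=2000, p_keep=0.5, seed=0):
    """Fraction of random masks in which all anchor super-pixels are present.

    Each artificial sample is a boolean mask over the super-pixels of the
    original image. The anchor "contains" a sample when every super-pixel
    index in `anchor` is switched on in that mask.
    """
    rng = random.Random(seed)
    covered = 0
    for _ in range(n_samples):
        mask = [rng.random() < p_keep for _ in range(n_superpixels)]
        covered += all(mask[i] for i in anchor)
    return covered / n_samples

# Larger anchors fix more super-pixels and so cover fewer artificial samples.
coverage_one = estimate_coverage({0}, n_superpixels=8)
coverage_two = estimate_coverage({0, 1}, n_superpixels=8)
```

Under this assumption, coverage shrinks roughly as `p_keep` raised to the number of super-pixels in the anchor, which is the usual tension with precision: adding super-pixels tends to raise precision but lower coverage.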

Anchors are quite computationally expensive, especially with large numbers of features, which is why we use super-pixels (see interpretable-ml-book for a discussion of runtimes). Their runtime is also highly dependent on the data and the instance of interest. For instance, anchors explaining instances next to decision boundaries may take longer to compute. We're planning an experimental exploration of runtime to give users an idea of what to expect, but haven't started on it yet.

Could you share more details about your use case?