Scan: Add a robustness detector to the scan that perturbs numerical values

Giskard-AI / giskard

🐢 Open-Source Evaluation & Testing for ML & LLM systems

https://docs.giskard.ai

Apache License 2.0


Scan: Add a robustness detector to the scan that perturbs numerical values #1846

Open kevinmessiaen opened 8 months ago

kevinmessiaen commented 8 months ago

🚀 Feature Request

Add a robustness detector to the scan that perturbs numerical values.

The scan should be able to report a set of issues that capture the minimum amount of perturbation (within the bounds of the feature distribution) needed on a single numerical feature to:

(a) change the predicted label (classification)
(b) change the prediction by an amount that exceeds a certain threshold (regression)
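
To make the two criteria above concrete, here is a minimal sketch of a grid search for the smallest in-range shift of one feature that changes the prediction. It is not Giskard's API or an agreed design: `min_perturbation`, `has_changed`, `lower`, `upper`, and `n_steps` are hypothetical names, and `predict` is assumed to be any callable that maps a 2-D NumPy array to one prediction per row.

```python
import numpy as np

def min_perturbation(predict, row, feature_idx, lower, upper, has_changed, n_steps=50):
    """Smallest signed shift of one feature, within [lower, upper], that changes the prediction.

    Returns None if no candidate value on the grid changes the prediction.
    """
    row = np.asarray(row, dtype=float)
    base = predict(row[None, :])[0]
    # Candidate values spanning the feature's observed range, tried nearest-first.
    candidates = np.linspace(lower, upper, n_steps)
    order = np.argsort(np.abs(candidates - row[feature_idx]))
    for value in candidates[order]:
        perturbed = row.copy()
        perturbed[feature_idx] = value
        if has_changed(base, predict(perturbed[None, :])[0]):
            return value - row[feature_idx]
    return None

# (a) classification: the predicted label changes
label_changed = lambda base, new: new != base

# (b) regression: the prediction moves by more than an absolute threshold
def exceeds(threshold):
    return lambda base, new: abs(new - base) > threshold
```

Because the search runs on a fixed grid, the result is only accurate up to the grid spacing; a bisection between the last unchanged and first changed value would tighten it.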

🔈 Motivation

Currently the scan does not have any numerical perturbation.

pranavm7 commented 4 months ago

Hey @kevinmessiaen! This seems to be a duplicate of #1847.

PS: I'd love to contribute to the tool! I'll be on the lookout for new issues/improvements :)

kevinmessiaen commented 4 months ago

@pranavm7 Hey, it's not exactly the same. One is for numerical values and the other is for categorical ones, and the two differ a bit.

We would be happy to have you contribute to this tool. Do you have any improvement ideas in mind?

Kranium2002 commented 1 month ago

@kevinmessiaen I would like to work on this if this is still open.

kevinmessiaen commented 1 month ago

@Kranium2002 Sure, we appreciate that. I assigned you to the issue. Let me know if you have any questions or need some help!

Kranium2002 commented 1 month ago

I will be working on adding a numerical perturbation detector that tests model robustness by perturbing numerical features by around 1% and measuring how much the model's predictions shift. For classification models, it will flag cases where the predicted label changes; for regression, it will detect when predictions differ by more than a threshold (e.g. 5%). I'll integrate this into the existing framework so that it reports any significant sensitivity issues. I'll also add tests to ensure it works across model types and datasets.

PS: Should I make the 1% perturbation and 5% threshold user-configurable, or keep them fixed? Your thoughts? @kevinmessiaen
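
For illustration, here is a rough sketch of the check described above, with both values exposed as parameters that default to 1% and 5%. It is not the implementation in the PR or part of Giskard: `perturbation_failure_rate` and its parameter names are hypothetical, and `predict` is assumed to be a callable that takes a 2-D NumPy array and returns one prediction per row.

```python
import numpy as np

def perturbation_failure_rate(predict, X, feature_idx, perturbation=0.01,
                              threshold=0.05, classification=True):
    """Fraction of rows whose prediction changes when one feature is scaled by +/- `perturbation`."""
    X = np.asarray(X, dtype=float)
    base = np.asarray(predict(X))
    failed = np.zeros(len(X), dtype=bool)
    for sign in (1, -1):
        X_perturbed = X.copy()
        X_perturbed[:, feature_idx] *= 1 + sign * perturbation
        new = np.asarray(predict(X_perturbed))
        if classification:
            # A row fails if its predicted label flips.
            failed |= new != base
        else:
            # A row fails if the relative change in prediction exceeds the threshold.
            denom = np.maximum(np.abs(base), 1e-12)  # guard against division by zero
            failed |= np.abs(new - base) / denom > threshold
    return failed.mean()
```

Keeping the two values as keyword arguments with defaults leaves the default behaviour unchanged while still letting users tighten or loosen the check per dataset.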

Kranium2002 commented 1 month ago

Working on this in #2040