Open kevinmessiaen opened 5 months ago
Hi @kevinmessiaen ,
I noticed this issue and would like to contribute to it. Is it still open and relevant? If so, I would appreciate any guidance or additional information that could help me get started.
Thank you!
Hello @abhibongale
Yes, the issue is still relevant, and we would appreciate your contribution on this one!
Basically, in the Scanner (giskard.scanner.scanner.py) we run a bunch of evaluators depending on the model type.
For the regression and classification models, the detectors use the SliceFinder (giskard.slicing.slice_finder.py) to generate slices that are then tested. Some of those slices might overlap (e.g. we can have a slice for the car sub-category that is inside the slice for the transportation category). This is fine, since the dataset might have an issue for only one of those categories.
However, there can be cases where the whole transportation category contains an issue (meaning that the car and other sub-categories would also contain this issue). That's why we want to filter the sub-slices out of the scan report in order to improve it.
I think you can start by having a look at the PerformanceDetector (giskard.scanner.performance.performance_bias_detector.py)
🚀 Feature Request
If a slice is completely contained in another slice, we should report only the larger one.
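The rule above could be sketched roughly as follows. This is only an illustration, not Giskard's implementation: the function name `drop_contained_slices`, the slice labels, and the representation of a slice as a set of row indices are all assumptions made for the example, and it only covers the case where both the parent slice and the sub-slice were flagged.

```python
from typing import Dict, FrozenSet, List


def drop_contained_slices(slices: Dict[str, FrozenSet[int]]) -> List[str]:
    """Return the names of slices that are not contained in another slice.

    Hypothetical helper: each slice is modelled as the set of dataset row
    indices it selects, which is an assumption and not Giskard's slice type.
    """
    # Visit the largest slices first, so a slice kept earlier can never be
    # contained in a slice that is visited later.
    ordered = sorted(slices, key=lambda name: len(slices[name]), reverse=True)
    kept: List[str] = []
    for name in ordered:
        rows = slices[name]
        # Skip the slice if all of its rows already belong to one kept slice
        # (this also drops exact duplicates of a kept slice).
        if any(rows <= slices[other] for other in kept):
            continue
        kept.append(name)
    return kept


# Made-up example: "car" rows are a subset of the "transportation" rows.
flagged = {
    "category == transportation": frozenset({0, 1, 2, 3, 4}),
    "sub_category == car": frozenset({0, 1}),
    "category == food": frozenset({5, 6}),
}
print(drop_contained_slices(flagged))
```

With this input the car slice is dropped because its rows all fall inside the transportation slice, while the food slice is kept because it is not contained in any other slice.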
🔈 Motivation
It will make the scan report more concise and avoid duplication. Furthermore, checking those sub-slices takes time and memory and doesn't really provide any value.