kevinrobinson opened this issue 5 years ago
One other thought on this, to brainstorm a step forward. This visual from Hardt et al. (2016) keeps coming to mind as one of the main visuals I use to think about this across groups (versus comparing across confusion matrices or looking at differences in ROC curves). It's a bit different from the idea above (it shows all the values that each strategy optimizes for at once), but maybe another thing to try:
I was wondering if there's a way to incorporate that into some part of this:
To keep brainstorming, here's a rough sketch of what I'm imagining as a way to build intuition about how the choice of optimization strategy impacts different examples in the data set (the slice names don't match, but hopefully this gets the point across):
and then clicking on a point selects that slice and that strategy, and shows its confusion matrix and ROC in the side panel:
In the smiling classifier there's no "score", so I was imagining that axis could instead encode the continuous [0-1] inference score. Or maybe it would be better to build on how the What-If Tool conceptualizes this now, and have that axis encode the threshold that each strategy would apply to each slice. That might be more domain-agnostic for binary classifiers, but that's just a guess.
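To make the "threshold per slice, per strategy" idea a bit more concrete, here's a minimal sketch of how those per-slice thresholds could be computed for a binary classifier. Everything here is hypothetical on my part (the slice names, the target rates, the grid search), not how the What-If Tool actually implements these strategies:

```python
import numpy as np

def positive_rate(scores, threshold):
    """Fraction of examples classified positive at this threshold."""
    return np.mean(scores >= threshold)

def true_positive_rate(scores, labels, threshold):
    """Fraction of actual positives classified positive (TPR)."""
    positives = scores[labels == 1]
    return np.mean(positives >= threshold) if len(positives) else 0.0

def threshold_for_positive_rate(scores, target_rate):
    """Demographic-parity-style choice: the threshold whose positive
    rate is closest to a shared target rate, via a simple grid search."""
    grid = np.linspace(0.0, 1.0, 101)
    return min(grid, key=lambda t: abs(positive_rate(scores, t) - target_rate))

def threshold_for_tpr(scores, labels, target_tpr):
    """Equal-opportunity-style choice: the threshold whose TPR is
    closest to a shared target TPR."""
    grid = np.linspace(0.0, 1.0, 101)
    return min(grid, key=lambda t: abs(true_positive_rate(scores, labels, t) - target_tpr))

# Fake scores and labels for two slices, just to exercise the functions.
rng = np.random.default_rng(0)
slices = {
    "slice_a": (rng.random(500), rng.integers(0, 2, 500)),
    "slice_b": (rng.random(500), rng.integers(0, 2, 500)),
}

for name, (scores, labels) in slices.items():
    dp = threshold_for_positive_rate(scores, target_rate=0.3)
    eo = threshold_for_tpr(scores, labels, target_tpr=0.8)
    # "Single threshold" would apply one shared value (0.50 here) to every slice.
    print(f"{name}: single=0.50  demographic_parity={dp:.2f}  equal_opportunity={eo:.2f}")
```

The point is just that each strategy yields one threshold per slice, so a (slice, strategy) → threshold grid is a natural thing to plot.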
I realize this is more brainstorm-y than the original issue, and probably a separate panel, but since I was thinking about it I figured it was worth at least sharing a sketch of what I had in mind while using the Performance & Fairness panel and talking with others who are trying to build intuition about how these technical conceptualizations of fairness work :)
Thanks @kevinrobinson, this is great. Our visual designer @mpushkarna has also come up with some mocks for a better display of fairness optimization choices, but we haven't implemented them yet. I've tagged her here to bring this to her attention as well.
@jameswex Awesome! 👍 @mpushkarna I'll be excited to check out what you're working on, and happy to brainstorm more sometime if that's helpful.
@mpushkarna also, I just ran into this talk on fairness at Google I/O via Twitter, and I'm wondering if this screenshot that flashed by is related to what you're working on? It looks like it's designed to allow comparing a single metric across more subgroup slices at once:
Thanks for sharing! This is awesome, and super cool to see tools that now let people do explorations like https://research.google.com/bigpicture/attacking-discrimination-in-ml/ with their own models or plain CSVs :)
In reading the UI, and in talking through with other people what's going on in the fairness optimizations, I found myself marking up screenshots to explain what was happening, like this:
I thought it might be a helpful improvement to make these connections more explicit and obvious, rather than having to parse the text definitions and map them to the UI and the data points on the right. The screenshot above isn't a UI proposal, but I could sketch some options if you're interested in brainstorming. It's particularly hard to see what's being compared when you slice by more than one dimension and the confusion matrix isn't visible, so it would be interesting to see whether there are ways to make this visible across, say, four slices. If there are other ways to look at this, that'd be awesome to learn about too! There's a lot of informational and conceptual density here, so laying it out while staying simple seems like a great design challenge, but also super hard :)
Relatedly, if I'm understanding right, for some of the choices the actual metric being optimized isn't visible anywhere at all (putting aside the cost ratio for now). For "single threshold", as an example, I believe the number being optimized is the overall accuracy: the aggregation of these two per-slice accuracies, weighted by the count of examples in each slice:
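Spelling out the arithmetic I mean (the per-slice counts and accuracies here are made up, just to show the aggregation):

```python
# Count-weighted overall accuracy from per-slice accuracies.
# The per-slice counts and accuracies below are made-up values.
slices = [
    {"name": "slice_a", "count": 6000, "accuracy": 0.83},
    {"name": "slice_b", "count": 4000, "accuracy": 0.91},
]

total = sum(s["count"] for s in slices)
overall = sum(s["count"] * s["accuracy"] for s in slices) / total
print(f"overall accuracy = {overall:.3f}")  # (6000*0.83 + 4000*0.91) / 10000 = 0.862
```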
So in this case I'm trying to see how much the overall accuracy goes down under different optimization strategies, all of which will lower it as they trade off other goals (e.g., equal opportunity). These questions may just come from me exploring the impact of different parameters to build intuition, but the kind of question I'm trying to ask is: "how much worse is the metric that equal opportunity optimizes for, overall, when I choose demographic parity?" and "how much worse is equal opportunity's metric for each slice when I choose demographic parity?" Not sure if I'm making sense, but essentially I'm trying to compare how one optimization choice impacts the other optimization choices' metrics.
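As a rough sketch of that comparison (all data and thresholds invented for illustration, not taken from the tool): fix the per-slice thresholds that one strategy picks, then report the number each strategy cares about at those thresholds:

```python
import numpy as np

# All scores, labels, and thresholds here are invented for illustration.
rng = np.random.default_rng(1)
slices = {
    "slice_a": (rng.random(500), rng.integers(0, 2, 500)),
    "slice_b": (rng.random(500), rng.integers(0, 2, 500)),
}

# Pretend these are the per-slice thresholds a demographic-parity
# optimization chose.
dp_thresholds = {"slice_a": 0.42, "slice_b": 0.61}

for name, (scores, labels) in slices.items():
    preds = scores >= dp_thresholds[name]
    accuracy = np.mean(preds == labels)   # the number "single threshold" optimizes overall
    tpr = np.mean(preds[labels == 1])     # the number "equal opportunity" equalizes
    pos_rate = np.mean(preds)             # the number "demographic parity" equalizes
    print(f"{name}: accuracy={accuracy:.2f}  TPR={tpr:.2f}  positive_rate={pos_rate:.2f}")
```

Putting a table like that side by side for each strategy's thresholds would answer "how much does choosing strategy A degrade the metric strategy B optimizes for?" directly.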
Thanks!