fairlearn / fairlearn-proposals

Proposal Documents for Fairlearn
MIT License

Thoughts on: What Do Industry Practitioners Need? #2

Closed kevinrobinson closed 4 years ago

kevinrobinson commented 4 years ago

Thanks for sharing such awesome work in the open! 👍 This issue is broadening the discussion a little bit from https://github.com/fairlearn/fairlearn/issues/300#issuecomment-587514778 and https://github.com/fairlearn/fairlearn/issues/238#issuecomment-587132194. If this is too forward-looking or abstract, or if GitHub is the wrong place, please let me know :)

As someone who is reasonably familiar with these ML fairness concepts, I was super excited to explore how fairlearn could support folks in discovering fairness issues across different metrics. 👍 The central design critique that came up for me was that it might be promising to explore UX ideas that more directly support exploratory analysis (eg, show me the red flags that come up across any metric all at once, then let me get more details). That's based on a key assumption that folks won't be able to narrow to one sensitive attribute and one metric up front, and that many fairlearn users will be new to fairness concepts, audits and visualizations.

That might be a bad assumption, but it does come up in the awesome Improving Fairness in Machine Learning Systems: What Do Industry Practitioners Need? (Holstein et al. 2019). For example, it seems like there might be a need for supporting folks in evaluating different fairness metrics and choosing what's most appropriate for their scenario:

Most of our interviewees’ teams do not currently have fairness metrics against which they can monitor performance and progress.

Some of the other findings seem particularly relevant for critiquing and thinking through which directions might be promising for UX improvements in fairlearn. I added some bolding:

Several interviewees expressed needs for support in detecting biases and unfairness prior to deployment, even in cases where they may not have anticipated all relevant subpopulations or all kinds of fairness issues

And even more directly:

“It’s a little bit of a... manual search to say, ‘hey, we think this has a bias, let’s go take a look and see if it does,’ which I don’t know is the right approach [...] because there are a lot of strange ones that you wouldn’t expect [...] that we just accidentally stumbled upon.”

Both of these seem to indicate there might be a need for exploratory analysis, where the design of the tool helps users explore across multiple sensitive features, combinations of them, and multiple error or fairness metrics. Given that, it's possible that UX changes that add more steps to the wizard might make this 'manual search' process more challenging, which is one scenario that led me to ask these kinds of broader questions.
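To make the "surface all the red flags at once" idea concrete, here's a minimal sketch of what such an exploratory scan could look like. It assumes fairlearn's MetricFrame API; the synthetic data, the column names, and the 0.05 gap threshold are all hypothetical, so treat it as an illustration of the flow rather than an existing fairlearn feature:

```python
# A minimal sketch of an "all red flags at once" scan -- not fairlearn's actual
# wizard, just an illustration. Assumes a recent fairlearn with MetricFrame;
# the data, column names, and the 0.05 threshold are hypothetical.
import numpy as np
import pandas as pd
from sklearn.metrics import accuracy_score, precision_score, recall_score
from fairlearn.metrics import MetricFrame

# Hypothetical data standing in for a model's predictions.
rng = np.random.default_rng(0)
n = 500
X = pd.DataFrame({
    "sex": rng.choice(["F", "M"], n),
    "race": rng.choice(["A", "B", "C"], n),
    "age_band": rng.choice(["<30", "30-50", ">50"], n),
})
y_true = rng.integers(0, 2, n)
y_pred = rng.integers(0, 2, n)

metrics = {
    "accuracy": accuracy_score,
    "precision": precision_score,
    "recall": recall_score,
}

for feature in ["sex", "race", "age_band"]:
    mf = MetricFrame(
        metrics=metrics,
        y_true=y_true,
        y_pred=y_pred,
        sensitive_features=X[feature],
    )
    # Largest between-group gap per metric; surface anything over the threshold
    # so the user sees every potential red flag before drilling into details.
    gaps = mf.difference(method="between_groups")
    flagged = gaps[gaps > 0.05]
    if not flagged.empty:
        print(f"Potential red flags for '{feature}':")
        print(flagged.to_string())
```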

More generally, I'm curious about how folks are thinking about some of the other opportunities that fairlearn might have to tackle higher-level user needs more directly. I'm super excited about work aimed at tackling what has come up in the Holstein et al. (2019) paper, and the generative possibilities for UX and design work that's considering those needs directly. One intuition I have is that this work might involve treating different types of ML problems and different types of fairness issues in less abstracted and more differentiated ways, both in UX and implementation.

I'll be curious to hear what folks think! And since I don't know y'all very well, it's probably worth saying I have much respect for your work, and thanks for sharing in the open and doing so much to advance the field! 👍

MiroDudik commented 4 years ago

All good points! You're right that this might be too much for a single github issue. But--on a high level--your comments bring up the need to create a venue for brainstorming and prioritizing various high-level aspects of the project. Until we come up with a better alternative, maybe we can keep this issue open.

The most immediate plan around UX is to make it a bit more modular, so it's easier to experiment with different variants of the wizard and to enable the kinds of flows you're thinking about.

kevinrobinson commented 4 years ago

Sure, sounds good! 👍

Collaborating on design or larger experiments is always harder from the outside, but if there are ways you're interested in sharing thoughts on that, or sketching some mockups or workflows, I'd be curious to chat more!

One small step might be sharing thoughts on what the relative prioritization is for the different types of ML methods (eg, would it be good scoping to focus on folks with binary classification problems to start, or would that not really be applicable to users' actual problems).

romanlutz commented 4 years ago

Thanks for the suggestions, @kevinrobinson ! We want to make this as open as possible, so I'll try and find a place for everyone in the community to collaborate on designs. @adrinjalali educated me on how scikit-learn does that using an "enhancement-proposals" repository, for example.

kevinrobinson commented 4 years ago

@romanlutz Sure! I see https://github.com/fairlearn/fairlearn-proposals/pulls now as well, and also see @MiroDudik alluding to this in https://github.com/fairlearn/fairlearn/issues/311#issuecomment-592716247:

we hope that the development of UX for fairness assessment / unfairness mitigation will be our key area of contribution.

I'll look forward to places to follow along and maybe join in and contribute! 👍

kevinrobinson commented 4 years ago

I'm sure folks on the team are aware of this, but for others following along, some other great work in this area is in Co-Designing Checklists to Understand Organizational Challenges and Opportunities around Fairness in AI (Madaio et al. 2020).

In particular, I wonder what it might look like for projects to embrace particular sociotechnical contexts and build out tools for those particular contexts. In other words, building on the observation in Madaio et al. (2020) that:

In particular, we found that there are gaps in existing UX research methods for explicitly engaging diverse stakeholders around AI fairness. Although participants described existing methods for user testing, current UX research methods provide little guidance on how to solicit input and concerns from stakeholders belonging to different groups, especially when some groups have substantially less power or influence than others. For example, a UX researcher working on a predictive policing system might solicit feedback from the police—i.e., the intended users of the system—but fail to engage with the communities most likely to be affected by the system’s use.

One minimal starting point might be tooling that first helped users identify groups of people who might be most negatively impacted by, say, maximizing for accuracy. The goal isn't just to constrain the optimization or mitigate that unfairness for subgroups that the user of fairlearn is already aware of - it's to help them gain awareness of what those subgroups might even be.
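As a rough illustration of that awareness-raising step (not an existing fairlearn feature; the candidate columns, group-size floor, and synthetic data below are hypothetical), a tool could enumerate combinations of candidate attributes and rank the resulting subgroups by error rate, surfacing groups the user may not have thought to check:

```python
# A rough sketch, not an existing fairlearn feature: enumerate combinations of
# candidate attributes and rank the resulting subgroups by error rate, so the
# user can discover badly-served groups they hadn't anticipated. The column
# names, the group-size floor, and the synthetic data are all hypothetical.
from itertools import combinations

import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
n = 1000
df = pd.DataFrame({
    "sex": rng.choice(["F", "M"], n),
    "region": rng.choice(["north", "south", "east", "west"], n),
    "age_band": rng.choice(["<30", "30-50", ">50"], n),
    "correct": rng.random(n) > 0.2,  # stand-in for (y_pred == y_true)
})

candidate_columns = ["sex", "region", "age_band"]
min_group_size = 30  # ignore tiny groups where error estimates are too noisy

rows = []
for r in (1, 2):  # single attributes and pairwise combinations
    for cols in combinations(candidate_columns, r):
        grouped = df.groupby(list(cols))["correct"]
        stats = pd.DataFrame({
            "error_rate": 1.0 - grouped.mean(),
            "size": grouped.size(),
        })
        stats = stats[stats["size"] >= min_group_size]
        stats.index = [f"{'+'.join(cols)}: {idx}" for idx in stats.index]
        rows.append(stats)

# Worst-performing subgroups first -- these are the ones to surface to the user.
report = pd.concat(rows).sort_values("error_rate", ascending=False)
print(report.head(10))
```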

I wonder what directions this kind of framing might lead to, where the challenge is expanding the scope of the problem for users to include the wider sociotechnical context. To brainstorm one step further, what would it look like to create tools that embodied values like the Design Justice Principles in how they conceptualize the sociotechnical work of fairness in ML? To make that more concrete within fairlearn, what would a fairlearn notebook on COMPAS look like if it included within its scope the kinds of wider sociotechnical context you see in talks like Lum, Bender and Wilkerson (2018)? These are aspirational questions, but as a research direction, exploring them seems super generative compared to projects that aim to minimize or abstract the sociotechnical context away. More tactically, it might be interesting to learn whether folks have a sense of the various sociotechnical contexts where fairlearn is most in use now; that might yield ideas for where new kinds of UX could add new kinds of value.

Anyway, I'm just excited about this work so brainstorming and thinking aloud :) Thanks for listening and I definitely understand the tension between shipping incremental value for folks in the short term and longer-term aspirations or research questions like this! 👍

romanlutz commented 4 years ago

@hannawallach is actually already part of this effort!

[One disclaimer on the COMPAS notebook: I would definitely point out that it's one of the least refined ones we have, and I've considered taking it down multiple times since the scenario is just more difficult than most other scenarios. The sociotechnical frame there would have to include judges and how they make their decisions, not just the scores, since the scores aren't the actual outcome. You can easily find more to criticize about the approach, but IMO that's one of the more important points. The benefit of that notebook is that it's more straightforward than the ranking example. I suppose the solution would be to find another, more straightforward scenario to replace the COMPAS one.]

I'll let others comment on the direction for automatically identifying groups/alerting modelers of disadvantaged groups since I'm currently more on the mitigation side. @mesameki @MiroDudik @vingu @riedgar-ms @rihorn2

About the contexts Fairlearn is used in: Right now it's pretty generic, and you could certainly imagine applications in all kinds of areas. Whether we want to go as far as building context-specific UX is another great question about the direction. I think that very much depends on whether users/modelers ask for it. @mesameki may have a much better idea wrt this!

I appreciate all your thoughts, and please don't hesitate to voice them! The entire point of this tool/community/project/effort is to build something that's useful, and it can only be useful if we listen and understand. Thanks a lot!

MiroDudik commented 4 years ago

I love this thread, but I think that this is a better venue for it!

kevinrobinson commented 4 years ago

hi fairlearn pals! :)

From the outside, it sounds like there have been a few ways folks have asked about contributing to the project, which seems great! And lots of great progress with discussions being open in GitHub, Teams, etc. I've watched the repo and seen other issues like https://github.com/fairlearn/fairlearn/issues/311#issuecomment-592716247, https://github.com/fairlearn/fairlearn/issues/406#issuecomment-623597208 and that you're having an open developer call later this week. I have much respect for the efforts in trying to do this kind of work more openly! 👍

At the same time, it's hard to understand if there's a real opportunity to collaborate here, particularly on work that embraces the fundamentally sociotechnical nature of tools, visualizations and algorithms related to fairness. There's so much awesome work talking about the scope we could be aspiring to here (eg, Selbst et al. 2019; Seo Jo and Gebru 2019), and I'm hopeful that folks working in this repo might be open to collaborations exploring those kinds of directions.

I separately opened https://github.com/koaning/scikit-fairness/issues/31 as an example of a more specific and scoped proposal. I know that it can be challenging to collaborate through open source, or with outside contributors in general. I'm wondering if a productive step might be for folks at MS to describe what a productive outside collaboration might look like, from your perspective? I'm asking this as a way to respect your perspective and let you control that process, rather than trying to push the work onto you :)

Alternately, if exploring the sociotechnical aspect of this work is beyond the scope of the repo, I can politely close this issue for now. And as always, thanks for sharing so much of your work in the open! 👍

MiroDudik commented 4 years ago

Sorry for dropping the ball on this issue. Frankly, we don't have much experience in collaborative creation of components such as UX flow and case studies, so we've been postponing that part... but we should start moving.

These are a few issues that I'd like to get some clarity about:

For now, I'd like to at least get a sense of what would be a pragmatic solution for a small project of our size--and in particular, where our current contributors are most interested in helping right now. Maybe we can discuss these on the call this week. @kevinrobinson : will you be joining?

adrinjalali commented 4 years ago

the hard part: how do we ensure quality? our capacity to curate (at least at the beginning) is limited, so how do we prioritize?

On this particular issue, I'd say having a diverse set of contributors who feel at home in the project, and together reaching consensus on each issue, would improve the quality of an open source project overall. It may make things a bit slower, but in my experience it ends up being not too bad of a result :)

MiroDudik commented 4 years ago

I think that would be a great way to go... there's a second-order concern about how to create and nurture a diverse community of developers (not so easy, actually), but we should give it a shot.

kevinrobinson commented 4 years ago

@MiroDudik Thanks! This is super helpful.

Some important things I heard here are:

we don't have much experience in collaborative creation of components such as UX flow and case studies... how does one do collaborative UX design?

So it sounds like the first priority would be figuring out what will feel comfortable for the current team :) I think a helpful first step would be checking that I'm aligned with you all about what's important to this project and what success feels like. There are bits of discussion of this around the repo in docs, threads, and proposals, and I've gone through these to try to make sure I'm understanding as best I can. I think this is in the same spirit as https://github.com/fairlearn/fairlearn-proposals/pull/8, where concerns about curation and quality also come up. But I'm also really trying to help with pulling together some other things like the project roadmap, what will feel like success to the team, etc. I think it's very hard to talk productively about new UX work without being aligned on those things :)

So one concrete step to move us forward could be to put together a one-page Markdown file (project-vision.md), submit that as a pull request, and then leave it open for a week or two so everyone can comment and make sure it reflects the team's collective vision. Then the team can merge that into the README, docs, or wherever folks think is best. After that, I can propose some specific projects at different scopes within the vision for the project as separate PRs, and interested folks can collectively critique how well they'd fit the project vision and whether they want to collaborate.

Here's a demo of what I mean, on a fork.

I'm happy to kick that off; would it be helpful for me to submit a similar PR to this repo or to the proposals repo? If so, I'll do that and close this thread. EDIT: also @MiroDudik yep, I can listen in on the call Thursday, and also would love to avoid taking up too much of the group's time :)

MiroDudik commented 4 years ago

This is awesome! I think you pulled in the important points really well. Join the call tomorrow and let's follow up. I think that for this kind of high-level mission document, we can keep it in the main repository under docs--good idea to iterate on it for a week or two and then integrate it into the project website / enhanced docs, which we should roll out in a few weeks (with some early prototypes earlier). See #8.

kevinrobinson commented 4 years ago

@romanlutz @MiroDudik Got it, thanks! I need a little bit of help, though, to figure out what you're asking me to do next in terms of process :) I'm cool with any of these paths; what do you think would be most helpful?

a. wait until https://github.com/fairlearn/fairlearn-proposals/pull/8 lands
b. make these kinds of comments on https://github.com/fairlearn/fairlearn-proposals/pull/8
c. submit a PR on top of https://github.com/fairlearn/fairlearn-proposals/pull/8
d. submit a new PR to the repo with this kind of project-vision doc (what was proposed above)

Separately, as another process note, @romanlutz has left a bunch of thoughtful comments in my repo on https://github.com/kevinrobinson/fairlearn/pull/1 (sorry for the mixup!). I've closed that PR since I don't want great conversation about the vision and purpose of this project to be happening in a PR to my personal fork :) I'll manually move those comments to the main repo based on what you all think is the best next process step, a-d above or something else :)

Thanks! 👍

romanlutz commented 4 years ago

#8 is more of a process that will play out over the next couple of weeks at least. If you look in the main repo, I've started some of the tasks for that already because they're not controversial. The finer points of the structure are still a little fluid, so that's where I could use your input on where you think the vision fits in.

For that reason I'd say option (b), and I think that will result in (d) with a slightly altered destination (not thinkaloud.md but rather docs/vision.rst or something like that)

Looking forward to talking tomorrow, and if this takes longer than the little time we can allot to it we'll just have a follow-up. I'm happy to spend time iterating on this since it matters. Same applies to the UX.

kevinrobinson commented 4 years ago

👍 left some comments in https://github.com/fairlearn/fairlearn-proposals/pull/8#pullrequestreview-406913154, and will close this and wait to see how that progresses.