Trusted-AI / AIF360

A comprehensive set of fairness metrics for datasets and machine learning models, explanations for these metrics, and algorithms to mitigate bias in datasets and models.
https://aif360.res.ibm.com/
Apache License 2.0

Reconsider use of "bias" in README #97

Closed: missaugustina closed this issue 5 years ago

missaugustina commented 5 years ago

There are some issues with the way "bias" is used in the README: it is inconsistent both with what is actually possible with the tool & with the current state of social science.

"The AI Fairness 360 toolkit is an open-source library to help detect and remove bias in machine learning models. The AI Fairness 360 Python package includes a comprehensive set of metrics for datasets and models to test for biases, explanations for these metrics, and algorithms to mitigate bias in datasets and models."

While it's clear that we intend "bias" in this context to mean "prejudice," or a disproportionate skew towards something, we must consider the larger impacts of this perspective. First off, it is impossible not to be prejudiced. In the social sciences, we call this "subjectivity". While some disciplines treat "objectivity" as an ideal, the concept is not transferable to human behavior, which is what we are modeling in our software applications. All humans have a distinct perspective; therefore all humans are prejudiced in some way.

The data & software applications that AIF360 seeks to "de-bias" were made by people based on their assumptions about how the world works & what they think matters. A software application is one group of people's collective opinion about what their stakeholders need. The structure of the software itself is based on these people's experience & is therefore prejudiced. For example, efforts like Gendermag & A11y are helping people working in open source projects address assumptions about how their stakeholders are using their software tools at a very fundamental level.

When training data is curated & labeled, assumptions are made by the curators, & it's clear from the description of the project that this is what AIF360 can help to address. However, we must also acknowledge that AIF360 is itself prejudiced by the assumptions of its creators & maintainers, & therefore can neither remove bias nor truly "de-bias" anything.

What is "fair" & "correct" is highly situational. What is "fair" in one situation may not be "fair" in another. In some social situations, such as in Law Enforcement, the lack of fairness is indicative of larger social issues & "de-biasing" could potentially further harm disadvantaged stakeholders who have been excluded from having their own voice. We should assume such deviations on "fairness" to be the norm, not an aberration. This assumption of a universal ideal norm (or bias if you will) exists within AIF360 itself.

I suggest we accept that bias & prejudice are tied to something inherent in the human condition & are therefore unavoidable. Instead of a definition that dictates our own biased "Truth" through the removal of any non-conforming perspectives, I propose we re-define bias as simply "limited perspective" & re-evaluate our language & explanations from that starting point.

This issue is quite significant because our framing of "bias" presents potential harms to IBM's own credibility & intentions. IBM has historically had moments where we were very narrow & short-sighted in our contributions to software projects. The global scale of our thought leadership & influence meant our mistakes significantly impacted people's lives. That said, for all the harms we've done, we've achieved great things too! IBM has the advantage of a much longer history than other tech companies to draw upon. Given the scale of our potential impact, we have a corporate social responsibility to thoughtfully consider who is at stake when we bring new ideas into the world.

"To visualize the future of IBM, you must know something of the past" -Thomas J. Watson, Sr.

“Now, thanks to confidential corporate documents and interviews with many of the technologists involved in developing the software, The Intercept and the Investigative Fund have learned that IBM began developing this object identification technology using secret access to NYPD camera footage. With access to images of thousands of unknowing New Yorkers offered up by NYPD officials, as early as 2012, IBM was creating new search features that allow other police departments to search camera footage for images of people by hair color, facial hair, and skin tone.”

Documents - https://www.documentcloud.org/documents/4452844-IBM-SVS-Analytics-4-0-Plan-Update-for-NYPD-6.html

“In November 2006, Indiana Governor Mitch Daniels announced a 10-year, $1.16 billion contract with a consortium of tech companies, led by IBM and Affiliated Computer Services (ACS), to modernize and privatize eligibility procedures for the state’s Medicaid, food-stamp, and cash-assistance programs.”

“The design of electronic-governance systems affects our material well-being, the health of our democracy, and equity in our communities. But somehow, when we talk about data-driven government, we conveniently omit the often terrible impacts that these systems have on the poor and working-class people”

Further reading on the "bias" in software itself:

michaelhind commented 5 years ago

Thanks for sharing your thoughts. "Bias" has many meanings, and it is the technical term that the AI fairness community uses (see, for example: https://fatconference.org/).