fani-lab / Adila

Fairness-Aware Team Formation

Reading "Fairness and Machine Learning" Book #8

Open Rounique opened 2 years ago

Rounique commented 2 years ago

FAIRNESS AND MACHINE LEARNING BOOK Limitations and Opportunities Solon Barocas, Moritz Hardt, Arvind Narayanan

Summary 1

Recommended by: Benjamin Fish

Hamedloghmani commented 1 year ago

Chapter #1 Summary

The first chapter of the book, titled "Introduction", gives a high-level overview of the book and of what fairness means. Most decisions a machine learning system makes fall into the category of evidence-based decision-making, which is only as reliable as the evidence it is based on. In other words, if the evidence was obtained under unfair circumstances, the resulting decision will be unfair too. When we witness unfairness or disparities, it does not mean that the creator of the machine learning system made these inequalities arise on purpose. Two key questions should be asked to determine whether observed disparities should be considered discrimination: first, whether the disparities are justified, and second, whether they are harmful. These questions rarely have a simple answer, since the answer lies somewhere between philosophy, sociology, and computer science. For example, if a postal service system is biased toward odd or even ZIP codes, that is not considered harmful; but if it is biased with respect to the majority race of each neighborhood, it might contribute to long-lasting cycles of inequality.

The term bias is mostly used to refer to demographic disparities in algorithmic systems that are objectionable for societal reasons. There are other, traditional uses of the term: for example, a statistical estimator is said to be biased if its expected (average) value differs from the true value it aims to estimate. Another notable debate is about our job as machine learning system designers: some believe it is to faithfully reflect the data, while others believe we must question the data and the circumstances under which it was obtained. The term measurement is commonly used for the process of creating a dataset from the real world. It is a misleading term, evoking an image of a dispassionate scientist recording what she observes, whereas creating datasets requires many subjective human decisions.

The authors are concerned with applications of machine learning that involve data about people, where the available training data will likely encode the demographic disparities that exist in our society. They argue that almost all machine learning projects are about people, directly or indirectly. Human society is full of demographic disparities, and training data will likely reflect them; moreover, measuring almost any attribute about people requires subjective decision-making. Biases in a training set's target variable are especially critical because they are guaranteed to bias the predictions. The target variable is considered the hardest from a measurement standpoint because it is often a construct made up for the purposes of the problem at hand rather than one that is widely understood and measured. For example, in computer vision, when classifiers were built to automatically rank people's attractiveness, all of them showed a preference for lighter skin. It is often said that technology changes quickly while society is slow to adapt, but in this instance the categorization scheme at the heart of today's machine learning technology has been frozen in time while social norms have changed rapidly. Another issue is that human intuition is poor at accounting for priors, and this is a major reason that statistical predictions outperform humans in a wide range of settings.
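To make the statistical sense of the term concrete, here is a minimal sketch (my own illustration, not from the book; assuming numpy) of an estimator whose average value systematically differs from the true value it estimates:

```python
# The maximum-likelihood variance estimator divides by n and is biased low;
# dividing by n-1 gives the unbiased version.
import numpy as np

rng = np.random.default_rng(0)
true_var = 4.0
n, trials = 10, 100_000

samples = rng.normal(loc=0.0, scale=np.sqrt(true_var), size=(trials, n))
biased = samples.var(axis=1, ddof=0).mean()    # divides by n
unbiased = samples.var(axis=1, ddof=1).mean()  # divides by n-1

print(f"true variance: {true_var}")
print(f"mean of biased estimator:   {biased:.3f}")    # systematically below 4.0
print(f"mean of unbiased estimator: {unbiased:.3f}")  # close to 4.0
```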
Calibration, however, means we must expect our models to reflect our data, and the data itself is a source of bias. Some patterns in the training data represent knowledge that we wish to mine with machine learning, while others represent stereotypes we might wish to avoid learning; our algorithms have no general way to distinguish between the two kinds of patterns. The first fix that comes to mind for a biased feature is removing it from the dataset, but if we remove, say, gender, many of the remaining features still correlate with it. Furthermore, if we are not careful, learning algorithms will generalize based on the majority culture, leading to high error rates for minority groups: to avoid overfitting, they discount patterns that look like random noise rather than true differences, and patterns specific to small groups can be mistaken for exactly that kind of noise. One way to avoid this is to explicitly model the differences between groups, but this involves technical and ethical challenges of its own.
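The point about correlated features can be seen in a few lines. Below is a hypothetical sketch (the data, feature names, and the scikit-learn model are my own illustration, not from the book): the sensitive column is dropped, yet a proxy feature lets a model recover it well above chance.

```python
# Even after dropping the sensitive attribute, a correlated "proxy" feature
# lets a classifier reconstruct it from the remaining columns.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n = 5_000
gender = rng.integers(0, 2, size=n)            # sensitive attribute (0/1)
# a "neutral" feature that in fact correlates with gender
proxy = gender + rng.normal(0.0, 0.5, size=n)
noise = rng.normal(0.0, 1.0, size=n)           # a genuinely unrelated feature

X = np.column_stack([proxy, noise])            # note: gender itself is removed
clf = LogisticRegression().fit(X, gender)
print(f"gender recovered from remaining features with "
      f"{clf.score(X, gender):.2f} accuracy")  # well above the 0.5 baseline
```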

My own thoughts: There are major parts of this whole concept that I don't understand. On many occasions the authors mix up the meanings of "equal" and "fair". Two things might not be equal in the real world, but we decide the fair way is to make them equal or at least to change the distribution; the fact that gets lost here is that our decision about what counts as fair was subjective in the first place. Another issue is cherry-picking in terms of making things fair: for example, we argue about gender bias in the tech industry but never about making the gender distribution of coal mine workers fair. And I still have this unanswered question: who is qualified to decide whether something is fair or not?

hosseinfani commented 1 year ago

@Hamedloghmani I've read your summary and also had a look at the first chapter. As we discussed, it's important to know at what step of ML we can detect or mitigate the problem. Figure 1 of the first chapter is key:

[Figure 1 of Chapter 1: the machine learning loop]

1) measurement (dataset curation / training data; MCAR, MAR, and MNAR)
2) learning (training / model structure)
3) action (prediction)
4) feedback
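Since MCAR, MAR, and MNAR come up at the measurement step, here is a toy sketch (my own illustration, with made-up numbers) of the three missingness mechanisms and of how two of them bias what the curated dataset records:

```python
# MCAR / MAR / MNAR on a toy income column.
import numpy as np

rng = np.random.default_rng(7)
n = 1_000
age = rng.integers(20, 65, size=n)
income = 1_000 * age + rng.normal(0, 5_000, size=n)

# MCAR: missing completely at random -- independent of everything
mcar = income.copy()
mcar[rng.random(n) < 0.2] = np.nan

# MAR: missing at random -- depends on an observed variable (age)
mar = income.copy()
mar[(age > 50) & (rng.random(n) < 0.5)] = np.nan

# MNAR: missing not at random -- depends on the missing value itself
mnar = income.copy()
mnar[(income > 55_000) & (rng.random(n) < 0.5)] = np.nan

print(f"true mean income: {income.mean():,.0f}")
for name, col in [("MCAR", mcar), ("MAR", mar), ("MNAR", mnar)]:
    observed = col[~np.isnan(col)]
    print(f"{name}: mean of observed income = {observed.mean():,.0f}")
# MCAR leaves the observed mean roughly unbiased; MAR and MNAR pull it down,
# i.e., measurement itself has biased the training data.
```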

hosseinfani commented 1 year ago

@Hamedloghmani the book that you uploaded in MS Teams is missing ch02. Use the above link to the book.

Hamedloghmani commented 1 year ago

@hosseinfani Thank you for your comments and for pointing out the issue with the book. I'll use the given link.

Hamedloghmani commented 1 year ago

Chapter #2 Summary

Most of the time we argue about the fairness of our automated processes toward different minorities and groups. But another question is whether it is fair to deploy such an automated system at all. This is called legitimacy: whether it is morally justifiable to use machine learning or automated methods in the first place. There are three main forms of automation.

  1. Automating a set of previously available rules into software. No machine learning!

    • While automating pre-existing rules, it is necessary to translate the scheme into code. That process might be faulty and lead to automated decisions that diverge from the policy the software was meant to execute.
    • Also, the policies might lack precision in their definitions, and programmers might take it upon themselves to make their own judgements. This type of automation requires that an institution define all of the criteria for a decision-making scheme in advance. As a result, there is no room for considering the relevance of additional details that were not considered or anticipated at that time.
    • Finally, automation runs the serious risk of limiting accountability and exacerbating the dehumanizing effects of dealing with bureaucracies. Simply put, it makes it difficult to identify the agent responsible for a decision.
  2. Using machine learning to imitate decisions previously made by humans. We want the machine learning system to replicate the informal judgement of humans and automatically discover a decision-making scheme that produces the same decisions.

  3. Finally, learning our decision-making rules from data, by using a computer to uncover patterns in a dataset that predict an outcome or property of policy interest. The important point here is that automation applies to the process of developing the rules, not necessarily to applying them.

It is pointed out in this chapter that bureaucracies are often criticized for not being sufficiently individualized, lumping people into needlessly coarse groups. It turns out that this limitation is an unavoidable part of inductive reasoning, a problem also referred to as "statistically sound but nonuniversal generalizations": an individual fulfills all the criteria for inclusion in a specific group but fails to possess the quality that these criteria are expected to predict.

There is also a very practical reason why we might not hold decision makers to a standard that requires them to consider all information that might conceivably be relevant: collecting and considering all of it can be expensive, intrusive, and impractical. In fact, the cost of doing so could easily outweigh the perceived benefits of more granular decision making.

Overfitting is one of the main issues with induction. It is a form of arbitrary decision making because the predictive validity that serves as its justification is an illusion. For example, if we create a dataset of runners in which, by accident, all the fast runners wear blue shoes and all the slow ones wear red, the apparent relationship between shoe color and running skill is an illusion and a basis for arbitrary decisions. Besides overfitting itself, subtler variants of the same problem can be troublesome too. It is common to split a dataset into training and testing parts, but these splits are still much more similar to each other than to the future population to which the model will be applied. This is the problem of "distribution shift", and it is common in practice, as in image-based diagnosis of skin cancer across different racial groups. Generalizing from specific examples always admits the possibility of drawing lessons that do not apply to the situations decision makers will encounter in the future.
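The runners example can be played out in a few lines. Here is a hypothetical sketch (the data generator and the scikit-learn model are my own illustration, not from the book): the spurious shoe-color pattern holds perfectly in the training sample, so the model looks accurate, then collapses to chance once the deployed population no longer shares the accident.

```python
# A spurious correlation that survives a train/test split drawn from the same
# sample, but not a shift to the real deployment population.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)

def make_runners(n, color_follows_speed):
    fast = rng.integers(0, 2, size=n)            # label: fast (1) / slow (0)
    if color_follows_speed:
        blue_shoes = fast.copy()                 # accident: fast <=> blue shoes
    else:
        blue_shoes = rng.integers(0, 2, size=n)  # accident gone at deployment
    return blue_shoes.reshape(-1, 1).astype(float), fast

X_train, y_train = make_runners(500, color_follows_speed=True)
X_deploy, y_deploy = make_runners(500, color_follows_speed=False)

clf = LogisticRegression().fit(X_train, y_train)
print(f"train accuracy:  {clf.score(X_train, y_train):.2f}")    # ~1.00, an illusion
print(f"deploy accuracy: {clf.score(X_deploy, y_deploy):.2f}")  # ~0.50, chance level
```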

hosseinfani commented 1 year ago

@Hamedloghmani Good job. Thank you for the summary.

Just a few comments: 1- for each part/subpart, put an example. If you use only one example throughout, that would be perfect. We call it a running example, to showcase all parts.

2- there are some sentences that I am not sure I understood. Can you rephrase or clarify?

Hamedloghmani commented 1 year ago

@hosseinfani Thanks a lot for your comments and modifications to the summary. I added examples for the bullet points you kindly mentioned, in order to clarify those sentences.