Title: Measuring and Mitigating Unintended Bias in Text Classification
Venue: AIES
Year: 2018
Introduction:
Human biases can easily result in a skewed distribution in the training data, as many ML models are developed using human-generated data. This paper introduces a new approach to measure and mitigate unintended bias in ML models. This approach reduces the unintended bias without compromising overall model quality.
Main Concern:
This paper's key contribution is the introduction of methods for quantifying and mitigating unintended bias in text classification models.
Previous Gaps in Work:
Prior works discuss the impact of using unfair natural language processing models for real-world tasks but do not provide mitigation strategies.
Input:
This research uses a text classifier created to detect toxicity in Wikipedia Talk Page comments. The model is trained on a dataset of 127,820 Talk Page comments, each of which was classified as toxic or non-toxic by human raters.
(A toxic comment is defined as one that is "rude, disrespectful, or unreasonable and likely to make you leave a discussion.")
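As a rough illustration of how such a labeled dataset might be prepared, the sketch below aggregates per-comment human ratings into a binary toxic label by majority vote and splits off a test set. The file name, column names, and aggregation rule are assumptions for illustration, not the paper's exact pipeline.

```python
# Illustrative sketch (assumed file/column names): aggregate per-comment
# human ratings into a binary toxic/non-toxic label and split the data.
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical CSV with one row per (comment, rater) annotation.
annotations = pd.read_csv("talk_page_annotations.csv")  # columns: comment_id, comment_text, is_toxic

# Majority vote across raters yields one binary label per comment.
labels = (
    annotations.groupby(["comment_id", "comment_text"])["is_toxic"]
    .mean()
    .gt(0.5)
    .astype(int)
    .reset_index(name="toxic")
)

train_df, test_df = train_test_split(labels, test_size=0.2, random_state=42)
print(len(train_df), "train comments,", len(test_df), "test comments")
```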
Metric Used:
Area under the receiver operating characteristic curve (AUC).
*Here they make a distinction between unintended bias in a machine learning model and the potential for unfair applications of the algorithm. Some bias is built into every machine learning model by design: a model trained to detect toxic comments, for example, is intentionally biased toward flagging toxic comments. The model is not supposed to discriminate based on, say, the gender mentioned in a comment; if it does, that is unintended bias. Fairness, on the other hand, refers to a potential negative impact on society, particularly when different people are treated differently.
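As a concrete illustration of the metric, the sketch below computes the overall ROC AUC and, as a rough probe of unintended bias, the AUC restricted to comments mentioning each identity term. The data frame, model scores, and term list are assumed placeholders; the paper's own bias measurements are more involved than this plain subgroup AUC.

```python
# Illustrative sketch: overall AUC plus per-identity-term AUC on the test
# comments that mention each term, as a rough probe of unintended bias.
import numpy as np
from sklearn.metrics import roc_auc_score

identity_terms = ["gay", "muslim", "feminist"]   # example terms, not the paper's full list

# Assumed inputs: `test_df` with "comment_text"/"toxic" columns, and
# `y_score`, an array of model toxicity scores aligned with its rows.
y_true = test_df["toxic"].to_numpy()

print("overall AUC:", roc_auc_score(y_true, y_score))

for term in identity_terms:
    mask = test_df["comment_text"].str.contains(term, case=False).to_numpy()
    if mask.any() and len(np.unique(y_true[mask])) == 2:   # AUC needs both classes present
        print(f"AUC for comments mentioning '{term}':",
              roc_auc_score(y_true[mask], y_score[mask]))
```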
Methodology:
The paper presents a simple and novel strategy for mitigating that unintended bias by strategically adding data.
All versions of the model are convolutional neural networks, trained in TensorFlow using the Keras framework.
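The summary does not record the exact architecture or hyperparameters, so the following is only a generic sketch of a Keras convolutional text classifier of the kind described; the vocabulary size, sequence length, and layer sizes are assumptions.

```python
# Generic sketch of a convolutional text classifier in Keras/TensorFlow.
# Vocabulary size, sequence length, and layer sizes are assumed values.
import tensorflow as tf
from tensorflow.keras import layers

VOCAB_SIZE = 20000   # assumed
SEQ_LEN = 250        # assumed
EMBED_DIM = 128      # assumed

model = tf.keras.Sequential([
    tf.keras.Input(shape=(SEQ_LEN,)),          # integer token ids for one comment
    layers.Embedding(VOCAB_SIZE, EMBED_DIM),
    layers.Conv1D(128, kernel_size=5, activation="relu"),
    layers.GlobalMaxPooling1D(),
    layers.Dense(128, activation="relu"),
    layers.Dropout(0.3),
    layers.Dense(1, activation="sigmoid"),     # probability that the comment is toxic
])
model.compile(optimizer="adam",
              loss="binary_crossentropy",
              metrics=[tf.keras.metrics.AUC(name="auc")])
model.summary()
```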
(To mitigate the data imbalance that produces the unintended bias, additional data is added: for each identity term, enough non-toxic examples are added to bring the toxic/non-toxic balance of comments containing that term in line with the prior distribution of the overall dataset. Adding this data was found to be effective in mitigating bias.)
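A rough sketch of that balancing computation, under assumed data-frame and column names, might look like the following; the pool of candidate non-toxic examples and the identity-term list are placeholders for illustration.

```python
# Illustrative sketch of the balancing step: for each identity term, add
# enough non-toxic examples so that the toxic fraction among training
# comments containing the term matches the dataset's overall toxic fraction.
# `train_df` and `extra_nontoxic_pool` are assumed data frames with
# "comment_text" and "toxic" columns; the term list is an example only.
import pandas as pd

identity_terms = ["gay", "muslim", "feminist"]
overall_toxic_rate = train_df["toxic"].mean()
added = []

for term in identity_terms:
    subset = train_df[train_df["comment_text"].str.contains(term, case=False)]
    n_toxic, n_total = int(subset["toxic"].sum()), len(subset)
    # Solve n_toxic / (n_total + k) = overall_toxic_rate for k added non-toxic rows.
    target_total = n_toxic / overall_toxic_rate if overall_toxic_rate > 0 else n_total
    k = max(0, int(round(target_total)) - n_total)
    candidates = extra_nontoxic_pool[
        extra_nontoxic_pool["comment_text"].str.contains(term, case=False)
        & (extra_nontoxic_pool["toxic"] == 0)
    ]
    added.append(candidates.head(k))

balanced_train_df = pd.concat([train_df] + added, ignore_index=True)
```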
Gaps of Work:
The identity terms affected by unintended bias are curated by hand in this paper; automating the mining of such terms is left as future work.
Conclusion:
It is shown that implementing these strategies mitigates unintended bias in a model without sacrificing overall model quality or significantly affecting performance on the original test set.