fani-lab / Adila

Fairness-Aware Team Formation

2017:SSDBM:Measuring Fairness in Ranked Outputs #73

Open Hamedloghmani opened 1 year ago

Hamedloghmani commented 1 year ago

Title: Measuring Fairness in Ranked Outputs Year: 2017 Venue: SSDBM

My notes: One of the main challenges I faced while experimenting with fairness in ranking was the lack of a proper metric to measure fairness. To the best of my knowledge, the following paper proposed three metrics to measure fairness specifically for ranking. Although 6 years have passed since this paper, I still have not seen a fairness metric for ranking become common in the literature except NDKL, which is also a contribution of this paper.

Main Problem:
This paper addresses the impact of algorithmic decisions that score and rank individuals. Such decisions have significant consequences, since they determine outcomes such as creditworthiness and college admissions. The paper focuses on situations where an institution, referred to as a "ranker," assesses a set of items, typically individuals, based on characteristics such as demographics or behaviors, and generates a ranking that represents the relative quality of these items. Despite their seemingly objective nature, automated rankers can discriminate against individuals and systematically disadvantage protected groups; moreover, a lack of diversity can be observed in the higher-ranking positions. Given these concerns, the authors emphasize the need to study the fairness of ranking schemes, where fairness is broadly defined as the impartial and just treatment of individuals and demographic groups. While algorithmic fairness has gained increasing attention in the literature, and numerous interpretations and technical definitions have been proposed, the authors note a lack of work quantifying fairness specifically in the context of ranking: "despite a recent spike of interest in fairness, and despite the ubiquity and the importance of ranking, we are not aware of work that quantifies fairness in a ranking scheme". The main objective of the paper is therefore to fill this research gap and provide metrics of fairness for ranking schemes.

Proposed Method: Their approach to fairness measures rests on one key idea: since a higher rank is more advantageous for an item, achieving statistical parity at higher ranks is more crucial. To incorporate this idea, they take several established statistical parity measures from the existing literature and integrate them into the nDCG framework. Specifically, they compute set-wise parity at multiple cutoff points within a ranking and aggregate these values using a position-based discount. To keep this summary short and readable, I skip NDKL since we already know the details of that metric.
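The cutoff-and-discount recipe above can be sketched as a small generic helper. This is not the authors' FairRank code; it is a minimal illustration, and the function name `discounted_parity`, the `parity_at` callback, and the cutoff spacing of 10 positions are my own assumptions:

```python
import math

def discounted_parity(ranking, protected, parity_at, cutoffs=(10, 20, 30, 40, 50)):
    """Generic skeleton of the paper's recipe: evaluate a set-wise parity
    measure at successive cutoff points and aggregate with a logarithmic
    position discount (as in nDCG).

    `ranking` is a list of item ids ordered best-first, `protected` is the
    set of protected ids, and `parity_at` maps
    (protected_in_top_i, i, total_protected, n) to a parity score.
    Cutoffs are assumed ascending."""
    n = len(ranking)
    total_protected = sum(1 for x in ranking if x in protected)
    score = 0.0
    for i in cutoffs:
        if i > n:
            break
        top = sum(1 for x in ranking[:i] if x in protected)
        score += parity_at(top, i, total_protected, n) / math.log2(i)
    return score  # unnormalized; divide by Z (max over orderings) to land in [0, 1]
```

Plugging in the absolute difference of proportions, `lambda p, i, P, n: abs(p / i - P / n)`, gives the unnormalized form of rND; swapping in a ratio-based parity gives rRD.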

Normalized discounted difference (rND): "computes the difference in the proportion of members of the protected group at top-i and in the over-all population." Reconstructed from the paper's definition:

rND = (1/Z) * Σ_{i = 10, 20, ...} (1 / log2(i)) * | |S+_{1...i}| / i − |S+| / N |

In the formula, Z is a normalization constant set to the highest possible value of the metric, so the score stays between 0 and 1. S+ is the protected group (for example, the non-popular group in our initial experiments) and N is the size of the whole set. They experimented with these metrics on a synthetic dataset so they could control the degree of unfairness in the data. They made some observations about this metric that I'll summarize: the protected and non-protected groups are treated symmetrically, meaning a low proportion of either the protected or the non-protected group at high ranks leads to a high (unfair) rND score. rND is convex and continuous but is not differentiable at 0.

Drawback: While the metric is convex and continuous, it is not differentiable at 0, limiting its usefulness in an optimization framework.
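For concreteness, here is a minimal sketch of rND, assuming cutoffs every 10 positions as in the paper's experiments. The normalizer Z is computed by brute force as the larger value of the two extreme orderings (all protected items first, or all last); that shortcut and the function names are my assumptions, not the paper's exact procedure:

```python
import math

def _unnormalized_rnd(ranking, protected, cutoffs):
    """Sum over cutoffs of |proportion of S+ in top-i minus overall
    proportion|, discounted by 1/log2(i)."""
    n = len(ranking)
    overall = sum(1 for x in ranking if x in protected) / n
    total = 0.0
    for i in cutoffs:  # cutoffs assumed ascending
        if i > n:
            break
        top_i = sum(1 for x in ranking[:i] if x in protected)
        total += abs(top_i / i - overall) / math.log2(i)
    return total

def rnd(ranking, protected, cutoffs=(10, 20, 30, 40, 50)):
    """Normalized discounted difference: 0 = statistical parity, 1 = maximally
    unfair. Z is the value of the most unfair ordering with the same group sizes."""
    prot = [x for x in ranking if x in protected]
    non = [x for x in ranking if x not in protected]
    z = max(_unnormalized_rnd(non + prot, protected, cutoffs),
            _unnormalized_rnd(prot + non, protected, cutoffs))
    if z == 0:
        return 0.0
    return _unnormalized_rnd(ranking, protected, cutoffs) / z
```

Note the symmetry the authors observe: pushing the protected group entirely to the top or entirely to the bottom yields the same (maximal) score.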

Normalized discounted ratio (rRD): This metric is formulated like rND, differing only in the denominators of the two fractions:

rRD = (1/Z) * Σ_{i = 10, 20, ...} (1 / log2(i)) * | |S+_{1...i}| / |S−_{1...i}| − |S+| / |S−| |

In the first fraction, the denominator is |S−_{1...i}|, the number of non-protected items in the top-i, and in the second fraction the denominator is the size of the whole non-protected group. They mention: "When either the numerator or the denominator of a fraction is 0, we set the value of the fraction to 0." Observation summary: rRD reaches its best (lowest) value at the same points as rND and NDKL (aka rKL), but it shows different trends. Since rRD does not treat the protected and non-protected groups symmetrically, its behavior is not meaningful when the protected group represents the majority of the overall population, or when the protected group is preferred to the non-protected group (fairness probability > 0.5), which is a negative point.

Drawback: This metric is only applicable when the protected group (e.g., the non-popular group) is the minority.
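A matching sketch of rRD, implementing the paper's rule that a fraction with a zero numerator or denominator counts as 0; as noted above, the score is only meaningful when the protected group is the minority. The brute-force Z and the names are again my assumptions:

```python
import math

def _ratio(num, den):
    # Per the paper: a fraction with a 0 numerator or denominator counts as 0.
    return 0.0 if num == 0 or den == 0 else num / den

def _unnormalized_rrd(ranking, protected, cutoffs):
    """Sum over cutoffs of |S+/S- ratio in top-i minus overall S+/S- ratio|,
    discounted by 1/log2(i)."""
    n = len(ranking)
    n_prot = sum(1 for x in ranking if x in protected)
    overall = _ratio(n_prot, n - n_prot)
    total = 0.0
    for i in cutoffs:  # cutoffs assumed ascending
        if i > n:
            break
        p = sum(1 for x in ranking[:i] if x in protected)
        total += abs(_ratio(p, i - p) - overall) / math.log2(i)
    return total

def rrd(ranking, protected, cutoffs=(10, 20, 30, 40, 50)):
    """Normalized discounted ratio; assumes the protected group is the minority."""
    prot = [x for x in ranking if x in protected]
    non = [x for x in ranking if x not in protected]
    z = max(_unnormalized_rrd(non + prot, protected, cutoffs),
            _unnormalized_rrd(prot + non, protected, cutoffs))
    return 0.0 if z == 0 else _unnormalized_rrd(ranking, protected, cutoffs) / z
```

Unlike rND, the two extreme orderings here generally give different unnormalized values, which is the asymmetry the authors flag.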

The paper also plots all three metrics on the synthetic dataset (figure not reproduced here).

Codebase: https://github.com/DataResponsibly/FairRank/