fani-lab / Adila

Fairness-Aware Team Formation

2022:ACM: Fairness in Ranking, Part II: Learning-to-Rank and Recommender Systems (UNDER CONSTRUCTION) #79

Open Hamedloghmani opened 1 year ago

Hamedloghmani commented 1 year ago

The second part of this survey focuses on fairness in learning based methods and recommender systems. Since the paper is already too dense, I'll try to extract key definitions and concepts here for myself and others.

  1. What is the difference between score-based and LtR rankers? It lies in how the score is obtained: in score-based ranking, a function is given to calculate the scores Y, while in supervised learning-to-rank, the ranking function f̂ is learned from a set of training examples and the score Ŷ is estimated.
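As a toy illustration of that distinction (my own sketch, not from the survey): a score-based ranker sorts by a given scoring function, while an LtR ranker first fits the scoring function from labeled training examples and then ranks by the estimated scores.

```python
# Score-based ranking: a known scoring function assigns Y directly.
def score_based_rank(items, score_fn):
    return sorted(items, key=score_fn, reverse=True)

# Learning-to-rank (sketched): the scoring function f_hat is fit from
# labeled examples, then used to estimate scores Y_hat. Here a trivial
# one-feature least-squares fit through the origin stands in for training.
def fit_linear_scorer(features, labels):
    w = sum(x * y for x, y in zip(features, labels)) / sum(x * x for x in features)
    return lambda x: w * x  # f_hat

# usage: rank new items by the learned scorer
scorer = fit_linear_scorer([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
ranked = score_based_rank([0.5, 2.5, 1.5], scorer)
```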
  2. We are usually interested in NDCG at the top-k (denoted NDCG@k), and so we normalize the position-discounted gain of the top-k in the predicted ranking by the position-discounted gain of the top-k in the ideal ranking.
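A minimal NDCG@k sketch (my own helper names; the standard log2 position discount is assumed):

```python
import math

def dcg_at_k(gains, k):
    # position-discounted gain: gain / log2(position + 1), positions start at 1
    return sum(g / math.log2(i + 2) for i, g in enumerate(gains[:k]))

def ndcg_at_k(predicted_gains, k):
    # normalize the predicted ranking's DCG by the DCG of the ideal
    # (descending-gain) ordering of the same items
    ideal_dcg = dcg_at_k(sorted(predicted_gains, reverse=True), k)
    return dcg_at_k(predicted_gains, k) / ideal_dcg if ideal_dcg > 0 else 0.0
```

An ideal ordering scores 1.0; any swap that moves a higher gain below a lower one at the top-k reduces the value.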
  3. Mean Average Precision (MAP) consists of several parts: first, precision at position k (P@k) is calculated as the proportion of query-relevant candidates in the top-k positions of the predicted ranking τ̂. This proportion is computed at every relevant position in τ̂, then averaged over the number of relevant candidates for the given query to compute average precision (AP). Finally, MAP is calculated as the mean of AP values across all queries. MAP enables a performance comparison between models irrespective of the number of queries that were given at training time.
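These definitions can be sketched as follows (hypothetical helper names; relevance is assumed binary, with the full relevance vector for each query covering every ranked candidate):

```python
def average_precision(relevance):
    # relevance: 1 if the candidate at that rank is query-relevant, else 0
    hits, total = 0, 0.0
    for k, rel in enumerate(relevance, start=1):
        if rel:
            hits += 1
            total += hits / k  # P@k, taken at each relevant position
    # average over the number of relevant candidates for this query
    return total / hits if hits else 0.0

def mean_average_precision(per_query_relevance):
    # MAP: mean of AP values across all queries
    aps = [average_precision(r) for r in per_query_relevance]
    return sum(aps) / len(aps)
```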
  4. Two main lines of work on measuring fairness in rankings, and enacting fairness-enhancing interventions, have emerged over the past several years: probability-based and exposure-based. Both interpret fairness as a requirement to provide a predefined share of visibility for one or more protected groups throughout a ranking.
  1. The algorithmic fairness community is familiar with the distinction between individual fairness, a requirement that individuals who are similar with respect to a task are treated similarly by the algorithmic process, and group fairness, a requirement that outcomes of the algorithmic process be in some sense equalized across groups. Probability-based fairness definitions are designed to express strict group fairness goals. Thus, they do not allow unfairness at higher-ranking positions to be compensated for later in the ranking, since a ranking has to pass the statistical significance test at every position to be declared fair. If a ranking fails the fairness test at any point, it is immediately declared unfair, in contrast to exposure-based definitions.
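For the exposure-based view, a common modeling choice is to discount exposure logarithmically by position and compare a group's share of total exposure against a target share. A minimal sketch (the helper names and the "P"/"NP" group labels are my own assumptions, not the survey's notation):

```python
import math

def exposure(position):
    # logarithmic position discount: higher-ranked items get more exposure
    return 1.0 / math.log2(position + 1)

def group_exposure_share(groups, protected="P"):
    # groups: group label of each candidate, in ranked order;
    # returns the protected group's fraction of total exposure
    total = sum(exposure(i) for i in range(1, len(groups) + 1))
    prot = sum(exposure(i) for i, g in enumerate(groups, start=1) if g == protected)
    return prot / total
```

Unlike a per-position statistical test, this aggregate can still meet a target share even if one prefix of the ranking under-represents the protected group, which is the compensation behavior described above.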

  2. There are three types of bias:

  1. General advantages of pre-processing methods:

General disadvantages are as follows: