Introduction
Information access: given a repository of items and a user information need, present items to help the user satisfy that need.
Fairness: the ways a system treats people, or groups of people, judged against some moral, legal, or ethical standard.
Bias: a systematic property of the system, described without making any inherently normative judgment.
Accountability: system operators are held accountable, usually for the human effects of their systems.
Transparency: make the operation and results of systems scrutable to stakeholders so that they can be understood, reviewed, and contested.
Safety
Privacy: duty to protect information from harmful disclosure
Ethics: adhere to the ACM Code of Ethics.
IMP (News): news search and recommendation influences user exposure to news articles on social media, news aggregation applications, and search engines. Such influence extends to social and political choices users might make. Additionally, the filter bubble effect may cause users to be exposed primarily to news items that reinforce their beliefs and increase polarization. Depending on the journalistic policy of the provider, news platforms may want to facilitate balanced exposure to news from across the social, political, and cultural spectrum, but this may need to be balanced with the need to de-rank malicious and low-credibility sources.
Specific fairness concerns in news discovery include:
• Does the system provide fair exposure to news on different topics or affected groups?
• Do journalists from different perspectives receive fair visibility or exposure for their content?
• Does the system reward original investigators or primarily direct readers to tertiary sources?
• Do users receive a balanced set of news content?
• Are users in different demographics or locations equally well-served by their news recommendations?
CHAP 2 - Information access systems
Item representation has three parts: content, metadata, and usage data. The content representation of an item d ∈ D is written φ_c(d). Metadata expresses information about the content (a learned representation, author, publication time, etc.). Usage data is the historic interaction between information needs and the item.
Q is the set of all information needs; an individual need is q ∈ Q.
A session is a sequence of queries issued for a single need.
Situated evaluation places the algorithm or system in front of real users, operating the system in the exact environment in which it will be deployed.
Simulated evaluation uses data and algorithms to create a controlled, repeatable simulation of user behavior and metrics.
Offline evaluation, including off-policy evaluation, is an example of this approach.
Item utility: explicit labels (given by a human assessor) or implicit labels (inferred from user feedback; noisy, may depend on mood, and may not reflect long-run value).
Analytic evaluation metrics involve an inner product between the vector of item utilities at each rank and a vector of rank-discount factors.
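A minimal sketch (mine, not from the text) of this inner-product view, using the standard DCG log discount; the function name and toy utilities are illustrative:

```python
import numpy as np

def discounted_metric(utilities, discount=None):
    """Analytic evaluation as an inner product of per-rank utilities and rank discounts.

    utilities: utility (e.g. relevance) of the item at each rank, top first.
    discount: rank-discount factor per position; defaults to the standard DCG discount.
    """
    utilities = np.asarray(utilities, dtype=float)
    if discount is None:
        ranks = np.arange(1, len(utilities) + 1)
        discount = 1.0 / np.log2(ranks + 1)  # DCG-style discount: 1/log2(rank+1)
    return float(utilities @ discount)

# Relevant items at ranks 1 and 3 contribute 1.0 and 0.5 respectively.
print(discounted_metric([1, 0, 1, 0, 0]))  # 1.5
```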
Algorithms often work by estimating the utility of a document to an information need through a scoring function.
Algorithmic foundation:
• What data about needs and items is used, and how is it represented?
• Is utility directly estimated or learned through optimization?
• For what objective are utility estimates optimized?
• How are utility estimates used to produce the final ranking?
Vector space model (TO INCLUDE IN PPT?): one important similarity between the document-term matrix and the ratings matrix is that they are both sparse and incomplete. Examples: TF-IDF, collaborative filtering, content-based filtering.
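As a rough illustration (not from the text), scikit-learn's TfidfVectorizer builds exactly this kind of sparse document-term matrix; the toy documents and query are made up:

```python
from sklearn.feature_extraction.text import TfidfVectorizer

docs = [
    "election coverage from the capital",
    "local election results and analysis",
    "recipe for a quick weeknight dinner",
]
vectorizer = TfidfVectorizer()
X = vectorizer.fit_transform(docs)            # sparse TF-IDF document-term matrix
print(X.shape, X.nnz, "non-zero entries")     # most entries are zero (sparse, incomplete)

# Vector space retrieval: score documents by similarity to the query vector.
# TfidfVectorizer L2-normalizes rows by default, so the dot product is cosine similarity.
q = vectorizer.transform(["election analysis"])
scores = (X @ q.T).toarray().ravel()
print(scores)                                 # the news documents score highest
```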
Embedding and optimizing utility (e.g., SVD, ALS, SGD): the fundamental operation is to learn a scoring function s(d|q) that estimates the item's relevance to the given need q based on observations such as search result clicks, purchases, or product ratings.
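A toy sketch of the embedding idea (my own simplification): a truncated SVD of a small ratings matrix yields user and item factors, and s(d|q) is a dot product. Real systems use ALS or SGD and treat missing entries properly; here unobserved ratings are simply treated as zero.

```python
import numpy as np

# Toy ratings matrix (users x items); 0 marks unobserved interactions
# (a real factorization would treat these as missing, not as zero ratings).
R = np.array([
    [5, 4, 0, 1],
    [4, 0, 0, 1],
    [1, 1, 0, 5],
    [0, 1, 5, 4],
], dtype=float)

# Truncated SVD gives low-dimensional user and item embeddings.
U, s, Vt = np.linalg.svd(R, full_matrices=False)
k = 2
user_emb = U[:, :k] * s[:k]   # user factors
item_emb = Vt[:k, :].T        # item factors

def score(user, item):
    """s(d|q): estimated utility of item d for need/user q via a dot product of embeddings."""
    return float(user_emb[user] @ item_emb[item])

print(score(0, 2))  # predicted affinity of user 0 for the unobserved item 2
```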
User modeling: a user embedding is computed by a suitable statistical model and then used by the final scoring and ranking logic to estimate the relevance of an item to the user's information need in accordance with their personal preferences.
Learning to Rank - eg: Bayesian Personalized Ranking
Re-ranking: one application of re-ranking is to improve the diversity of results. Maximal marginal relevance (MMR) adjusts the ranking to balance, at each position, maximizing s(d|q) with minimizing the similarity between the new item and previously selected items.
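A small sketch of the MMR idea (function and data are illustrative, not the book's code): at each position, pick the candidate with the best trade-off between its relevance score and its maximum similarity to items already selected.

```python
import numpy as np

def mmr_rerank(relevance, similarity, k, lam=0.7):
    """Maximal Marginal Relevance re-ranking.

    relevance: array of s(d|q) scores per candidate item.
    similarity: pairwise item-item similarity matrix.
    lam: trade-off between relevance (lam) and diversity (1 - lam).
    """
    candidates = list(range(len(relevance)))
    selected = []
    while candidates and len(selected) < k:
        def mmr_score(d):
            # Redundancy is the item's similarity to the most similar already-chosen item.
            redundancy = max((similarity[d][s] for s in selected), default=0.0)
            return lam * relevance[d] - (1 - lam) * redundancy
        best = max(candidates, key=mmr_score)
        selected.append(best)
        candidates.remove(best)
    return selected

rel = np.array([0.9, 0.85, 0.3, 0.6])
sim = np.array([[1.0, 0.95, 0.1, 0.2],
                [0.95, 1.0, 0.1, 0.2],
                [0.1, 0.1, 1.0, 0.3],
                [0.2, 0.2, 0.3, 1.0]])
print(mmr_rerank(rel, sim, k=3))  # [0, 3, 1]: item 1 is penalized for duplicating item 0
```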
CHAP 3 - Fairness fundamentals
Universal fairness is not achievable
Individual fairness is concerned with treating similar individuals similarly
Group fairness is concerned with identifying and addressing differences between groups of data subjects.
The construct feature space (CFS) contains the ‘true’ features that we would use to make decisions in an ideal system, such as the applicant’s ability to repay a loan or the job candidate’s ability to carry out the duties of a position
The CFS is unobservable; instead, we have access to the observation feature space (OFS), which is the result of an observation process that results in the input features for the actual decision process
Where unfairness can enter:
• The world itself is unfair
• Data collection
• Models
• Evaluation
• Human response
Problems and concepts:
Who is experiencing unfairness?
How does that unfairness manifest?
How is that unfairness determined or measured?
Distributional harms arise when someone is denied a resource or benefit; unfairly denying loans, for example, to a group of people.
Representational harms arise when the system represents groups or individuals incorrectly, either in its internal representations or in the results it presents to users.
Disparate treatment is when members of different groups are intentionally treated differently
Disparate impact (burden-shifting test):
1. The plaintiff shows the challenged practice has disparate impact.
2. The defendant shows a legitimate business purpose for the practice.
3. The plaintiff shows a less discriminatory mechanism that would achieve the business purpose.
Recall parity, sometimes called equality of opportunity, ensures that members of different groups are equally likely to receive a favorable positive decision, conditioned on having a positive true outcome.
Error parity, sometimes called disparate mistreatment, ensures different groups do not experience erroneous decisions at different rates, conditioned on their true outcomes
This takes a couple of flavors; we can look at predictive value parity in the decision process, and require that decisions for each group have the same positive predictive value
We can also look at calibration parity, requiring that the underlying scores are equally well-calibrated for each group.
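A sketch of checking the parity conditions above on synthetic data (names and data are mine, not from the text): compute per-group confusion-matrix rates, then compare TPR (recall parity), FPR/TPR (error parity), and PPV (predictive value parity) across groups.

```python
import numpy as np

def group_rates(y_true, y_pred, group):
    """Per-group confusion-matrix rates for checking parity conditions."""
    out = {}
    for g in np.unique(group):
        m = group == g
        yt, yp = y_true[m], y_pred[m]
        tpr = yp[yt == 1].mean()                                 # recall parity compares this
        fpr = yp[yt == 0].mean()                                 # error parity compares FPR/TPR
        ppv = yt[yp == 1].mean() if yp.sum() else float("nan")   # predictive value parity
        out[g] = {"TPR": tpr, "FPR": fpr, "PPV": ppv}
    return out

# Illustrative synthetic data: group 1 receives positive decisions more often.
rng = np.random.default_rng(0)
group = rng.integers(0, 2, 1000)
y_true = rng.integers(0, 2, 1000)
y_pred = (rng.random(1000) < 0.4 + 0.2 * group).astype(int)
print(group_rates(y_true, y_pred, group))
```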
Meritocratic fairness prohibits the system from preferring a less-qualified candidate over a more-qualified one.
De-biasing techniques (data pre-processing, model training, and post-processing):
• Suppressing sensitive attributes or attributes correlated to the sensitive attributes; this can reduce discrimination in downstream tasks in some cases.
• “Massaging” the data by altering class labels from negative to positive for sensitive groups and vice versa until discrimination is minimized
• Re-weighting the data by carefully assigning weights to certain inputs to reduce discrimination.
• Stratified sampling strategies to repeat or skip samples to reduce discrimination selectively.
• Imposing fairness constraints on the representations may lead to non-discriminatory output in the downstream tasks
• Regularization: incorporating one or more fairness objectives as penalty terms in the loss function to discourage unfair models.
• Constrained optimization approaches: fairness is formulated as a constraint on parts of the confusion matrix at training time.
• Adversarial learning can also be applied with an adversarial model attempting to identify unfairness in the primary model’s outputs
• One post-processing technique is thresholding, i.e., using different decision boundaries for different groups to ensure a non-discriminatory outcome under some definition (see the sketch after this list).
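A minimal post-processing sketch of group-specific thresholding (the thresholds and data here are made up; in practice the thresholds would be tuned on validation data to satisfy a chosen parity criterion):

```python
import numpy as np

def threshold_decisions(scores, group, thresholds):
    """Post-processing with group-specific decision boundaries.

    scores: model scores in [0, 1]; group: group id per instance;
    thresholds: mapping of group id -> decision boundary.
    """
    cutoffs = np.array([thresholds[g] for g in group])
    return (np.asarray(scores) >= cutoffs).astype(int)

scores = np.array([0.62, 0.55, 0.48, 0.71, 0.40])
group = np.array(["a", "b", "a", "b", "b"])
decisions = threshold_decisions(scores, group, thresholds={"a": 0.6, "b": 0.5})
print(decisions)  # [1 1 0 1 0]
```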
CHAP 4 - Problem space
First, decisions are not independent, so they cannot be made or evaluated separately.
Second, decisions are repeated over time, a violation of the simultaneous evaluation requirement.
Third, decisions are personalized to users.
Fourth, outcomes are subjective.
Fifth, multiple stakeholders have fairness concerns.
Multistakeholder fairness: consumers, providers (exposure concerns), information subjects (the people the retrieved items are about), side stakeholders (e.g., Uber Eats delivery drivers), joint fairness, and cross-group harms.
An information access system can cause direct representational harms when it presents inaccurate information about items (e.g., misgendering).
Unfair result set composition: unfairness in the composition of its result sets and rankings (e.g., image search results for "CEO").
• Reinforcing stereotypes (of users, content, providers, subjects, or any combination)
• Presenting an inaccurate picture of the information space
• Biasing users’ sense of the possibilities of the information space
Unfair distribution of benefits - subtractability - does one person’s use of the resource affect the ability of others to enjoy it?
Fair from What Vantage Point? Assessing fairness requires looking beyond the distribution of quantitative exposure or utility.
Fair on What Time Scale? The repeated nature of information access and its evolution over time, particularly as the system learns and updates its models in response to user interactions, means that point-in-time analysis is not sufficient to fully understand the fairness-related behavior of the system (offline evaluation is a one-time snapshot).
Fairness and the System Pipeline
Item understanding can affect both producer fairness and the ability to correctly locate documents to meet an information need.
User understanding most directly affects consumer fairness.
Retrieval and rendering, often including ranking, are central to the observable output of the information access system.
Behavior understanding is how the system improves itself, either through automatic learning or feedback to system designers.
Evaluation of information access systems, as discussed in section 2.5, is inherently different from the evaluation of classification systems.
Fairness and Other Concerns:
Fairness may be in tension with accuracy or utility in some cases or experimental settings, but more research is needed to more fully understand and predict their relationship. Fairness has significant overlap or complementarity to other concerns for information access, such as diversity and popularity bias.
Contributing Back to ML Fairness - avoiding abstraction traps, information retrieval, data mitigation
CHAP 5 - Consumer fairness
Consumer fairness is concerned with how an information access system impacts consumers and sub-groups of consumers, and whether those effects are fair or result in unjust harms.
Group fairness: assess with the same utility-based evaluation used to measure the system's effectiveness, such as an offline accuracy evaluation or an online A/B test, and disaggregate utility by consumer group.
That is, evaluation is done the same way as for accuracy, but one group at a time, to see whether the system serves each group equally well (see the sketch below).
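A sketch of disaggregated evaluation with pandas (hypothetical per-user nDCG values and group labels): the same accuracy metric, grouped by consumer group.

```python
import pandas as pd

# Hypothetical per-user evaluation results: one utility score per user
# (e.g. nDCG from an offline run), plus that user's group membership.
results = pd.DataFrame({
    "user":  ["u1", "u2", "u3", "u4", "u5", "u6"],
    "group": ["women", "men", "women", "men", "women", "men"],
    "ndcg":  [0.41, 0.58, 0.37, 0.61, 0.44, 0.55],
})

# Disaggregate the same utility metric by consumer group.
by_group = results.groupby("group")["ndcg"].agg(["mean", "count"])
print(by_group)
print("gap:", by_group["mean"].max() - by_group["mean"].min())
```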
Reasons for disparity:
• Measurement invariance
• Availability of training data
• Item relevance
• Mediating factors
Providing fair utility: detecting and quantifying inequitable distributions of system utility is one thing; correcting them is another. One common approach is re-ranking recommendations to improve their fairness properties.
CHAP 6 - Provider Fairness
Diversity in recommendation and search results is mainly focused on consumer intent, intending to present results that meet a wide range of users’ topical needs. In contrast, provider fairness is motivated by justice concerns to ensure that different providers receive fair opportunity for their content or products to be discovered.
Many constructs for provider fairness are concerned in some way with representation: are the providers of items returned representative of the broader population, or of some other reference distribution of provider groups? This is also a question of how the system represents the space of providers to the user (a representational harm).
Representation is operationalized through a distribution over provider groups; measuring it requires:
• A multinomial target distribution P(target) over provider groups G
• A distance function ∆ that computes the distance between two distributions over provider groups
• A means of computing group distributions Pπ from the list and comparing them to the target distribution
KL divergence is a common distance between the list's provider-group distribution and the target; it only checks whether items from different provider groups appear in the expected proportions, not where they are placed.
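A small sketch (mine) of this measurement: compute the ranking's provider-group distribution P_pi and its KL divergence from a target distribution; the groups and target values are illustrative.

```python
import numpy as np

def group_distribution(ranking_groups, groups):
    """P_pi: empirical distribution of provider groups in a result list."""
    counts = np.array([ranking_groups.count(g) for g in groups], dtype=float)
    return counts / counts.sum()

def kl_divergence(p, q, eps=1e-12):
    """Distance between the list's group distribution p and the target q."""
    p, q = np.asarray(p) + eps, np.asarray(q) + eps
    return float(np.sum(p * np.log(p / q)))

groups = ["majority", "minority"]
ranking = ["majority"] * 8 + ["minority"] * 2   # provider group of each ranked item
p_pi = group_distribution(ranking, groups)
p_target = np.array([0.6, 0.4])                 # illustrative target distribution
print(p_pi, kl_divergence(p_pi, p_target))      # larger value = further from target
```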
Ranking position plays an important role as well.
Check the front of the line first (prefix method):
We look at prefixes of the ranking starting from the top; it is not fair to other groups if, for example, the first 5 results all come from one provider group. We keep testing these prefixes to see whether they are balanced; if they are not, the ranking is judged unfair and should be fixed.
Make items further down the line less important (discount method):
Items at the front of the ranking matter most; as we go further down, items matter less. To capture this, we give smaller weights to items further back, so even if a group's items are packed at the end, they do not count as heavily toward that group's measured representation (see the sketch below).
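A rough sketch of both ideas (function names and the example ranking are mine): prefix checks test a group's share in each top-k, and a log discount down-weights deep positions.

```python
import numpy as np

def prefix_counts(ranking_groups, group, prefixes=(5, 10, 20)):
    """Prefix method: the share of a group in each top-k prefix of the list."""
    flags = np.array([g == group for g in ranking_groups], dtype=float)
    return {k: flags[:k].mean() for k in prefixes if k <= len(flags)}

def discounted_share(ranking_groups, group):
    """Discount method: weight each position by 1/log2(rank + 1), so placement
    deep in the list contributes less to a group's measured representation."""
    flags = np.array([g == group for g in ranking_groups], dtype=float)
    w = 1.0 / np.log2(np.arange(2, len(flags) + 2))
    return float((flags * w).sum() / w.sum())

ranking = ["a"] * 5 + ["b"] * 5                 # group label of each ranked item
print(prefix_counts(ranking, "b"))              # group b is absent from the top 5
print(discounted_share(ranking, "b"))           # well below 0.5 despite a 50/50 list
```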
A population estimator assumes that the goal is for the providers in a ranking to be representative of the broader population from which they are drawn. What counts as a fair target distribution depends on the domain; candidates include:
• Uniform
• The overall population of item providers
• The set of providers of items at least marginally relevant to the information need
• An estimate of the distribution in society at large
One common way to provide representational group fairness is through re-ranking. In binary-group settings, a greedy approach selects, at each position, the best item from the original ranking that does not violate the fairness constraint or make representation worse (see the sketch below).
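A sketch of such a greedy re-ranker for a binary protected/other setting (the constraint vector and items are made up; this illustrates the general idea, not a specific published algorithm):

```python
def greedy_fair_rerank(items, groups, k, min_protected):
    """Greedy re-ranking for a binary-group setting: walk down the relevance-ordered
    list and, at each position, take the best remaining item that keeps the prefix
    from violating a minimum-protected-count constraint.

    min_protected[i]: minimum number of protected items required in the top (i + 1).
    """
    remaining = list(items)
    reranked = []
    protected_so_far = 0
    for pos in range(min(k, len(items))):
        need_protected = protected_so_far < min_protected[pos]
        for item in remaining:
            is_protected = groups[item] == "protected"
            if need_protected and not is_protected:
                continue  # placing this item now would violate the prefix constraint
            reranked.append(item)
            remaining.remove(item)
            protected_so_far += int(is_protected)
            break
        else:
            break  # no feasible item left
    return reranked

items = ["d1", "d2", "d3", "d4", "d5", "d6"]  # original (relevance) order
groups = {"d1": "other", "d2": "other", "d3": "protected",
          "d4": "other", "d5": "protected", "d6": "other"}
# Require at least 1 protected item in the top 2 and 2 in the top 4.
print(greedy_fair_rerank(items, groups, k=4, min_protected=[0, 1, 1, 2]))
# ['d1', 'd3', 'd2', 'd5']
```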
Under some theories of equity, such as anti-subordination, this is expected and acceptable
A more recent line of fair ranking constructs shifts this discussion in four important ways:
• Assuming that measures of relevance produced by an information access system are good proxies for the value of an item to a user, such that the inclusion of a high-scoring item is worth more to the provider, as well as to the user. -- exposure construct
• Directly measuring exposure (or attention) as a resource that the system should distribute fairly among providers. -- browsing model
• Relating provider-side utility, abstracted through exposure, to consumer-side utility.
• Measuring fairness over repeated or stochastic rankings, rather than a fixed ranking in response to a single information need. --browsing model
The fairness metric can be satisfied with the inclusion of irrelevant protected-group items, which are unlikely to attract user interest.
Individually fair exposure
Expected exposure: an estimate of how often users will see an item, based on where it is ranked and a browsing model of how far down users tend to look.
Comparing to a fair target: we construct a "target exposure", the exposure items would receive if equally relevant items were given equal exposure, and measure how far the system's actual expected exposure is from this target.
Expected exposure loss (EE-L): the squared difference between the system's expected exposure and the target exposure, measuring how unfair the system is.
Expected exposure disparity (EE-D): a measure of how unequally exposure is distributed among documents, regardless of their relevance.
Because attention and relevance are aggregated separately, however, a system can be fair by providing the correct exposure to items, but exposing them on the wrong queries.
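A simplified sketch of these quantities (my own toy version, using a geometric position-based browsing model; the actual metrics use a specific user model and target construction):

```python
import numpy as np

def expected_exposure(rankings, n_items, gamma=0.8):
    """Expected exposure per item under a simple position-based browsing model
    (an item at rank r is seen with probability gamma**r), averaged over the
    stochastic rankings produced for the same information need."""
    exposure = np.zeros(n_items)
    for ranking in rankings:
        for rank, item in enumerate(ranking):
            exposure[item] += gamma ** rank
    return exposure / len(rankings)

# Items 0 and 1 are equally relevant; item 2 is not relevant.
# Target policy: the two relevant items share the top position equally.
target = expected_exposure([[0, 1, 2], [1, 0, 2]], n_items=3)
# System policy: always shows item 0 first.
system = expected_exposure([[0, 1, 2], [0, 1, 2]], n_items=3)

ee_loss = float(np.sum((system - target) ** 2))  # EE-L: squared distance from the target
ee_disparity = float(np.sum(system ** 2))        # EE-D: how concentrated exposure is
print(target, system, ee_loss, ee_disparity)
```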
Group fair exposure
Group-based aggregations of the amortized attention and expected exposure metrics: attention and relevance (for amortized attention) or exposure (for expected exposure) are aggregated by provider group before computing the loss metric.
Unfairness can then be computed as the squared difference in group-wise exposure between system and target exposure, or the absolute difference between group exposure and relevance.
As an offline approximation of click-through rate, this is closer to measuring the distribution of actual user engagement rather than just exposure that may lead to engagement.
Fair exposure
If the system is unfair, we can fix it by reordering results or tweaking how it works.
Some methods swap items around to reduce unfairness, while others use math (like linear programming) to balance attention between groups.
Others make sure new or less-popular items also get noticed, especially when there isn’t enough data about them yet.
Re-ranking Strategies:
Biega et al. (2018) use optimization to fairly distribute attention.
Gómez et al. (2021) make minimal swaps to fix unfair exposure.
Stochastic Rankings:
Singh & Joachims (2018, 2019) use linear programming to create fair randomized rankings and balance relevance with fairness.
Learning Fair Exposure:
Diaz et al. (2020) minimize expected exposure loss as a learning objective.
Fair accuracy and pairwise fairness
The system is fair if it doesn’t favor one group over another in correctly ranking relevant items.
Pairwise accuracy measures whether the system ranks an item 𝑑′ higher than 𝑑 when 𝑑′ is more relevant.
Intra-Group Fairness: pairwise accuracy on pairs of items from the same group (e.g., older vs. older candidates) should be equal across groups.
Inter-Group Fairness: pairwise accuracy on pairs of items from different groups (e.g., older vs. younger candidates) should be equal across groups.
Pairwise accuracy only requires sampling data, making it potentially more efficient and easier to apply than exposure-based metrics, especially in cases with incomplete data.
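A sketch of group-wise pairwise accuracy on made-up scores and relevance labels (the data and the A/B group split are illustrative):

```python
import numpy as np

def pairwise_accuracy(scores, pairs):
    """Fraction of (more-relevant, less-relevant) item pairs ordered correctly."""
    return float(np.mean([scores[hi] > scores[lo] for hi, lo in pairs]))

# Illustrative data: system scores, graded relevance, and a group label per item.
scores    = np.array([0.9, 0.7, 0.6, 0.4, 0.3, 0.2])
relevance = np.array([2,   1,   2,   0,   1,   0])
group     = np.array(["A", "A", "B", "B", "A", "B"])

# Build pairs (d', d) with rel(d') > rel(d), split by whether the items share a group.
intra, inter = [], []
for i in range(len(scores)):
    for j in range(len(scores)):
        if relevance[i] > relevance[j]:
            (intra if group[i] == group[j] else inter).append((i, j))

print("intra-group pairwise accuracy:", pairwise_accuracy(scores, intra))
print("inter-group pairwise accuracy:", pairwise_accuracy(scores, inter))
```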
CHAP 7 - Dynamic Fairness
One of the most troubling aspects of algorithmic bias generally is the potential for destructive positive feedback loops within the system, affecting not just one user but future users of the system as well.
It is also very difficult for new entrants to break into a market with positive feedback effects since they would have to gain traction against well-entrenched competition. Recommender systems therefore tend naturally towards unfairness, a tendency that a fairness-aware recommender system will need to continuously counter.
Bias amplification: the machine learning algorithm can amplify biases present in its training data as its outputs feed back into future training data.
Instead we need to incorporate the cycle of user arrival, recommendation generation, user response, and periodic system re-training.
Feedback loop evaluation: multidimensional fairness using probabilistic social choice to control subgroup fairness over time - In this model, deviations from fairness observed in a particular time window are addressed by adjusting the system’s fairness objectives over the next batch of recommendations produced.
CHAP 8 - Future
Extending the concepts and methods of fair information access research to additional domains, applications, problem framings, and axes of fairness concerns.
Deeper study of the development and evolution of biases over time.
Define and study further fairness concerns beyond consumer and provider fairness.
Study human desires for and response to fairness interventions in information access
Develop appropriate metrics for information access fairness, along with thorough understanding of the requirements and behavior of fairness metrics and best practices for applying them in practical situations.
Develop standards and best practices for information access data and model provenance.
Engage more deeply with the multidimensional and complex nature of bias.
Participatory design and research in information access.
Michael D. Ekstrand, Anubrata Das, Robin Burke, Fernando Diaz