Summary of an Unsuccessful Attempt with Double-cutoff Method

Hamedloghmani commented 1 year ago

Problem:

Our results in ECIR-Bias2023 (issue #34) suggest that deterministic greedy re-ranking algorithms are not successful in avoiding utility loss while maintaining fairness in different teams.

Proposed Solution:

Our proposed solutions for this was the double-cutoff method that I'll explain with an example. Suppose we need a team of 5 experts and we have the output by the size of 20 from the team formation baseline. Instead of passing 20 people to re-ranker, we first sort them by probability/score, select the top 10, then pass those to the re-ranker algorithm with k=5. We were planning to keep less related individuals out of the re-ranker's reach.

Causes of Failure

This attempt was not successful since the main cause of utility loss was sparsity of our search space compared to the search space that these algorithms were originally developed on. Det_greedy, det_cons and det_relaxed work in an iterative manner and in each iteration they select the expert with highest popularity/score from the set of experts with under represented sensitive attribute. Hence, our solution was not helpful for this issue. The sparsity problem will be more clear after finishing our analysis on other datasets and search spaces (#40)

hosseinfani commented 1 year ago

@Hamedloghmani afaik, in the pseudo code for detgreedy in the paper, they consider the prob when reranking, e.g,. L#7 in Algorithm 1. I think we only pass p/np to the reranker function. Sth is missing here.

Hamedloghmani commented 1 year ago

@hosseinfani Solid point. Thank you.

My understanding from the code documentation and paper is, they expect the passed list of p/np to be sorted based on the probability that team formation or any other ranking system gave. I double checked the reranker class and this is the doc : " item_attr: The attributes in order of each item ranked by recommendation system or search engine, from top to bottom. The values in the list can be feature index (int) or feature name (str). "

I think if they intended to consider probabilities there was no need to take the sorted list. The reranker class has also 3 attributes only, item_attr, max_na and distr. The doc for them is as follows: " distr: The disired distribution for each attribute. The values in the list can be feature index (int) or feature name (str). max_na: The maximum number of different values of the attributes in distribution. The constraint is achieved by merging the n lowest probabilities attributes. "

To summarize, I think they consider positional index in the sorted list that we pass to the re-ranker as the reversed score (since they ask for top to bottom sort) for each individual. I'll be happy to hear your thoughts on this.

hosseinfani commented 1 year ago

@Hamedloghmani Yes, if they're sorted there is no need to the actual score values. But then I don't understand the logic of your double cut-off

Hamedloghmani commented 1 year ago

@hosseinfani My logic does not actually make sense, since I did not consider the iterative greedy approach of algorithms. I simply thought removing half of least relevant experts in each team and then re-rank would help. I just documented it since you kindly suggested. At least no one would try it this way. 😅

fani-lab / Adila