Open Hamedloghmani opened 1 year ago
Hi @hosseinfani, @mahdis-saeedi, I implemented my first draft of the fair-greedy algorithm in section 2. I would really appreciate it if you could kindly take a look and confirm that the code performs the operations described in the algorithm. Thank you.
```python
def fair_greedy(member_prob, att_list, prob_dist):
    L = list(zip(member_prob, att_list))
    print(L)
    R = [L[0]]  # Initialize R as [L1]
    print(len(L))
    for i in range(1, len(L)):
        flag = False
        p = calculate_att_dist(R)
        # To address comments after the meeting with Mahdis: signed (not absolute) differences
        p_diff = {False: p[False] - prob_dist[False], True: p[True] - prob_dist[True]}
        while not flag:
            z_min = min(p_diff, key=p_diff.get)
            # Find the first remaining item whose protected attribute (the second element of the tuple) is z_min
            for j in range(i, len(L)):
                if L[j][1] == z_min:
                    temp = L.pop(j)
                    # Move down the items in L to place the selected item at position i
                    L = L[:i] + [temp] + L[i:]
                    print(len(L))
                    R.append(temp)
                    flag = True
                    break
                # To avoid an infinite loop when there are not enough samples of the chosen protected attribute
                if j == len(L) - 1:
                    flag = True
                    break
    return R


def calculate_att_dist(members):
    false_ratio = [att for _, att in members].count(False) / len(members)
    return {True: 1 - false_ratio, False: false_ratio}


if __name__ == "__main__":
    member_prob = [0.9, 0.8, 0.7, 0.6, 0.5]
    att_list = [False, False, False, True, True]
    x = fair_greedy(member_prob, att_list, {False: 0.6, True: 0.4})
    print(x)
```
Issue: I was not able to find in the paper how the difference between the probability distributions should be calculated, so I considered the absolute difference for now. The algorithm from the paper is as follows:
@Hamedloghmani It seems that the absolute value of the differences does not work here because d_min = 0.
@mahdis-saeedi I edited the code and removed the abs() function.
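For reference, a minimal sketch (with toy numbers, not values from the experiments) of why the signed difference is now used instead of the absolute one: with absolute values the two attributes can tie, so min() no longer identifies the under-represented attribute.

```python
target = {False: 0.6, True: 0.4}     # desired attribute distribution (prob_dist)
current = {False: 1.0, True: 0.0}    # distribution of R so far: only False members picked

signed = {k: current[k] - target[k] for k in target}         # {False: 0.4, True: -0.4}
absolute = {k: abs(current[k] - target[k]) for k in target}  # {False: 0.4, True: 0.4}

print(min(signed, key=signed.get))      # True -> correctly flags the under-represented attribute
print(min(absolute, key=absolute.get))  # ties at 0.4, so the pick only depends on dict order
```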
Hi @hosseinfani, @mahdis-saeedi. In the rest of this issue page, I will log my progress on running and integrating the code provided by the authors of the Has CEO Gender Bias Really Been Fixed? Adversarial Attacking and Improving Gender Fairness in Image Search paper, in several steps. I started by running the fair_greedy algorithm with 3 different settings, {'Woman': 0.4, 'Man': 0.6}, {'Woman': 0.2, 'Man': 0.8} and {'Woman': 0.1, 'Man': 0.9}, on their synthetic dataset with 100 individuals and 0.4 female representation (a sketch of the setup is included after the notes below). A few notes so far:
I have attached step-by-step traces of these settings to this issue: output_synthetic_woman10_men90.txt, output_synthetic_woman20_men80.txt, output_synthetic_woman40_men60.txt
This work is ongoing; this was only the first step.
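Roughly how I set these runs up, as a sketch rather than the authors' actual script (the random scores and the mapping Woman → True / Man → False are my own assumptions for illustration), using the fair_greedy defined above:

```python
import random

random.seed(0)
n = 100
att_list = [True] * 40 + [False] * 60   # 0.4 female representation; True = 'Woman', False = 'Man'
random.shuffle(att_list)
member_prob = sorted((random.random() for _ in range(n)), reverse=True)  # synthetic relevance scores

for woman_ratio in (0.4, 0.2, 0.1):
    target = {True: woman_ratio, False: 1 - woman_ratio}  # e.g. {'Woman': 0.4, 'Man': 0.6}
    reranked = fair_greedy(member_prob, att_list, target)
    print(woman_ratio, sum(att for _, att in reranked[:10]))  # women in the top 10 after re-ranking
```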
Hi, I also tried to make a successful sample run of fair_greedy on 1 fold of our imdb dataset, by transforming the input into the form this function accepts.
I have attached a step-by-step trace of my sample run to this issue: sample_imdb_dp.txt
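The transformation itself is just reshaping the fold output into the two parallel lists fair_greedy expects. A hypothetical sketch (the helper name, member ids, scores, and gender lookup below are placeholders, not our actual imdb artifacts):

```python
def to_fair_greedy_input(ranked_members, gender_of):
    """ranked_members: [(member_id, score), ...] sorted by score (descending).
    gender_of: dict member_id -> True (protected) / False (non-protected)."""
    member_prob = [score for _, score in ranked_members]
    att_list = [gender_of[mid] for mid, _ in ranked_members]
    return member_prob, att_list

# usage on one fold (placeholder data)
ranked = [(17, 0.91), (3, 0.84), (42, 0.73)]
genders = {17: False, 3: True, 42: False}
member_prob, att_list = to_fair_greedy_input(ranked, genders)
reranked = fair_greedy(member_prob, att_list, {False: 0.6, True: 0.4})
```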
Hi @hosseinfani, @mahdis-saeedi,
Since it has been a while since our last feature update, I'll go over a short summary.
In the last update, I added the baseline from Has CEO Gender Bias Really Been Fixed? Adversarial Attacking and Improving Gender Fairness in Image Search, namely `fair_greedy`,
to our codebase. Please note that the code we obtained from the developer has not been pushed to the repo yet due to privacy reasons.
The issue was the runtime, which made experiments on uspt and dblp take months (if not interrupted). Hence, I recently started working on optimization. Here is the summary so far:
- `fair_greedy` itself is efficient and takes less than 2-3 seconds.
- The slow part appears to be in the `fairness_greedy` function, and I am still working on that.
- In `main.py`, instead of passing the whole list of experts, we could pass only the best, say, 500 experts to it while k=100, since beyond that point utility is low anyway. I tried it this way and the runtime becomes reasonable. I can start running the experiments using this solution if you also think it makes sense (a minimal sketch is below).
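To make the last point concrete, a minimal sketch of the proposed truncation (the function and variable names are hypothetical, not the actual main.py code), reusing the fair_greedy baseline from above:

```python
def rerank_topn(expert_scores, gender_of, prob_dist, n=500, k=100):
    """expert_scores: dict expert_id -> predicted score; gender_of: dict expert_id -> bool."""
    # keep only the n best-scored experts before fairness re-ranking
    top_n = sorted(expert_scores.items(), key=lambda kv: kv[1], reverse=True)[:n]
    member_prob = [score for _, score in top_n]
    att_list = [gender_of[expert] for expert, _ in top_n]
    reranked = fair_greedy(member_prob, att_list, prob_dist)
    return reranked[:k]  # only the top k=100 are reported anyway
```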
I would love to hear your thoughts. Thank you.
Hi @hosseinfani, @mahdis-saeedi, although this issue is not finalized and is still under construction, the following are the options that I have come up with so far:
[x] 1) Implementing the fourth algorithm presented in Fairness-Aware Ranking in Search & Recommendation Systems with Application to LinkedIn Talent Search
I already did this option since it did not require much effort. To summarize the effect: fairness improves, but utility drops drastically (it is better than the other deterministic algorithms we have already implemented, but it still fails to hold on to utility).
[x] 2) One of the three algorithms proposed in Has CEO Gender Bias Really Been Fixed? Adversarial Attacking and Improving Gender Fairness in Image Search. This paper provides three algorithms: i) Epsilon-greedy, ii) Relevance-aware Swapping, iii) Fair-greedy. Fair-greedy was their best algorithm based on the reported results, but I was only able to find the implementation for Epsilon-greedy (a generic sketch of epsilon-greedy re-ranking is included after the questions below).
Questions: a) Does our new baseline (e.g., Epsilon-greedy) have to beat fa*ir?
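For context on what the Epsilon-greedy baseline does, here is a generic sketch of epsilon-greedy re-ranking, assuming the common formulation (with probability epsilon, promote a random remaining candidate instead of the best-scored one); this is a sketch of the general technique, not necessarily the authors' exact released implementation:

```python
import random

def epsilon_greedy_rerank(ranked, epsilon=0.1, seed=None):
    """ranked: [(score, attribute), ...] sorted by score (descending)."""
    rng = random.Random(seed)
    pool, out = list(ranked), []
    while pool:
        # with probability epsilon take a random remaining candidate,
        # otherwise keep the best-scored remaining candidate
        idx = rng.randrange(len(pool)) if rng.random() < epsilon else 0
        out.append(pool.pop(idx))
    return out
```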