Closed Hamedloghmani closed 1 year ago
Based on splits.log, we know the rowid of a test instance, e.g.
rowid = 5
At teamsvecs['skill'][5] = {set of skills for team #5} = {s12, s15, s3}
You have to look up rows {12, 15, 3} (the skill column indices in teamsvecs) in skill_member:
skill_member[12] = [0, 1, 3, 0, 0, ..., 2, 0], whose non-zero entries are the members that have participated in at least one team with s12.
You then find the column indices of the non-zeros.
s12: [0, 1, 3, 0, 0, ..., 2, 0] ==> {m1, m2, ..., m{|member|-2}}
s15: [2, 0, 0, 0, 0, ..., 2, 0] ==> {m0, ..., m{|member|-2}}
s3: [0, 1, 0, 0, 0, ..., 0, 0] ==> {m1}
We can take either the intersection or the union of these sets as the qualified set for team #5.
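The lookup described above can be sketched on toy data. This is only an illustration: the matrices below are made up (3 teams, 4 skills, 5 members), `qualified_sets` is a hypothetical helper name, and the real `teamsvecs` matrices may use a different sparse format.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy data (made up for illustration): 3 teams, 4 skills, 5 members.
skillvecs = csr_matrix(np.array([
    [1, 1, 0, 0],   # team 0 requires s0, s1
    [0, 1, 1, 0],   # team 1 requires s1, s2
    [1, 0, 0, 1],   # team 2 requires s0, s3
]))
membervecs = csr_matrix(np.array([
    [1, 1, 0, 0, 0],  # team 0 = {m0, m1}
    [0, 1, 1, 0, 0],  # team 1 = {m1, m2}
    [1, 0, 0, 1, 0],  # team 2 = {m0, m3}
]))

# Rows are skills, columns are members; entry (s, m) counts the teams
# in which member m appeared together with skill s.
skill_member = (skillvecs.transpose() @ membervecs).tocsr()

def qualified_sets(rowid):
    """Hypothetical helper: intersection and union of per-skill member sets.

    Assumes the team at `rowid` has at least one required skill.
    """
    skills = skillvecs[rowid].nonzero()[1]
    per_skill = [set(skill_member[s].nonzero()[1].tolist()) for s in skills]
    return set.intersection(*per_skill), set.union(*per_skill)

intersect, union = qualified_sets(1)  # team 1 requires s1 and s2
```

In this toy example, s1 maps to {m0, m1, m2} and s2 maps to {m1, m2}, so the intersection is stricter than the union, which is the trade-off discussed in this thread.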
This is my code block for the description you kindly mentioned above.
import pickle
import pandas as pd
with open('teamsvecs.pkl', 'rb') as f: teamsvecs = pickle.load(f)
teamids, skillvecs, membervecs = teamsvecs['id'], teamsvecs['skill'], teamsvecs['member']
skill_member = skillvecs.transpose() @ membervecs
popularity = pd.read_csv('popularity.csv')
ratios = list()
for i in range(skillvecs.shape[0]):
    skills = skillvecs[i].rows[0]
    qualified = list()
    for skill in skills:
        qualified.append(skill_member[skill].nonzero()[1])
    intersect = set(qualified[0]).intersection(*qualified)
    labels = list()
    for member in intersect:
        labels.append(popularity.loc[popularity['memberidx']==member, 'popularity'].tolist()[0])
    ratios.append(labels.count(False) / len(intersect))
@Hamedloghmani how about this:
skill_indexes = teamsvecs['skill'][5].nonzero()  # or cols()
members = np.array(skill_member[skill_indexes])  # ==> this raises an error: fix it please
intersect = reduce(lambda x, y: x & y, members).nonzero()
union = reduce(lambda x, y: x | y, members).nonzero()
from functools import reduce
import numpy as np
a = np.array([[1,0,0],[0,1,0],[0,0,1],[1,1,1]])
reduce(lambda x, y: x & y, a).nonzero()
reduce(lambda x, y: x | y, a).nonzero()
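One possible way to make the mask idea above work on a sparse `skill_member` is sketched below. The thread does not show the actual error, so the cause assumed here is a guess: `np.array(...)` applied to a SciPy sparse matrix does not densify it. The toy values are made up. Note also that `skill_member` holds counts, and a bitwise `&` on raw counts can be 0 even when both are non-zero (for example `1 & 2 == 0`), so the rows are binarized first.

```python
import numpy as np
from scipy.sparse import csr_matrix

# Toy skill-by-member count matrix (made up): 4 skills, 5 members.
skill_member = csr_matrix(np.array([
    [2, 1, 0, 1, 0],
    [1, 2, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [1, 0, 0, 1, 0],
]))
skill_indexes = [1, 2]  # required skills of one team

# Densify only the selected rows, then turn counts into a boolean mask.
mask = skill_member[skill_indexes].toarray() > 0

# Column indices of members having all / any of the required skills.
intersect = np.logical_and.reduce(mask, axis=0).nonzero()[0]
union = np.logical_or.reduce(mask, axis=0).nonzero()[0]
```

Because every row of `mask` has the same length (the number of members), no padding is needed with this variant; the cost is a dense array of size (number of required skills) by (number of members) per team.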
Hi @hosseinfani, today I tried multiple variations of both our implementations. The following is the most efficient one I came up with. It combines our code, and I tried to keep it close to your coding style. I also measured the runtime of the different variations to be sure. I would be happy to have your opinion on this.
ratios = list()
for i in range(skillvecs.shape[0]):
    skill_indexes = skillvecs[i].nonzero()[1].tolist()
    members = [skill_member[idx].nonzero()[1] for idx in skill_indexes]
    intersect = set(members[0]).intersection(*members)
    labels = [popularity.loc[popularity['memberidx']==member, 'popularity'].tolist()[0] for member in intersect]
    ratios.append(labels.count(False) / len(intersect))
@Hamedloghmani please go ahead with the results. Later we'll have time for better implementations. Also, are you sure whether intersection or union would be the better choice? Intersection may produce empty results, so you need to skip the teams that end up with an empty set of qualified members.
@hosseinfani Thanks for the feedback. My initial response somehow got lost and was not sent; I apologize for that.
Regarding the empty set, that's a solid point. I'm not sure yet what the best way to handle it is, because even if we use logical &, the result might be all zeros. I went with set intersection since it was time efficient and also because .nonzero() returns results of different lengths, which would require padding (e.g. for skill 1 it returns [12, 16, 43] and for skill 12 it returns [12, 44, 67, 88, 95, 99]). I'll keep looking for a solution that addresses both of these issues.
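One way to handle the empty-set case raised above is to skip those teams when computing the ratio, as sketched below. This is a sketch under assumptions, not the approach adopted in the thread: `nonpopular_ratio`, the `is_popular` dictionary, and the toy sets are hypothetical, and whether skipping (rather than, say, falling back to the union) is appropriate is still an open question here.

```python
def nonpopular_ratio(qualified, is_popular):
    """Share of non-popular members in `qualified`, or None if it is empty."""
    if not qualified:
        return None  # nobody has all required skills; caller decides what to do
    return sum(not is_popular[m] for m in qualified) / len(qualified)

# Toy popularity labels and per-team qualified sets (made up).
is_popular = {0: True, 1: False, 2: False, 3: True}
per_team = [{1, 2}, set(), {0, 1, 2, 3}]

ratios = [nonpopular_ratio(q, is_popular) for q in per_team]
kept = [r for r in ratios if r is not None]  # teams with an empty set are skipped
```

Keeping `None` in `ratios` preserves the alignment between list positions and team row ids, while `kept` holds only the teams that contribute to aggregate statistics.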
This issue breaks down the steps required to implement the equality of opportunity fairness criterion in Adila.
[x] 1. In this step we try to find a matrix that shows which members have each of the skills. To do so, we do the following:
teamids, skillvecs, membervecs = teamsvecs['id'], teamsvecs['skill'], teamsvecs['member']
skill_member = skillvecs.transpose() @ membervecs
In skill_member, rows would be skills and columns would be members.
[x] 2. For each team, we find the required skills
[x] 3. Find the qualified set (set of members that have all the required skills for the team)
[x] 4. Extract Popular vs Non-popular in qualified set and pass to re-ranker
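Step 4 could look roughly like the sketch below. The column names `memberidx` and `popularity` follow the code in the comments of this issue; the table contents and the `qualified` list are made up, and the format in which the two groups are handed to the re-ranker is not specified here, so the sketch stops at the split.

```python
import pandas as pd

# Toy stand-in for popularity.csv (values made up).
popularity = pd.DataFrame({
    'memberidx': [0, 1, 2, 3, 4],
    'popularity': [True, False, False, True, False],
})

# Index once by member so each lookup avoids rescanning the whole table.
is_popular = popularity.set_index('memberidx')['popularity']

qualified = [1, 2, 3]  # qualified set of one team (hypothetical)
flags = is_popular.loc[qualified]

popular = flags[flags].index.tolist()
nonpopular = flags[~flags].index.tolist()
```

Building the indexed Series once outside the per-team loop replaces the repeated `popularity.loc[popularity['memberidx']==member, ...]` filter with a direct label lookup.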
Questions before the final implementation:
rerank
function)?