Open hosseinfani opened 2 years ago
@rezaBarzgar Let's move on to this issue. We can have a quick meeting to discuss this issue. But I just wanted you to self-study this and let me know what you think.
@hosseinfani I am ready to have a meeting. According to OpeNTF's paper, "A neural model is an estimator for a mapping function 𝑓 from a subset of skills to a subset of experts, i.e., 𝑓(𝑠)= 𝑒." If we solve this problem, the OpeNTF model is no longer a mapping function, since one set of skills would map to more than one set of experts. Also, an idea came to my mind: we can change the threshold of the sigmoid function in the output layer so that more nodes (members) are activated. Then, by splitting the activated members into k groups, each of which covers the requested set of skills, we can find more than one group. If we find a proper threshold and a reliable way to split the members, the model can predict more than one group.
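To make the thresholding idea concrete, here is a minimal NumPy sketch. The logits, the threshold values, and the helper name `activate_members` are made up for illustration and are not taken from OpeNTF.

```python
import numpy as np

def activate_members(logits, threshold=0.5):
    """Return the indices of members whose sigmoid score reaches the threshold."""
    probs = 1.0 / (1.0 + np.exp(-np.asarray(logits, dtype=float)))
    return np.flatnonzero(probs >= threshold)

# Hypothetical output-layer logits for 6 candidate members.
logits = [2.0, -0.5, 0.3, -2.0, 1.2, -0.1]

default_members = activate_members(logits, threshold=0.5)   # members 0, 2, 4
relaxed_members = activate_members(logits, threshold=0.35)  # members 0, 1, 2, 4, 5
```

Lowering the threshold only enlarges the pool of activated members. How to split that pool into k groups that each cover the requested skills is still the open part of the idea.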
@rezaBarzgar How about tomorrow (Thursday) at noon EST?
@hosseinfani Unfortunately, I am not available between 19:00 and 23:45 (IRST), but I am available before and after that time.
@rezaBarzgar Are you available now?
@hosseinfani yes I am
@rezaBarzgar I sent the link to your email.
@hosseinfani Hi, I have opened a branch with this issue name and pushed a function that merges vectors that have similar skills. I haven't added an input switch yet. Please check it out if it is possible for you and let me know what you think.
@rezaBarzgar Thank you. I unit tested it and it works fine. I made some minor changes; see below. Please put the revised code in ./cmn/tools.py
import numpy as np
import scipy.sparse
import copy

def merge_teams_by_skills(teamsvecs, inplace = False):
    # Teams with identical skill rows are merged: the first team of each group
    # receives the union of all members, and the duplicate rows are deleted.
    vecs = teamsvecs if inplace else copy.deepcopy(teamsvecs)
    # print(f'len of matrix before operation: {len(vecs["member"].rows)}')
    # merge_list maps a row index (as a string) to the later rows with the same skills
    merge_list = {}
    for i in range(len(vecs['skill'].rows)):
        merge_list[f'{i}'] = set()
        for j in range(i + 1, len(vecs['skill'].rows)):
            if vecs['skill'].rows[i] == vecs['skill'].rows[j]: merge_list[f'{i}'].add(j)
            # print(colored(f'row {i} and {j} are the same!', 'red'))
        if len(merge_list[f'{i}']) < 1: del merge_list[f'{i}']
    # rows that are already duplicates of an earlier row must not start their own group
    delete_set = set()
    for key in merge_list.keys():
        for item in merge_list[key]: delete_set.add(item)
    for item in delete_set:
        try: del merge_list[f'{item}']
        except KeyError: pass
    # print(merge_list)
    del_list = []
    for key_ in merge_list.keys():
        # print(colored(vecs['member'].getrow(int(key_)).toarray(), 'yellow'))
        for value_ in merge_list[key_]:
            del_list.append(value_)
            # union of the two member rows, clipped back to 0/1
            vec1 = vecs['member'].getrow(int(key_))
            vec2 = vecs['member'].getrow(value_)
            result = np.add(vec1, vec2)
            result[result != 0] = 1
            vecs['member'][int(key_), :] = scipy.sparse.lil_matrix(result)
        # print(colored(vecs['member'].getrow(int(key_)).toarray(), 'red'))
    # drop the duplicate rows from all three matrices
    vecs['id'] = scipy.sparse.lil_matrix(np.delete(vecs['id'].toarray(), del_list, axis=0))
    vecs['skill'] = scipy.sparse.lil_matrix(np.delete(vecs['skill'].toarray(), del_list, axis=0))
    vecs['member'] = scipy.sparse.lil_matrix(np.delete(vecs['member'].toarray(), del_list, axis=0))
    # print(f'len of matrix after operation: {len(vecs["member"].rows)}')
    return vecs

teamsvecs = {}
teamsvecs['id'] = scipy.sparse.lil_matrix([[1],[2],[3],[4],[5]])
teamsvecs['skill'] = scipy.sparse.lil_matrix([[1,1,0],[1,1,0],[0,1,1],[0,1,1],[1,1,1]])
teamsvecs['member'] = scipy.sparse.lil_matrix([[0,1,1,0],[1,1,1,0],[0,1,1,1],[0,1,1,0],[1,1,1,0]])
new_teamsvecs = merge_teams_by_skills(teamsvecs, inplace=True)
print(new_teamsvecs)
# expected result:
# new_teamsvecs['id'] <= [[1], [3], [5]]
# new_teamsvecs['skill'] <= [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
# new_teamsvecs['member'] <= [[1, 1, 1, 0], [0, 1, 1, 1], [1, 1, 1, 0]]
An important note is that we should only apply this on the test split. But in the split dictionary, for the test part, we keep the rowids. After merging on the actual matrix, we would need to update the rowids as well, which is daunting and may have unknown effects on the pipeline. I'm thinking of repeating the merged rows instead. For example:
1, [(10), (s1, s2), (e1, e5, e10)]
2, [(20), (s1, s2), (e1, e3, e15)]
==>
1, [(10), (s1, s2), (e1, e5, e10, e3, e15)]
2, [(20), (s1, s2), (e1, e5, e10, e3, e15)]
This way, I think there is no effect on the rowids or the rest of the pipeline. Can you revise the function to do this as an option, distinct=False?
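The repeated-rows option could be sketched as follows. This is only an illustration under assumptions, not the code that ended up in the repository: the function name `merge_members_by_skills`, its return value, and the toy matrices are hypothetical, and the matrices are assumed to be binary.

```python
import numpy as np
import scipy.sparse

def merge_members_by_skills(skill, member, distinct=False):
    """Union the member rows of teams that share an identical skill row.

    distinct=False keeps every row and repeats the merged members, so rowids stay valid.
    distinct=True keeps only the first row of each group of identical skills.
    Returns (member_matrix, kept_row_indices).
    """
    skill = scipy.sparse.lil_matrix(skill)
    member = scipy.sparse.lil_matrix(member).toarray()
    groups = {}  # skill pattern (nonzero column indices) -> rows with that pattern
    for i, cols in enumerate(skill.rows):
        groups.setdefault(tuple(cols), []).append(i)
    merged = member.copy()
    keep = []
    for rows in groups.values():
        union = (member[rows].sum(axis=0) > 0).astype(member.dtype)
        merged[rows] = union  # every row of the group gets the same merged members
        keep.append(rows[0])
    if not distinct:
        keep = list(range(member.shape[0]))
    return scipy.sparse.lil_matrix(merged[keep]), keep

# Toy data: teams 0 and 1 share the same skills.
skill = [[1, 1, 0], [1, 1, 0], [0, 1, 1]]
member = [[1, 0, 0, 0], [0, 1, 0, 1], [0, 0, 1, 0]]

repeated, repeated_rows = merge_members_by_skills(skill, member, distinct=False)
unique, unique_rows = merge_members_by_skills(skill, member, distinct=True)
```

With distinct=False the number of rows is unchanged, so the rowids stored in the split dictionary still point at the right teams.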
@rezaBarzgar Also, just a quick reminder to push only the necessary changes. For example, your IDE may clean up lines and make unnecessary changes to other parts of the code. Please do not push those, since they make it difficult to track the changes for a specific bug/feature. Thanks.
@hosseinfani Hi Professor. I added the function with the "distinct" argument to "cnm.tools.py". Now rows will not be deleted if distinct=False.
But I have a problem adding this function to the test part of the models. For example, when I want to import the function from cnm.tools, the following error occurs:
ModuleNotFoundError: No module named 'cnm'
@rezaBarzgar Thanks. I think cnm should be cmn.
@hosseinfani Hi, I added the _merge_teams_byskill function to the models' test function. So in this branch (reza-branch), teams can be merged if they have equal skills. Also, I couldn't find the right way to change the ArgumentParser, so I didn't change it. Another thing to mention is that I set the default value for merging to False.
@rezaBarzgar Thank you. I'll test and merge it soon.
@rezaBarzgar I did a unit test and there is a minor bug. See the unit test at line https://github.com/fani-lab/OpeNTF/blob/be7a12e8fcf522092aa39d3d1bce652410796f93/src/cmn/tools.py#L68
@rezaBarzgar Now, we need to see the effect on the evaluations with the merge and without it.
@hosseinfani Sure. I'll run and report the results.
@rezaBarzgar Please first try on the toy datasets of dblp, imdb, uspt.
Currently, we split the team instances into train and test. At test time, given a team, we input the skills of the team and expect the model to predict the members. However, there might be multiple teams with different members for the same skills. For example,
([s1,s2], [m1,m3]) ([s1,s2], [m0,m2])
If the model predicts [m1, m3], it hits the first but misses the second.
I think we should discuss this and come up with a strategy.
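The example above can be sketched in a few lines of Python. The `recall` helper is a simplified stand-in for the evaluation metrics, introduced here only for illustration, and scoring against the merged members is the merge-by-skills idea discussed in this thread rather than a settled evaluation protocol.

```python
def recall(predicted, expected):
    """Fraction of the expected members that appear in the prediction."""
    expected = set(expected)
    return len(expected & set(predicted)) / len(expected)

# The two test teams from the example: same skills, different members.
test_teams = [
    (("s1", "s2"), {"m1", "m3"}),
    (("s1", "s2"), {"m0", "m2"}),
]
predicted = {"m1", "m3"}  # the model outputs one team for skills (s1, s2)

# Scoring each test row separately: a full hit on the first team, a full miss on the second.
per_team = [recall(predicted, members) for _, members in test_teams]

# Scoring against the union of members over all teams with the same skills.
merged_members = set().union(*(members for _, members in test_teams))
merged = recall(predicted, merged_members)  # 2 of the 4 merged members are predicted
```

Here per_team is [1.0, 0.0] and merged is 0.5, which shows why the two evaluation setups need to be compared.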