Evaluation on team predictions for test instances

hosseinfani commented 2 years ago

Currently, we split the team instances into train and test sets. At test time, given a team, we input the team's skills and expect the model to predict its members. However, there might be multiple teams with different members for the same skills. For example,

([s1,s2], [m1,m3])
([s1,s2], [m0,m2])

If the model predicts [m1, m3], it hits the first but misses the second.
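A minimal sketch of the mismatch, assuming each test instance is scored on its own with a set-based recall. The function and variable names are illustrative, not OpeNTF's evaluation code:

```python
def recall(predicted, actual):
    """Fraction of the actual members that appear in the prediction."""
    return len(set(predicted) & set(actual)) / len(actual)


# Two test instances with the same skills but different members.
test_instances = [
    (("s1", "s2"), ("m1", "m3")),
    (("s1", "s2"), ("m0", "m2")),
]

# Same input skills, so a deterministic model gives the same prediction for both.
prediction = ["m1", "m3"]

scores = [recall(prediction, members) for _, members in test_instances]
print(scores)  # [1.0, 0.0]
```

With a two-member prediction, the average recall over these two instances can never exceed 0.5, whichever members the model picks, because the two member sets are disjoint.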

I think we should discuss this and come up with a strategy.

hosseinfani commented 1 year ago

@rezaBarzgar Let's move on to this issue. We can have a quick meeting to discuss it, but first I'd like you to study it on your own and let me know what you think.

rezaBarzgar commented 1 year ago

@hosseinfani I am ready to have a meeting. According to OpeNTF's paper, "A neural model is an estimator for a mapping function 𝑓 from a subset of skills to a subset of experts, i.e., 𝑓(𝑠)= 𝑒." If we solve this problem, the OpeNTF model is no longer a mapping function, since the same skill subset would map to more than one team. Also, an idea came to my mind: we can change the threshold of the sigmoid function in the output layer so that more nodes (members) are activated. Then, by splitting the activated members into k groups, each of which covers the requested set of skills, we can find more than one team. If we find a proper threshold and a reliable way to split the members, the model can predict more than one team.
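A minimal sketch of the threshold-and-split idea, assuming the output layer gives one sigmoid probability per expert and that each expert's skills are known. The probabilities, the `expert_skills` mapping, and the greedy splitting rule are all hypothetical; the comment above leaves the splitting method open:

```python
def activated_experts(probs, threshold):
    """Experts whose sigmoid output is at or above the threshold, highest first."""
    active = [e for e, p in probs.items() if p >= threshold]
    return sorted(active, key=lambda e: probs[e], reverse=True)


def split_into_teams(experts, expert_skills, required, k):
    """Greedily distribute experts over k candidate teams.

    Hypothetical rule: each expert joins the first team that still lacks one of
    its skills. Only teams that end up covering every required skill are returned.
    """
    teams = [[] for _ in range(k)]
    missing = [set(required) for _ in range(k)]
    for e in experts:
        for t in range(k):
            gain = missing[t] & expert_skills[e]
            if gain:
                teams[t].append(e)
                missing[t] -= gain
                break
    return [team for team, miss in zip(teams, missing) if not miss]


# Hypothetical sigmoid outputs for one test instance with skills {s1, s2}.
probs = {"m0": 0.62, "m1": 0.91, "m2": 0.58, "m3": 0.87, "m4": 0.20}
expert_skills = {"m0": {"s1"}, "m1": {"s1"}, "m2": {"s2"}, "m3": {"s2"}, "m4": {"s1", "s2"}}
required = {"s1", "s2"}

for threshold in (0.8, 0.5):
    experts = activated_experts(probs, threshold)
    print(threshold, experts, split_into_teams(experts, expert_skills, required, k=2))
# 0.8 ['m1', 'm3'] [['m1', 'm3']]
# 0.5 ['m1', 'm3', 'm0', 'm2'] [['m1', 'm3'], ['m0', 'm2']]
```

Lowering the threshold from 0.8 to 0.5 activates m0 and m2 as well, which is enough to form a second team. A greedy split like this is cheap but not guaranteed to find the largest number of covering teams.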

hosseinfani commented 1 year ago

@rezaBarzgar How about tomorrow (Thursday) at noon EST?

rezaBarzgar commented 1 year ago

@hosseinfani Unfortunately, I am not available between 19:00 and 23:45 (IRST), but I am available before and after that.

hosseinfani commented 1 year ago

@rezaBarzgar Are you available now?

rezaBarzgar commented 1 year ago

@hosseinfani Yes, I am.

hosseinfani commented 1 year ago

@rezaBarzgar send the link to your email

hosseinfani commented 1 year ago

rezaBarzgar commented 1 year ago

@hosseinfani Hi, I have opened a branch named after this issue and pushed a function that merges team vectors with identical skills. I haven't added an input switch yet. Please check it out when you can and let me know what you think.

hosseinfani commented 1 year ago

@rezaBarzgar Thank you. I unit-tested it and it works fine. I made some minor changes; see below. Please put the revised code in ./cmn/tools.py.

```python
import copy

import numpy as np
import scipy.sparse


def merge_teams_by_skills(teamsvecs, inplace=False):
    """Merge teams (rows) that have identical skill vectors.

    The member vector of the first row in each group becomes the union (logical OR)
    of the member vectors of all rows in the group; the other rows are deleted
    from 'id', 'skill', and 'member'.
    """
    vecs = teamsvecs if inplace else copy.deepcopy(teamsvecs)
    n = vecs['skill'].shape[0]

    # merge_list[i] = rows j > i whose skill row equals row i
    # (lil_matrix.rows holds the sorted nonzero column indices of each row).
    merge_list = {}
    for i in range(n):
        same = {j for j in range(i + 1, n) if vecs['skill'].rows[i] == vecs['skill'].rows[j]}
        if same: merge_list[i] = same

    # A row that is a duplicate of an earlier row must not be a merge target itself.
    delete_set = set().union(*merge_list.values())
    for item in delete_set: merge_list.pop(item, None)

    del_list = []
    for key_, values in merge_list.items():
        for value_ in values:
            del_list.append(value_)
            # OR the two member rows: sum them (sparse), densify, clip nonzeros to 1.
            result = (vecs['member'].getrow(key_) + vecs['member'].getrow(value_)).toarray()
            result[result != 0] = 1
            vecs['member'][key_, :] = result

    # Drop the merged-away rows from all three matrices.
    for name in ('id', 'skill', 'member'):
        vecs[name] = scipy.sparse.lil_matrix(np.delete(vecs[name].toarray(), del_list, axis=0))
    return vecs


teamsvecs = {
    'id': scipy.sparse.lil_matrix([[1], [2], [3], [4], [5]]),
    'skill': scipy.sparse.lil_matrix([[1, 1, 0], [1, 1, 0], [0, 1, 1], [0, 1, 1], [1, 1, 1]]),
    'member': scipy.sparse.lil_matrix([[0, 1, 1, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 1, 1, 0], [1, 1, 1, 0]]),
}
new_teamsvecs = merge_teams_by_skills(teamsvecs, inplace=True)
for name in ('id', 'skill', 'member'): print(name, new_teamsvecs[name].toarray().tolist())
# id [[1], [3], [5]]
# skill [[1, 1, 0], [0, 1, 1], [1, 1, 1]]
# member [[1, 1, 1, 0], [0, 1, 1, 1], [1, 1, 1, 0]]
```

An important note: we should apply this only to the test split. However, the split dictionary stores the test part as row ids. If we merge rows in the actual matrix, we also need to update those row ids, which is daunting and may have unknown effects on the pipeline. Instead, I'm thinking of keeping all rows and repeating the merged member vector in each of them. For example:

1, [(10), (s1, s2), (e1,e5,e10)]
2, [(20), (s1, s2), (e1,e3,e15)]
==>
1, [(10), (s1, s2), (e1,e5,e10,e3,e15)]
2, [(20), (s1, s2), (e1,e5,e10,e3,e15)]

This way, I think, there is no effect on the rest of the pipeline. Can you revise the function to support this through an option, distinct=False?
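One possible shape for the distinct option, sketched on plain lists of 0/1 rows instead of the scipy.sparse.lil_matrix objects used in the function above. The list representation and the name `merge_members_by_skills` are assumptions for illustration. With distinct=False every row is kept and receives the union of the members of all rows sharing its skill vector, so the test row ids stay valid; with distinct=True only the first row of each group is kept:

```python
def merge_members_by_skills(ids, skills, members, distinct=False):
    """Union the member vectors of rows whose skill vectors are identical.

    ids, skills, members are parallel lists; skill and member rows are 0/1 lists.
    distinct=False keeps every row (row ids unchanged) and repeats the merged
    member vector in each row of a group. distinct=True keeps only the first
    row of each group.
    """
    groups = {}  # skill vector -> indices of the rows sharing it, in order
    for r, s in enumerate(skills):
        groups.setdefault(tuple(s), []).append(r)

    merged = [list(m) for m in members]
    keep = []
    for rows in groups.values():
        # Column-wise OR over the member vectors of the group.
        union = [int(any(col)) for col in zip(*(members[r] for r in rows))]
        for r in rows:
            merged[r] = list(union)
        keep.append(rows[0])

    if not distinct:
        return list(ids), [list(s) for s in skills], merged
    keep.sort()
    return [ids[r] for r in keep], [skills[r] for r in keep], [merged[r] for r in keep]


ids = [1, 2, 3, 4, 5]
skills = [[1, 1, 0], [1, 1, 0], [0, 1, 1], [0, 1, 1], [1, 1, 1]]
members = [[0, 1, 1, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 1, 1, 0], [1, 1, 1, 0]]

print(merge_members_by_skills(ids, skills, members)[2])
# [[1, 1, 1, 0], [1, 1, 1, 0], [0, 1, 1, 1], [0, 1, 1, 1], [1, 1, 1, 0]]
print(merge_members_by_skills(ids, skills, members, distinct=True))
# ([1, 3, 5], [[1, 1, 0], [0, 1, 1], [1, 1, 1]], [[1, 1, 1, 0], [0, 1, 1, 1], [1, 1, 1, 0]])
```

The distinct=True result matches the expected output of the sparse version on the same toy data.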

hosseinfani commented 1 year ago

@rezaBarzgar Also, a quick reminder: push only the changes that are necessary, line by line. For example, your IDE may clean up lines and make unrelated changes to other parts of the code. Please do not push those, since they make it difficult to track the changes for a specific bug or feature. Thanks.

rezaBarzgar commented 1 year ago

@hosseinfani Hi Professor. I added the function with a "distinct" argument to "cnm.tools.py". Now rows are not deleted if distinct=False. However, I have a problem adding this function to the test part of the models. When I try to import the function from cnm.tools, the following error occurs: ModuleNotFoundError: No module named 'cnm'

hosseinfani commented 1 year ago

@rezaBarzgar Thanks. I think cnm should be cmn

rezaBarzgar commented 1 year ago

@hosseinfani Hi, I added the _merge_teams_byskill function to the models' test function, so in this branch (reza-branch) teams can be merged if they have the same skills. Also, I couldn't find the right way to change the ArgumentParser, so I left it unchanged. Another thing to mention: I set the default value for merging to False.
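For reference, a minimal sketch of such a switch with Python's argparse. The flag names --merge_teams_by_skills and --distinct are hypothetical; OpeNTF's actual parser is not shown in this thread. action="store_true" gives a default of False, matching the default described above:

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="Team formation test options (sketch)")
    # Hypothetical flag: merge test teams that share the same skill set.
    parser.add_argument("--merge_teams_by_skills", action="store_true",
                        help="merge test teams with identical skills (default: off)")
    # Hypothetical flag: drop duplicate rows instead of repeating merged members.
    parser.add_argument("--distinct", action="store_true",
                        help="keep one row per skill set (default: keep all rows)")
    return parser


args = build_parser().parse_args([])
print(args.merge_teams_by_skills, args.distinct)  # False False

args = build_parser().parse_args(["--merge_teams_by_skills"])
print(args.merge_teams_by_skills, args.distinct)  # True False
```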

hosseinfani commented 1 year ago

@rezaBarzgar Thank you. I'll test and merge it soon.

hosseinfani commented 1 year ago

@rezaBarzgar I ran a unit test and found a minor bug. See the unit test at https://github.com/fani-lab/OpeNTF/blob/be7a12e8fcf522092aa39d3d1bce652410796f93/src/cmn/tools.py#L68

hosseinfani commented 1 year ago

@rezaBarzgar Now we need to see the effect on the evaluations with and without merging.
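A minimal sketch of that comparison on the two-team example from the start of this thread, assuming set-based precision and recall per test row and a model that returns one prediction per skill set (both are illustrative assumptions, not OpeNTF's evaluation code). The merged case follows the distinct=False idea: every row is kept but scored against the union of the members of all rows with the same skills:

```python
from collections import defaultdict


def precision_recall(predicted, actual):
    hits = len(set(predicted) & set(actual))
    return hits / len(predicted), hits / len(actual)


def merge_truth(rows):
    """distinct=False style: each row keeps its place, but its members become
    the union over all rows with the same skill set."""
    union = defaultdict(set)
    for skills, members in rows:
        union[skills] |= set(members)
    return [(skills, union[skills]) for skills, _ in rows]


def mean_scores(rows, predictions):
    pairs = [precision_recall(predictions[s], m) for s, m in rows]
    return (sum(p for p, _ in pairs) / len(pairs),
            sum(r for _, r in pairs) / len(pairs))


test_rows = [
    (("s1", "s2"), ("m1", "m3")),
    (("s1", "s2"), ("m0", "m2")),
]
# Hypothetical model output: one prediction per skill set.
predictions = {("s1", "s2"): ["m1", "m3"]}

print(mean_scores(test_rows, predictions))               # (0.5, 0.5)
print(mean_scores(merge_truth(test_rows), predictions))  # (1.0, 0.5)
```

On this toy case, merging raises precision from 0.5 to 1.0, while recall stays at 0.5 because each row is now scored against four members instead of two.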

rezaBarzgar commented 1 year ago

@hosseinfani Sure. I'll run it and report the results.

hosseinfani commented 1 year ago

@rezaBarzgar Please try the toy datasets of dblp, imdb, and uspt first.

fani-lab/OpeNTF, issue #156: Evaluation on team predictions for test instances