Rounique opened this issue 2 years ago
Hi @Rounique, there are two baselines: fnn and bnn. Each one has three prediction lists on the test set, f{0,1,2}.test.pred. You need to consider all of them. Also, read my explanation here: https://github.com/fani-lab/fair_team_formation/issues/19#issuecomment-1126624439
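As a hedged sketch of what "consider all of them" could look like in practice, the snippet below lists the six prediction files (two baselines × three files each). The helper name `prediction_files` and the assumption that each baseline's run directory directly contains `f0.test.pred` to `f2.test.pred` are illustrative; only the fnn run path appears in this thread, so the bnn directory is a placeholder.

```python
from pathlib import Path

# Hypothetical helper: map each baseline to the paths of its f{0,1,2}.test.pred files.
# Assumes every run directory directly contains f0.test.pred, f1.test.pred, f2.test.pred.
def prediction_files(run_dirs: dict[str, str], n_files: int = 3) -> dict[str, list[Path]]:
    return {
        baseline: [Path(run_dir) / f"f{i}.test.pred" for i in range(n_files)]
        for baseline, run_dir in run_dirs.items()
    }

if __name__ == "__main__":
    runs = {
        # fnn path taken from this thread; the bnn path is a placeholder, not a real folder name.
        "fnn": "../output/toy.dblp.v12.json/fnn/t31.s11.m13.l[100].lr0.1.b4096.e20/",
        "bnn": "../output/toy.dblp.v12.json/bnn/<bnn-run-settings>/",
    }
    for baseline, files in prediction_files(runs).items():
        for path in files:
            print(baseline, path)
```

Each of these files would then be evaluated (and reranked) separately before plotting.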
After I got access to the lab computer, I ran the whole dataset on it, but when I went back the next day, the computer had been turned off! I started another run and will check it tomorrow.
Hi @Rounique, have you tested on the toy set and pushed your code to the repo?
Hi @hosseinfani, are these plots a correct depiction of the visualization you had in mind during our meeting last week (regarding the trade-off between fairness and success)? Note that these three plots could be combined, but since the points overlap for the dblp toy dataset, I kept them separate.
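Each point on such a trade-off plot needs one success score and one fairness score per ranked list, computed before and after reranking. A minimal sketch, assuming precision@k as the success metric and the share of non-popular ('NP') experts in the top-k as the fairness indicator; these metric choices, the function names, and the toy data are illustrative assumptions, not necessarily what the repo computes.

```python
# Hypothetical metrics for one (success, fairness) point on the plot.
def precision_at_k(ranked: list[str], relevant: set[str], k: int) -> float:
    """Fraction of the top-k predicted experts that are true team members."""
    return sum(1 for e in ranked[:k] if e in relevant) / k

def np_share_at_k(ranked: list[str], label: dict[str, str], k: int) -> float:
    """Fraction of the top-k predicted experts labeled non-popular ('NP')."""
    return sum(1 for e in ranked[:k] if label[e] == "NP") / k

if __name__ == "__main__":
    relevant = {"a", "b"}                                  # toy ground-truth team
    label = {"a": "P", "b": "P", "c": "NP", "d": "NP"}     # toy popularity labels
    before = ["a", "b", "c", "d"]                          # toy baseline ranking
    after = ["a", "c", "b", "d"]                           # toy ranking after reranking
    for name, ranking in (("before", before), ("after", after)):
        print(name, precision_at_k(ranking, relevant, 2), np_share_at_k(ranking, label, 2))
```

With these toy lists the point moves from (1.0, 0.0) to (0.5, 0.5): success drops while the NP share rises toward a 0.5 target, which is the direction typically expected from fairness-aware reranking.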
@Hamedloghmani yes, but I think the x and y labels are not correct. Also, make the figures smaller so that we can see them all at once. Also, what is the attribute here, popularity? If so, how did you define popularity? It's good to include the running settings for these plots.
@hosseinfani Thanks a lot for your comments. I will make smaller plots from now on. I reviewed our meeting notes and have a few questions. First, when you said the x and y labels are wrong, did you mean they need to be switched, or that you don't like the metrics? If it is the former, I can plot the success metric on the y axis instead of the x axis. Regarding popularity, I used the settings and thresholds already available in the source code; the threshold is dynamic, though. Finally, I found that the library already accepts an argument that limits the number of reranked samples returned, so we do not need to hard-code it.
@Hamedloghmani the x and y should be switched. The current figures say that after reranking, accuracy increases while fairness decreases! How is this possible?
What are the settings? Mention them here.
Not sure I understood your last sentence.
@hosseinfani I am not sure I understood what you meant by 'settings', but this is my understanding; please correct me if I'm wrong:
baseline = '../output/toy.dblp.v12.json/fnn/t31.s11.m13.l[100].lr0.1.b4096.e20/'
vector_data = '../processed/dblp-toy/teamsvecs.pkl'
popularity_threshold = 0.5
distribution_list = {'NP': 0.5, 'P': 0.5}
I also made the changes that you requested.
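To make a target such as `distribution_list = {'NP': 0.5, 'P': 0.5}` concrete: one common way to enforce it is to require every prefix of the reranked list to contain at least the target share of each group. The sketch below is a simple greedy reranker under that assumption; it is not the library's algorithm, and `greedy_rerank`, its arguments, and the toy scores are hypothetical.

```python
import math

# Illustrative greedy reranker (hypothetical; not the reranking library used in this thread).
# Rule: for every prefix length i of the output, each group g should contain at least
# floor(distribution[g] * i) items; when no group is below its quota, the best-scored
# remaining item is taken regardless of group.
def greedy_rerank(items: list[tuple[str, float]], group: dict[str, str],
                  distribution: dict[str, float], k: int) -> list[str]:
    # One queue per group, sorted by descending score (groups outside `distribution` are ignored).
    queues = {g: sorted((it for it in items if group[it[0]] == g), key=lambda it: -it[1])
              for g in distribution}
    counts = {g: 0 for g in distribution}
    result: list[str] = []
    for i in range(1, k + 1):
        available = [g for g in distribution if queues[g]]
        if not available:
            break  # fewer than k items to rank
        # Groups that are still below their minimum quota for this prefix length.
        below = [g for g in available if counts[g] < math.floor(distribution[g] * i)]
        candidates = below or available
        # Among eligible groups, pick the one whose best remaining item scores highest.
        best = max(candidates, key=lambda g: queues[g][0][1])
        name, _score = queues[best].pop(0)
        counts[best] += 1
        result.append(name)
    return result

if __name__ == "__main__":
    scores = [("a", 0.9), ("b", 0.8), ("c", 0.4), ("d", 0.3)]
    groups = {"a": "P", "b": "P", "c": "NP", "d": "NP"}
    print(greedy_rerank(scores, groups, {"NP": 0.5, "P": 0.5}, k=4))
```

With the 0.5/0.5 target, the non-popular expert `c` is pulled up to position 2, because every prefix of length 2 must contain at least one NP item; this is where a drop in accuracy after reranking would come from.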
@Hamedloghmani I edited your comment. Yes, I meant those. I removed the ones that are clear from the figure.
popularity_threshold = 0.5 ==> what does this mean?
@hosseinfani Thank you. It is a threshold that defines whether a sample is popular or not. If I understood correctly, it is similar to the logic used in lines 21 to 23 of main.py.
@Hamedloghmani make the x and y axes start at zero.
@hosseinfani In the latest commit, I implemented a function that computes basic stats on a given dataset and labels the popular and non-popular samples based on a dynamic threshold.
Hi Dr. @hosseinfani, I am trying to make the plot and have a question. There are multiple files in the output folder. Which one(s) should be used to visualize the results?
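A hedged sketch of what such a function might look like, assuming popularity is measured by the number of teams an expert appears in and that the dynamic threshold is the mean of those counts. Both are assumptions for illustration; the repo's code may define popularity and the threshold differently, and `label_popularity` is a hypothetical name.

```python
from statistics import mean, median

# Hypothetical helper: basic stats over per-expert team counts plus a P/NP label per expert.
# Assumption: an expert is popular ('P') if their team count exceeds the mean count.
def label_popularity(team_counts: dict[str, int]) -> tuple[dict[str, float], dict[str, str]]:
    counts = list(team_counts.values())
    stats = {"min": min(counts), "max": max(counts),
             "mean": mean(counts), "median": median(counts)}
    threshold = stats["mean"]  # "dynamic": recomputed from each dataset's own counts
    labels = {e: ("P" if c > threshold else "NP") for e, c in team_counts.items()}
    return stats, labels

if __name__ == "__main__":
    stats, labels = label_popularity({"a": 10, "b": 6, "c": 1, "d": 1, "e": 2})
    print(stats)
    print(labels)
```

Because the threshold is derived from the data, the same code yields different P/NP splits on the toy set and on the full dataset.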