Graph Neural Networks on USPT

karan96 commented 1 year ago

Opening this issue to track our usage of GNNs on the USPT dataset. Things to do: - Update this issue page with the details of the previous experiment that used GNN embeddings with BNN.

Greetings Dr. @hosseinfani, I am trying to create a data frame to feed to a GNN, from which embeddings of locations, experts, and skills (as experts' attributes) will be produced. The resulting data frame should look like this:

where each skill entry is the corresponding expert's sparse matrix, which will then be converted to a tensor and fed into the GNN. The way PyTorch Geometric does it is with the pandas get_dummies function, which converts categorical variables into dummy/indicator variables, as shown below:

But this can't be used in our case because the number of skills is very large. In our case each skill is represented as a sparse matrix. For example, if the skills for a particular expert E are S1|S2|S3, with S1 = [1, 0, 0], S2 = [0, 1, 0], and S3 = [0, 0, 1], the resulting data frame should have only one skill entry for that expert, something like this: [1, 0, 0, 0, 1, 0, 1, 0]. Please advise on how to create such a data frame from our data.
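One way to get a single skill entry per expert without get_dummies is to store only the positions of the skills an expert has and build a sparse multi-hot matrix from them. The following is a minimal pure-Python sketch, not the project's implementation. The names `experts`, `build_skill_index`, `build_multi_hot`, and `to_dense_row` are hypothetical, and it assumes the per-expert vector is the element-wise union of the one-hot skill vectors (so S1|S2|S3 over three skills gives [1, 1, 1]), which may differ from the layout intended above.

```python
from typing import Dict, List, Tuple


def build_skill_index(experts: Dict[str, str], sep: str = "|") -> Dict[str, int]:
    """Map each distinct skill to a column index, in first-seen order."""
    index: Dict[str, int] = {}
    for skills in experts.values():
        for skill in skills.split(sep):
            index.setdefault(skill, len(index))
    return index


def build_multi_hot(
    experts: Dict[str, str], skill_index: Dict[str, int], sep: str = "|"
) -> Tuple[List[int], List[int]]:
    """Return COO-style (rows, cols) of the non-zero entries, one row per expert."""
    rows: List[int] = []
    cols: List[int] = []
    for row, skills in enumerate(experts.values()):
        # A set removes duplicate skills; sorting keeps the output deterministic.
        for col in sorted({skill_index[s] for s in skills.split(sep)}):
            rows.append(row)
            cols.append(col)
    return rows, cols


def to_dense_row(row: int, rows: List[int], cols: List[int], n_cols: int) -> List[int]:
    """Expand one expert's row to a dense 0/1 list (for inspection only)."""
    dense = [0] * n_cols
    for r, c in zip(rows, cols):
        if r == row:
            dense[c] = 1
    return dense


# Hypothetical toy data: expert id -> "|"-separated skills.
experts = {"E1": "S1|S2|S3", "E2": "S2", "E3": "S1|S3"}
skill_index = build_skill_index(experts)
rows, cols = build_multi_hot(experts, skill_index)
print(to_dense_row(0, rows, cols, len(skill_index)))
```

The (rows, cols) pairs could then be passed to a sparse constructor such as scipy.sparse.coo_matrix or torch.sparse_coo_tensor, so the dense experts-by-skills matrix never has to be materialized.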

karan96 commented 1 year ago

@hosseinfani For some reason I cannot edit the issue to assign it to myself, or edit its project or label. Could you please do that for me?

karan96 commented 1 year ago

@hosseinfani Hello Dr. Fani, this is the issue page to track our implementation of graph neural networks on USPT. Please feel free to elaborate or comment on what we discussed in our meeting. Current tasks: - Use graph embeddings in our neural networks.

hosseinfani commented 1 year ago

@karan96 Thanks. Based on our discussion, you want to generate embeddings of skills, locations, and (optionally) members, then use them as input to the current fnn and bnn baselines.

Just a quick reminder that Radin's latest work follows a similar idea, and you should choose it as a baseline: https://github.com/radinhamidi/Retrieving-Skill-Based-Teams-from-Collaboration-Networks https://github.com/radinhamidi/Forming-Coherent-Teams-in-Collaboration-Networks

karan96 commented 1 year ago

@hosseinfani Greetings Dr. Fani, I was able to create the matrix as we discussed and trained bnn_emb on the toy dataset. The first image shows the size of the resulting skills (111) + locations (3) matrix, and the second is the train/test plot. I will continue by running the model on the entire dataset.

hosseinfani commented 1 year ago

@karan96 I cannot see the image!

karan96 commented 1 year ago

@hosseinfani [two images attached]

karan96 commented 1 year ago

@hosseinfani I encountered an error while running the model on the entire dataset. I have raised an issue with PyG: "Issue With Link Prediction". I would like to go through the error with you in the lab. Kindly let me know when you will next be available there, and I will make sure to be present.

hosseinfani commented 1 year ago

@karan96 I'll be in the lab tonight at ~7:30-8pm

karan96 commented 1 year ago

@hosseinfani The run of bnn_emb on USPT is in progress and has now reached the evaluation step. I will update the issue page with the results. Meanwhile, here is the plot of training and validation loss: [image: f4 train_valid_loss] Any comments on this plot?

hosseinfani commented 1 year ago

@karan96

1. Is the input the concat of the [skill, loc] embeddings?
2. The valid loss should increase at some point close to large epoch numbers?!

karan96 commented 1 year ago

@hosseinfani
1. Yes, the input is the concat of the two embeddings.
2. I will discuss this in person with you.
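For readers following the thread, the concatenated [skill, loc] input mentioned in this exchange could be formed roughly as below. This is only a sketch under stated assumptions. The thread does not say how several skill embeddings are combined into one vector, so mean pooling is assumed here. The names `mean_pool`, `build_input`, `skill_emb`, and `loc_emb` and the 2-dimensional embeddings are hypothetical.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def mean_pool(vectors: Sequence[Vector]) -> Vector:
    """Average a non-empty list of equal-length vectors element-wise."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]


def build_input(
    skills: Sequence[str],
    location: str,
    skill_emb: Dict[str, Vector],
    loc_emb: Dict[str, Vector],
) -> Vector:
    """Concatenate the pooled skill embedding with the location embedding."""
    # Assumption: skills are mean-pooled; the thread does not specify this step.
    pooled = mean_pool([skill_emb[s] for s in skills])
    return pooled + loc_emb[location]  # list concatenation, not addition


# Hypothetical 2-dimensional embeddings, standing in for the GNN output.
skill_emb = {"S1": [1.0, 0.0], "S2": [0.0, 1.0]}
loc_emb = {"L1": [2.0, 4.0]}

x = build_input(["S1", "S2"], "L1", skill_emb, loc_emb)
print(len(x))  # skill dimension + location dimension
```

With this layout the input width is the sum of the two embedding dimensions, so the first layer of the downstream model has to be sized accordingly.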

fani-lab / OpeNTF

Graph Neural Networks on USPT #182