Data Integration - Githubissues

VaghehDashti commented 1 year ago

Hello @hosseinfani, I created this issue, as you asked, to post updates on uploading the models and results of our temporal paper to ms-teams. dblp is almost done; only bnn_emb is left. For imdb, bnn, bnn_emb, and tbnn are done; tbnn_emb and tbnn_dt2v_emb are left. For uspt, bnn and bnn_emb are done; tbnn, tbnn_emb, and tbnn_dt2v are left.

VaghehDashti commented 1 year ago

Hi @hosseinfani, imdb's and uspt's results are uploaded as well. I will move on to rrn's results.

VaghehDashti commented 1 year ago

Hi @hosseinfani, rrn's results plus the reformulated input data and the predictions of the model are uploaded to ms-teams as well.

hosseinfani commented 1 year ago

@VaghehDashti Thank you. All is done then, right?

VaghehDashti commented 1 year ago

@hosseinfani Yes, it is done.

hosseinfani commented 1 year ago

@VaghehDashti I have the files. I think the preprocessed folder is missing. We need the pkl files in the preprocessed folder on which these results were obtained.

VaghehDashti commented 1 year ago

Hi @hosseinfani, Oh, I didn't know about this. I will upload them and let you know when done.

VaghehDashti commented 1 year ago

Dblp's preprocessed data is uploaded. imdb and uspt are being downloaded.

VaghehDashti commented 1 year ago

Hi @hosseinfani, All is done.

hosseinfani commented 1 year ago

@VaghehDashti Thank you. I have also downloaded the files from ms-teams to my local drive. Just to make sure the upload happened correctly: the full size is 70.5 GB. Is that the right size on your local drive/sharcnet drive?
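A quick way to compare sizes on both ends is to sum the file sizes under the same folder on each machine. The snippet below is only a sketch: the helper names `dir_size_bytes` and `to_gb` are made up for illustration, and it assumes the 70.5 GB figure is in decimal gigabytes (10^9 bytes), which may not match what every file browser reports.

```python
import os


def dir_size_bytes(root):
    """Sum the sizes of all regular files under root, skipping symlinks."""
    total = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if not os.path.islink(path):
                total += os.path.getsize(path)
    return total


def to_gb(n_bytes):
    """Convert bytes to decimal gigabytes (assumption: 1 GB = 10**9 bytes)."""
    return n_bytes / 1000**3
```

Calling `to_gb(dir_size_bytes(path))` on the uploaded folder and on the downloaded copy should give the same number if nothing was lost in transfer. Matching sizes do not prove the contents are identical; a checksum per file would be a stronger check.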

VaghehDashti commented 1 year ago

Hi @hosseinfani, I don't know the exact size because I deleted the files from my local machine after uploading them. However, I should let you know that my earlier guess about the total uncompressed size was wrong, because there were a lot of old experiments, or experiments with different filterings, that I deleted before downloading from sharcnet and uploading to ms-teams.

VaghehDashti commented 1 year ago

Hi @hosseinfani, I've uploaded the results of the experiments on dblp with [128,64,128] and [256,128,64,128,256] to ms-teams. The performance of bnn and bnn_emb has decreased on dblp and imdb. I think the decrease is because the models have grown while the number of training instances has not, so the model is probably underfitting. That said, changing the model size may require a new learning rate, batch size, #epochs, and so on for each dataset to get better or equal performance. uspt with bnn is still running for [256,..,256], although the models that are already trained show the same trend. I will let you know when the complete results on uspt are ready and uploaded to ms-teams. Please let me know what you think.

hosseinfani commented 1 year ago

Hi @VaghehDashti Thanks. I replied in #120

VaghehDashti commented 1 year ago

Hi @hosseinfani, the results of uspt are also uploaded to ms-teams for 3 and 5 layers and both models.

VaghehDashti commented 1 year ago

Hi @hosseinfani, I've pushed the results of the ecir paper to github. Please pull.

VaghehDashti commented 1 year ago

Hi @hosseinfani, The results for imdb with 3 layers [128,64,128] with learning rates of 0.01 and 0.001 are uploaded to ms-teams.

VaghehDashti commented 1 year ago

Hi @hosseinfani, I've pushed the temporal stats of the 3 datasets to github. I should let you know that for dblp and uspt, I had to change line 124 of stats.py from: ax.set_yticks(list(range(0, max_of_nteams, 100))) to ax.set_yticks(list(range(0, max_of_nteams, 1000))) so that the ticks on the y-axis don't get messed up. Please let me know if you know of a better way of doing this.
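One possible alternative to hard-coding 100 or 1000 is to derive the step from max_of_nteams. This is only a sketch, not what stats.py currently does: `nice_tick_step` and `target_ticks` are hypothetical names, and the choice of about 10 ticks is an assumption.

```python
import math


def nice_tick_step(max_value, target_ticks=10):
    """Pick a step of the form {1, 2, 5} x 10^k that yields at most
    roughly target_ticks ticks between 0 and max_value."""
    if max_value <= 0:
        return 1
    raw = max_value / target_ticks  # smallest step that keeps the tick count down
    magnitude = 10 ** math.floor(math.log10(raw)) if raw >= 1 else 1
    for factor in (1, 2, 5, 10):
        step = factor * magnitude
        if step >= raw:
            return int(step)
    return int(10 * magnitude)  # safeguard against floating-point edge cases


# Hypothetical use in place of the fixed step on line 124:
# ax.set_yticks(list(range(0, max_of_nteams, nice_tick_step(max_of_nteams))))
```

With this, a maximum of 950 teams gives a step of 100 and a maximum of 9800 gives a step of 1000, so the same line would work for all three datasets. Another option may be to drop set_yticks and let matplotlib choose the ticks, for example with `ax.yaxis.set_major_locator(MaxNLocator(integer=True))` from `matplotlib.ticker`.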

VaghehDashti commented 1 year ago

Hi @hosseinfani, I wanted to let you know that the data for experiments on #layers/nodes with different learning rates are uploaded to ms-teams and the results are pushed to dev. Will update here after finishing the experiments on #bayesian samples.

VaghehDashti commented 1 year ago

@hosseinfani I've uploaded the results for #bs={3,5,10} for all 3 datasets to ms-teams and pushed the results to dev as well.

fani-lab / OpeNTF

Data Integration #178