VaghehDashti opened 1 year ago

Hello @hosseinfani, I created this issue to post updates on the hyperparameter study for temporal team formation. I have started runs with two layers, [64, 128], for all models [bnn, bnn_emb, tbnn, tbnn_emb, tbnn_dt2v_emb] on the three datasets (15 runs in total). I will run the models with three layers, [64, 128, 256], afterwards.
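For bookkeeping, here is a minimal sketch of how the 15-run sweep can be scripted; `run_experiment` is a hypothetical stand-in for the actual training entry point, not the repository's API:

```python
from itertools import product

models = ['bnn', 'bnn_emb', 'tbnn', 'tbnn_emb', 'tbnn_dt2v_emb']
datasets = ['dblp', 'imdb', 'uspt']
hidden_layers = [64, 128]  # the two-layer setting; [64, 128, 256] comes afterwards

def run_experiment(model: str, dataset: str, hidden: list) -> None:
    # placeholder: launch training for one (model, dataset) pair
    print(f'training {model} on {dataset} with hidden layers {hidden}')

for model, dataset in product(models, datasets):  # 5 models x 3 datasets = 15 runs
    run_experiment(model, dataset, hidden_layers)
```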
@VaghehDashti the choice of layers should be like [128, 64, 128], that is, narrowing down and then expanding. Please redo for this setting. Also, do this only for bnn and fnn.
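For concreteness, a narrowing-then-expanding (bottleneck) stack like [128, 64, 128] can be built directly from the layer-size list; a minimal PyTorch sketch, where the input/output dimensions and the LeakyReLU activation are assumptions for illustration, not the models' actual definitions:

```python
import torch.nn as nn

def make_mlp(input_dim: int, hidden: list, output_dim: int) -> nn.Sequential:
    """Build a feed-forward stack from a layer-size list such as [128, 64, 128]."""
    dims = [input_dim] + hidden + [output_dim]
    blocks = []
    for i in range(len(dims) - 1):
        blocks.append(nn.Linear(dims[i], dims[i + 1]))
        if i < len(dims) - 2:  # no activation after the output layer
            blocks.append(nn.LeakyReLU())
    return nn.Sequential(*blocks)

# 128 -> 64 -> 128: narrows down, then expands (a bottleneck)
model = make_mlp(input_dim=300, hidden=[128, 64, 128], output_dim=5000)
print(model)
```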
@hosseinfani I will do that for 3 layers. So for two layers, should it be [128, 64]? Also, could you please explain why fnn? As we discussed earlier, we are starting from the best model of the negative sampling paper and comparing it with the new temporal models. I think the chosen models are appropriate for our temporal hyperparameter study. Please let me know what you think.
Hi @hosseinfani, As we discussed over the phone, I have started the hyperparameter study for bnn and bnn_emb with layers [128, 64, 128]. I will wait to see how long the runs take with these hyperparameters and then run [256, 128, 64, 128, 256]. Then I will move forward with the best model to the next step (either the number of negative samples or the dimension of the input embedding). Please let me know what you think.
@VaghehDashti Agree.
@VaghehDashti It might also be overfitting. Have you seen the results on the training set? What is the behaviour on the validation set?
Hi @hosseinfani, That could be another possibility. Here are the training/validation losses for the datasets. dblp: this is for bnn with [128, 64, 128]; the loss for [256, ..., 256] looks similar, and the other folds are similar as well. This is for bnn_emb: same as bnn.
imdb: this is for bnn with 3 layers, but both bnn and bnn_emb with 3 and 5 layers look similar.
uspt: this is for bnn with 3 layers, but bnn_emb with 3 and 5 layers looks similar as well. bnn with 5 layers is not done yet, but it will probably look the same.
Please let me know what you think.
I should also mention that the range of the loss is higher for all three datasets on both models compared to the one-layer runs.
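For reference, the per-epoch curves in these plots come from tracking the average train/validation loss each epoch; a minimal generic PyTorch sketch (the model, loaders, and criterion are stand-ins, not our actual training loop):

```python
import torch

def epoch_losses(model, train_loader, valid_loader, optimizer, criterion, epochs):
    """Collect the average train/valid loss per epoch, i.e., the curves plotted above."""
    history = {'train': [], 'valid': []}
    for _ in range(epochs):
        model.train()
        total = 0.0
        for x, y in train_loader:
            optimizer.zero_grad()
            loss = criterion(model(x), y)
            loss.backward()
            optimizer.step()
            total += loss.item() * len(x)
        history['train'].append(total / len(train_loader.dataset))

        model.eval()
        total = 0.0
        with torch.no_grad():
            for x, y in valid_loader:
                total += criterion(model(x), y).item() * len(x)
        history['valid'].append(total / len(valid_loader.dataset))
    # classic overfitting: train loss keeps falling while valid loss rises;
    # both rising with epochs points elsewhere (e.g., a too-large learning rate)
    return history
```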
@VaghehDashti That does not make sense. Increasing the number of epochs makes it worse??
@hosseinfani, yes, that is strange, especially for the training set. The only explanation I could come up with after some thought and a bit of searching is that the learning rate may be larger than it should be (I used the same learning rate as with 1 layer).
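One way to test this hypothesis is to lower the initial learning rate and let a scheduler shrink it further when the validation loss stalls; a minimal sketch using PyTorch's built-in ReduceLROnPlateau, with a dummy stand-in model and loss for illustration:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # stand-in model for illustration only
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)  # lowered from the one-layer setting
# shrink the lr by 10x whenever the validation loss stalls for 3 epochs
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, factor=0.1, patience=3)

for epoch in range(10):
    valid_loss = 1.0  # dummy constant; feed the real validation loss here
    scheduler.step(valid_loss)
print(optimizer.param_groups[0]['lr'])  # lr has been reduced after the stalled epochs
```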
Hi @hosseinfani, I ran bnn and bnn_emb with 3 layers [128, 64, 128] on imdb with learning rates of 0.01 and 0.001. Now the training and validation losses for both models decrease after each epoch. However, the performance of both models has decreased significantly. bnn with 0.01:
| metric | mean |
| --- | --- |
| P_2 | 0 |
| P_5 | 0 |
| P_10 | 0 |
| recall_2 | 0 |
| recall_5 | 0 |
| recall_10 | 0 |
| ndcg_cut_2 | 0 |
| ndcg_cut_5 | 0 |
| ndcg_cut_10 | 0 |
| map_cut_2 | 0 |
| map_cut_5 | 0 |
| map_cut_10 | 0 |
| aucroc | 0.488561 |
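For context, these are standard trec_eval measures; a minimal sketch of how they can be computed with pytrec_eval on toy qrels/run dicts (default cutoffs; the measure configuration is illustrative, not the repository's evaluation code), with aucroc typically coming from sklearn:

```python
import pytrec_eval
from sklearn.metrics import roc_auc_score

# toy qrels (true team members) and run (predicted expert scores)
qrel = {'team1': {'expert3': 1, 'expert7': 1}}
run = {'team1': {'expert1': 0.9, 'expert3': 0.4, 'expert7': 0.1}}

evaluator = pytrec_eval.RelevanceEvaluator(qrel, {'P', 'recall', 'ndcg_cut', 'map_cut'})
scores = evaluator.evaluate(run)  # per-team dict with keys like 'P_5', 'ndcg_cut_10', ...
print(scores['team1']['ndcg_cut_10'])

# aucroc over binary membership labels vs. predicted scores
y_true, y_score = [0, 1, 1], [0.9, 0.4, 0.1]
print(roc_auc_score(y_true, y_score))
```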