VaghehDashti opened this issue 1 year ago
@VaghehDashti It doesn't make sense at all!! Or it makes perfect sense. I have to talk to a statistician :)
Just in case, I believe you have the bnn models saved for #bs={1,3,5,10} on imdb. Can you load them and run them on the same test set to draw figures like the ones for the toy example? I know the x-axis will be many experts.
@hosseinfani Sure. However, I cannot do it for #bs=1 because the min, max, and avg are all the same. #bs = 3:
#bs = 5:
#bs = 10:
#bs = 20:
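For reference, here is a minimal sketch of how the saved checkpoints could be reloaded and evaluated on the same test set to collect the min/max/avg per-expert probabilities; the checkpoint paths, input size, and `evaluate` helper are hypothetical placeholders, not OpeNTF's actual API:

```python
import torch

# Hypothetical sketch (assumed paths and shapes, not the repo's actual code):
# reload the checkpoints saved for each #bs and re-run them on the same imdb
# test split, collecting min/max/avg of the predicted per-expert probabilities.
def evaluate(model, x_test):
    model.eval()
    with torch.no_grad():
        probs = torch.sigmoid(model(x_test))           # (samples, experts) probabilities
    return probs.min(dim=0).values, probs.max(dim=0).values, probs.mean(dim=0)

x_test = torch.randn(128, 64)                           # stand-in for the imdb test split
for bs in (1, 3, 5, 10):
    model = torch.load(f'output/imdb/bnn.bs{bs}.pt')    # assumed checkpoint naming
    mn, mx, avg = evaluate(model, x_test)               # stats used to redraw the figures
```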
Please explain the reason behind the sudden drop in loss after this epoch by pointing to the code line.
@hosseinfani, I believe the drop in loss is due to this line, where we decrease the learning rate when the validation loss does not improve significantly for 10 epochs: https://github.com/fani-lab/OpeNTF/blob/148c1c2defe1176563f162ad159b2ffe0af15ecc/src/mdl/bnn.py#L111
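For context, a minimal standalone sketch of that mechanism, assuming the referenced line is a PyTorch `ReduceLROnPlateau`-style scheduler (the toy model and data below are placeholders): the scheduler cuts the learning rate once the monitored loss stops improving for `patience` consecutive epochs, which shows up as a sudden drop in the loss curve right after the plateau.

```python
import torch

# Minimal sketch (assumed, not the repo's exact code): ReduceLROnPlateau lowers the
# learning rate by `factor` once the monitored loss has not improved for `patience`
# consecutive epochs.
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(optimizer, mode='min',
                                                       factor=0.1, patience=10)

x, y = torch.randn(64, 8), torch.randn(64, 1)
for epoch in range(30):
    optimizer.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    optimizer.step()
    scheduler.step(loss.item())  # pass the loss the scheduler monitors (val loss in the repo)
    print(epoch, round(loss.item(), 4), optimizer.param_groups[0]['lr'])
```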
Can you try running with patience=2 but the same 20 epochs on imdb, or any dataset that gives you results faster?
@hosseinfani,
Here is the train/val loss for patience=2 on imdb with #bs=5:
And with patience=10 on imdb with #bs=5:
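Just as an illustration of how the two runs above could be overlaid on one figure for easier comparison (the loss values and file name below are placeholders, not the actual imdb numbers):

```python
import matplotlib.pyplot as plt

# Placeholder curves only: they stand in for the logged train/val losses of the
# patience=2 and patience=10 runs on imdb with #bs=5.
epochs = list(range(1, 21))
train_p2,  val_p2  = [1.0 / e for e in epochs], [1.1 / e for e in epochs]
train_p10, val_p10 = [1.0 / e for e in epochs], [1.2 / e for e in epochs]

plt.plot(epochs, train_p2,  label='train, patience=2')
plt.plot(epochs, val_p2,    label='val, patience=2')
plt.plot(epochs, train_p10, '--', label='train, patience=10')
plt.plot(epochs, val_p10,   '--', label='val, patience=10')
plt.xlabel('epoch'); plt.ylabel('loss'); plt.legend()
plt.savefig('imdb.bs5.patience_compare.png')  # assumed output file name
```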
@hosseinfani, Here are the results of bnn_emb on dblp, imdb, and uspt: imdb:
dblp:
uspt:
We can see that for imdb, increasing #bs leads to lower performance, and the best performance is with #bs=1. For dblp and uspt, #bs=3 has the best performance, but the difference is not significant in my opinion. The model is currently running with #bs=10 on dblp and uspt; I will update this when the results are ready. My guess is that the model will have lower performance with #bs=10 on dblp and uspt as well. Given that the model behaves differently on imdb versus dblp/uspt, how should I proceed to the next hyperparameter (size of embeddings)? Should I use #bs=1 for all datasets, since the performance of the model with #bs=3 is not significantly different on dblp/uspt?
@VaghehDashti Yes, go with #bs=1
Hello @hosseinfani, I created this issue to post updates on the hyperparameter study of temporal team formation. I started the runs with 2 layers of [64,128] on all models [bnn, bnn_emb, tbnn, tbnn_emb, tbnn_dt2v_emb] and on the three datasets (15 runs in total). I will run the models with 3 layers of [64,128,256] afterwards.
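A hypothetical runner for that 15-run grid; the `run` helper and its body are illustrative only and not OpeNTF's actual entry point:

```python
from itertools import product

# Sketch of the grid described above: 5 models x 3 datasets with 2 layers of [64,128];
# the [64,128,256] round would reuse the same loop with a second entry in `layer_grid`.
models   = ['bnn', 'bnn_emb', 'tbnn', 'tbnn_emb', 'tbnn_dt2v_emb']
datasets = ['dblp', 'imdb', 'uspt']
layer_grid = [[64, 128]]

def run(model, dataset, layers):
    # Placeholder: here the actual training pipeline would be invoked.
    print(f'training {model} on {dataset} with layers={layers}')

for model, dataset, layers in product(models, datasets, layer_grid):
    run(model, dataset, layers)
```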