HITSZ-HLT / JointCL


batch_size difference #2

Closed chenyez closed 1 year ago

chenyez commented 2 years ago

Hi, thank you for sharing the code! It's really helpful!

I am trying to reproduce the performance on the VAST dataset. I am using Python 3.6, PyTorch 1.10.1, and CUDA 11.3, with the same hyper-parameters reported in the paper. However, when I use batch_size=16 I get the following error:

[Screenshot: error traceback (Screen Shot 2022-06-12 at 11 02 08 AM)]

I googled this issue, and it seems the error may mean that training requires too much memory. So I reduced the batch_size to 8, and the training then runs successfully. However, the best performance I can get on the test set is only 66:

[Screenshot: test-set results (Screen Shot 2022-06-12 at 11 05 59 AM)]
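
As an aside, one generic workaround for the memory error above is mixed-precision training, which reduces activation memory and might let batch_size=16 fit on the same GPU. The sketch below is plain PyTorch AMP with dummy placeholders; it is not code from this repo.

```python
import torch

# Dummy placeholders so the sketch is self-contained; in practice these come
# from the repo's trainer (model, optimizer, loss, and a real data batch).
model = torch.nn.Linear(768, 3).cuda()
optimizer = torch.optim.Adam(model.parameters(), lr=3e-5)
criterion = torch.nn.CrossEntropyLoss()
features = torch.randn(16, 768).cuda()   # stand-in for a batch_size=16 batch
labels = torch.randint(0, 3, (16,)).cuda()

scaler = torch.cuda.amp.GradScaler()
optimizer.zero_grad()
with torch.cuda.amp.autocast():          # run the forward pass in float16 where safe
    loss = criterion(model(features), labels)
scaler.scale(loss).backward()            # scale the loss to avoid fp16 gradient underflow
scaler.step(optimizer)
scaler.update()
```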

Here are the hyper-parameters I am using: --batch_size 8 --stance_loss_weight 1.25 --prototype_loss_weight 0.125 --lr 3e-5 --weight_decay 1e-5 (I added the weight decay to self.optimizer on line 42, roughly as sketched below, because it is mentioned in the paper but not set in the code) --seed 42.
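
For reference, the change on line 42 looked roughly like this. It is only a sketch: I do not recall whether the repo constructs Adam or some other optimizer there, so the optimizer class, the dummy model, and the variable names are placeholders rather than the repo's actual code.

```python
import torch

# Hypothetical stand-in for whatever model the trainer builds;
# only the optimizer arguments matter for this sketch.
model = torch.nn.Linear(768, 3)

optimizer = torch.optim.Adam(
    model.parameters(),
    lr=3e-5,            # --lr 3e-5
    weight_decay=1e-5,  # --weight_decay 1e-5 (the argument I added by hand)
)
```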

So my questions are:

  1. Have you had the memory error that I describe? If so, could you please give me some suggestions?
  2. In your experiments, does the batch_size (8 vs. 16) affect the performance a lot? Is it normal to get a macro-F1 of 66 when using 8?
  3. Are there any other suggestions that could help me reproduce your results?

Thanks again and looking forward to your reply!

BinLiang-NLP commented 2 years ago

Hi, thanks for your question. Your machine probably runs out of memory when the batch size is set to 16. In our method, a larger batch size gives better performance, because the contrastive loss relies on the batch size; setting batch_size to 8 may therefore not reach good performance. I suggest switching to a machine with more GPU memory and setting the batch size to 16 or 32 when running the code. Please let me know if there is any problem.
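
To make the batch-size dependence concrete: with in-batch contrastive learning, every example is contrasted against the other examples in the same batch, so a batch of 8 gives each anchor at most 7 positive/negative partners versus 15 with a batch of 16. The sketch below is a generic supervised contrastive loss written only for illustration; it is not necessarily the exact loss implemented in JointCL.

```python
import torch
import torch.nn.functional as F

def supervised_contrastive_loss(features, labels, temperature=0.07):
    """Generic in-batch supervised contrastive loss (illustration only).

    features: (batch_size, dim) embeddings
    labels:   (batch_size,) stance labels
    Every other example in the batch acts as a positive (same label) or a
    negative (different label), so the signal grows with batch_size.
    """
    features = F.normalize(features, dim=1)
    sim = features @ features.T / temperature                   # (B, B) similarities
    self_mask = torch.eye(len(labels), dtype=torch.bool, device=features.device)
    sim = sim.masked_fill(self_mask, float("-inf"))             # exclude self-pairs
    log_prob = sim - torch.logsumexp(sim, dim=1, keepdim=True)  # log-softmax over the batch
    pos_mask = (labels.unsqueeze(0) == labels.unsqueeze(1)) & ~self_mask
    pos_count = pos_mask.sum(dim=1).clamp(min=1)
    # average log-probability of the positives for each anchor
    loss = -(log_prob.masked_fill(~pos_mask, 0.0).sum(dim=1) / pos_count)
    return loss.mean()

# batch_size=8 gives each anchor at most 7 in-batch partners; batch_size=16 gives 15.
feats, labs = torch.randn(16, 128), torch.randint(0, 3, (16,))
print(supervised_contrastive_loss(feats, labs))
```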