EdGENetworks / attention-networks-for-classification

Hierarchical Attention Networks for Document Classification in PyTorch

Having 2 optimizers #14

Open JoaoLages opened 6 years ago

JoaoLages commented 6 years ago

Hi there! Thank you for making this implementation open-source! I have one question though: although there is only one backward step, you use 2 optimizers. Shouldn't you combine both models' parameters and use a single optimizer?

Sandeep42 commented 6 years ago

In hindsight, I would have used a single optimiser, something like this:

optim.Adam(list(model1.parameters()) + list(model2.parameters()))

At that time, I was new to PyTorch and didn't know this. You can go ahead and use one optimiser for much cleaner code.
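For reference, a minimal self-contained sketch of the single-optimizer setup. The module names and shapes here are placeholders, not the ones used in this repo:

import torch
import torch.nn as nn
import torch.optim as optim

# Placeholder stand-ins for the word-level and sentence-level models
word_attn = nn.Linear(100, 50)
sent_attn = nn.Linear(50, 10)

# One optimizer over the concatenated parameter lists of both modules
optimizer = optim.Adam(
    list(word_attn.parameters()) + list(sent_attn.parameters()),
    lr=1e-3,
)

# One forward pass, one backward pass, one optimizer step
out = sent_attn(word_attn(torch.randn(4, 100)))
loss = out.sum()
optimizer.zero_grad()
loss.backward()
optimizer.step()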

JoaoLages commented 6 years ago

Thanks for your reply. That is what I am doing. Nevertheless, it seems that with 2 optimizers the loss decreases much faster than with one optimizer. What might be the reason for this?

Moreover, I have changed the optimizer to Adam but haven't been able to get a BCE loss lower than ~0.255 for a multi-label classification problem. Any suggestions?
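For comparison, here is a minimal sketch of a standard multi-label setup, assuming the model outputs raw logits. One common culprit for a plateaued BCE loss is applying a sigmoid in the model and then again inside the loss, so it is worth checking that the sigmoid is applied exactly once:

import torch
import torch.nn as nn

num_labels = 5
logits = torch.randn(4, num_labels)                      # raw scores, no sigmoid
targets = torch.randint(0, 2, (4, num_labels)).float()   # multi-hot labels

# BCEWithLogitsLoss applies the sigmoid internally and is more
# numerically stable than a sigmoid followed by BCELoss
criterion = nn.BCEWithLogitsLoss()
loss = criterion(logits, targets)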

JoaoLages commented 6 years ago

Never mind, I had a typo; 2 optimizers vs. 1 optimizer produce more or less the same result, it seems. Still having the loss problem, though.
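For completeness, a self-contained sketch of the two-optimizer pattern under discussion: with disjoint parameter sets and identical hyperparameters, each parameter receives exactly one Adam update per step either way, which is why the two variants behave the same.

import torch
import torch.nn as nn
import torch.optim as optim

m1, m2 = nn.Linear(8, 8), nn.Linear(8, 1)
opt1 = optim.Adam(m1.parameters(), lr=1e-3)
opt2 = optim.Adam(m2.parameters(), lr=1e-3)

loss = m2(m1(torch.randn(4, 8))).sum()
opt1.zero_grad()
opt2.zero_grad()
loss.backward()   # one backward populates gradients for both modules
opt1.step()       # each optimizer updates only its own parameters
opt2.step()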