alexgaskell10 / nlp_summarization

Contains the code for my Imperial College London Master's thesis on text summarization

Freezing embeds and encoder #1

Closed fabrahman closed 4 years ago

fabrahman commented 4 years ago

Hi, thanks for putting this useful repo together! I was wondering whether, and why, you are freezing the encoder, embeddings and decoder in these lines?

Did you get better results by freezing e.g. the encoder or the embeddings? BTW, is the LED the same implementation as in the Longformer paper, with the sliding_chunk attention mode?

alexgaskell10 commented 4 years ago

Hi, thanks for your interest! Training happens in the generic_train call on line 366; everything from 374-396 is evaluation, i.e. producing the output summaries. So within those lines the model weights should be frozen: the model.eval() call on line 390 should take care of this, but it didn't appear to be working correctly, so I added the next three lines to manually freeze all weights. I'm therefore not sure whether they are strictly necessary.
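
For readers following along, here is a minimal PyTorch sketch (using a small stand-in module, not the repo's exact code) of the two mechanisms mentioned above: putting the model into eval mode and manually freezing the parameters.

```python
import torch
import torch.nn as nn

# Stand-in module for illustration only (not the repo's model).
model = nn.TransformerEncoderLayer(d_model=64, nhead=4)

# eval() switches layers such as dropout into inference mode, but it does
# not by itself stop gradients from being computed for the parameters.
model.eval()

# Explicitly freezing every parameter, analogous to the extra lines in the script:
for param in model.parameters():
    param.requires_grad = False

# For pure evaluation (e.g. generating summaries), torch.no_grad() additionally
# skips building the autograd graph, regardless of requires_grad.
with torch.no_grad():
    out = model(torch.randn(8, 2, 64))  # (seq_len, batch, d_model)
```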

So I believe the script is ready to go for fine-tuning. Instructions are in the README under "Step 1. Finetune the LED". Arguments for running the script are in nlp_summarization/scripts/models/sh_scripts/finetune_led.sh, and you should execute "sh sh_scripts/finetune_led.sh" from nlp_summarization/scripts/models.

Let me know if you have any issues here.

fabrahman commented 4 years ago

Yeah, the code works perfectly and I was already able to run it. Since I read somewhere that you froze the encoder and embeddings during training to fit the model in memory, I wonder whether you compared the results with and without freezing... Also, I wanted to double check whether the LED is the same as in the Longformer paper.

alexgaskell10 commented 4 years ago

> Since I read somewhere that you froze the encoder and embeddings during training to fit the model in memory, I wonder whether you compared the results with and without freezing...

Yes I did compare and the model performs much better when nothing is frozen.

> Also, I wanted to double check whether the LED is the same as in the Longformer paper.

Yes, basically. The LED is an encoder-decoder version of the Longformer, which was originally an encoder-only model. My version of the LED is taken from the Longformer authors' GitHub here.
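
For anyone wondering what that windowed attention mode means in practice, here is a toy sketch (purely illustrative, not the actual sliding_chunks kernel) of the local attention pattern used in the Longformer/LED encoder, where each token attends only to tokens within a fixed window of itself:

```python
import torch

seq_len, w = 12, 2                                        # toy sequence length and half-window size
idx = torch.arange(seq_len)
local_mask = (idx[None, :] - idx[:, None]).abs() <= w     # True where |i - j| <= w

scores = torch.randn(seq_len, seq_len)                    # raw attention scores
scores = scores.masked_fill(~local_mask, float("-inf"))   # block out-of-window positions
attn = scores.softmax(dim=-1)                             # each row sums to 1 over its window
```

The real implementation computes only the in-window scores rather than masking a full n x n matrix, which is what makes memory scale linearly with sequence length.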

fabrahman commented 4 years ago

@alexgaskell10 It is not clear to me whether longbart is exactly the same as led; both are listed as choices for model_variant. Could you please confirm that using either of them gives the same model, i.e. the Longformer, or whether there are differences? According to this, both call the same get_led method.

alexgaskell10 commented 4 years ago

Yes, they are the same. When I was developing this there were two versions of the LED, one named LED and the other named LongBart. They have since been merged into the LED, so they are identical and you can ignore the distinction. (I kept the argument in there to stay backward compatible with old saved models I have.)
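
For illustration, a hypothetical sketch (function names assumed; not the repo's exact code) of the kind of dispatch described above, where both variant strings resolve to the same builder:

```python
# Hypothetical names for illustration only.
def get_led():
    # stand-in for the actual LED construction in the repo
    return "LED model"

def build_model(model_variant: str):
    if model_variant in ("led", "longbart"):
        # "longbart" is kept only for backward compatibility with old checkpoints;
        # both names map onto the same Longformer Encoder-Decoder implementation.
        return get_led()
    raise ValueError(f"Unknown model_variant: {model_variant}")

assert build_model("led") == build_model("longbart")
```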