Closed rohitsroch closed 4 years ago
Hi, thanks for the question! Could you elaborate on what you mean by:

> The loss is not decreasing after this point and not converging or stuck in a local minimum.
Does the loss ever decrease in the first 20 epochs? What ROUGE scores have you achieved by fine-tuning PEGASUS and T5?
> Any plans to release PEGASUS-base?
Sorry, there is currently no plan to release the base models due to checkpoint incompatibility.
> Does the loss ever decrease in the first 20 epochs? What ROUGE scores have you achieved by fine-tuning PEGASUS and T5?
@JingqingZ Apologies for the confusion. Yes, the loss decreases smoothly for the first 15-20 epochs, but it doesn't converge. Below is the reference plot from training with a learning rate of 2e-4.
If you check after 5k steps (15 epochs), the loss changes only slightly and stays almost constant (~1.5). I also tried training for a further 5 epochs with the learning rate increased to 2e-3, but it diverged before settling back to the same value (~1.5). Any thoughts on what I should do?
I also tried training the model with a triangular cyclical learning rate policy, but the same behavior occurs.
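For reference, the triangular policy I mean can be sketched like this (a minimal sketch after Smith's CLR paper; `base_lr`, `max_lr`, and `step_size` here are illustrative placeholders, not the values from my runs):

```python
import math

def triangular_clr(step, base_lr=2e-5, max_lr=2e-4, step_size=500):
    """Triangular cyclical learning rate (Smith, 2017).

    The LR ramps linearly from base_lr to max_lr over step_size steps,
    then back down, repeating every 2 * step_size steps. All values
    here are illustrative, not the ones used in the experiment above.
    """
    cycle = math.floor(1 + step / (2 * step_size))
    x = abs(step / step_size - 2 * cycle + 1)
    return base_lr + (max_lr - base_lr) * max(0.0, 1 - x)
```

So at step 0 the LR is `base_lr`, at step 500 it peaks at `max_lr`, and by step 1000 it is back to `base_lr`.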
That said, the generated summaries on the evaluation set look good. Below are the ROUGE scores for PEGASUS (large) and T5 (small), using the following decoding params for both:
beam_size = 1
top_p = 0.95
top_k = 50
temperature = 0.5
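For context, here is a minimal, library-free sketch of how those three decoding params interact (a hypothetical helper for illustration, not code from this repo):

```python
import math

def filter_logits(logits, top_k=50, top_p=0.95, temperature=0.5):
    """Return the renormalized sampling distribution after applying
    temperature scaling, top-k, and top-p (nucleus) filtering.
    Illustrative only -- real decoders vectorize this over a batch.
    """
    # Temperature: T < 1 sharpens the distribution before filtering.
    scaled = [l / temperature for l in logits]
    m = max(scaled)
    probs = [math.exp(l - m) for l in scaled]
    z = sum(probs)
    probs = [p / z for p in probs]

    # Sort token indices by probability, descending.
    order = sorted(range(len(probs)), key=lambda i: -probs[i])

    # Top-k: keep at most k candidates.
    keep = order[:top_k]

    # Top-p: keep the smallest prefix whose cumulative mass reaches top_p.
    cum, nucleus = 0.0, []
    for i in keep:
        nucleus.append(i)
        cum += probs[i]
        if cum >= top_p:
            break

    # Renormalize over the surviving candidates.
    mass = sum(probs[i] for i in nucleus)
    return {i: probs[i] / mass for i in nucleus}
```

With `beam_size = 1` these filters fully determine the candidate pool each step, so they matter a lot more than they would under beam search.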
NOTE: the scores below are averaged across the 78 datapoints in the eval set.
PEGASUS-large

| | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| precision | 0.493 | 0.237 | 0.368 |
| recall | 0.532 | 0.263 | 0.403 |
| f-measure | 0.486 | 0.237 | 0.365 |
T5-small

| | ROUGE-1 | ROUGE-2 | ROUGE-L |
|---|---|---|---|
| precision | 0.507 | 0.211 | 0.363 |
| recall | 0.443 | 0.189 | 0.322 |
| f-measure | 0.455 | 0.192 | 0.329 |
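(For anyone reproducing these numbers, here is a toy ROUGE-1 calculation that makes the precision/recall/f-measure rows concrete; for real evaluation use an established implementation such as the `rouge-score` package.)

```python
from collections import Counter

def rouge1(candidate, reference):
    """Toy unigram-overlap ROUGE-1 (precision, recall, F1).

    Illustrative only: no stemming, no tokenization beyond
    whitespace splitting, unlike standard ROUGE implementations.
    """
    c, r = Counter(candidate.split()), Counter(reference.split())
    overlap = sum((c & r).values())  # clipped unigram matches
    precision = overlap / max(sum(c.values()), 1)
    recall = overlap / max(sum(r.values()), 1)
    f1 = 0.0 if overlap == 0 else 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```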
Hi, thanks for the information!
I think the overall performance of PEGASUS (given the learning curve and ROUGE scores) looks reasonable, so I don't think anything is wrong there. That said, it can likely be improved by tuning some hyper-parameters, which requires some empirical experiments.
> the loss decreases smoothly for the first 15-20 epochs but it doesn't converge. Below is the reference plot during training with learning rate 2e-4.
It seems the loss is still decreasing, so the fine-tuning may simply need more steps. In Appendix C of our paper, we provide a full table of the hyper-parameters used to fine-tune each dataset; most use more fine-tuning steps (and possibly a larger batch size) than yours. The learning rate can also be made smaller if the fluctuation of the loss persists.
Considering the relatively small eval set (78 examples), some slight fluctuation of the loss on the eval set is to be expected.
> I didn't use the beam search algorithm for decoding
Beam search can actually improve ROUGE quite significantly, by a couple of points.
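For intuition, beam search keeps the `beam_size` best partial hypotheses at each step instead of only the single greedy continuation. A minimal sketch (illustrative only, not the PEGASUS decoder; `step_fn` is a hypothetical callback returning `(token, log_prob)` continuations, with an empty list marking a finished hypothesis):

```python
def beam_search(step_fn, start, beam_size=4, max_len=5):
    """Minimal beam search over log-probabilities.

    step_fn(prefix) -> list of (token, log_prob) continuations;
    an empty list means the hypothesis is finished.
    """
    beams = [([start], 0.0)]            # (token sequence, total log-prob)
    for _ in range(max_len):
        candidates = []
        for seq, score in beams:
            expansions = step_fn(seq)
            if not expansions:          # finished hypothesis: carry forward
                candidates.append((seq, score))
                continue
            for tok, logp in expansions:
                candidates.append((seq + [tok], score + logp))
        # Keep the beam_size highest-scoring hypotheses.
        beams = sorted(candidates, key=lambda b: -b[1])[:beam_size]
    return beams[0]
```

The win over greedy decoding (`beam_size = 1`) is that a locally sub-optimal first token can still lead to the globally best sequence, which is exactly the case where greedy decoding loses ROUGE points.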
Hope this answers your questions!
@JingqingZ Thanks a lot for the quick help, I will check Appendix C in the paper :). Closing this issue!
Hi, first of all, great paper! Lately, I have been doing abstractive summarization separately for an agent and a customer, given a conversation transcript between the two. We have around 700-1000 labeled datapoints (conversation transcripts) in total.
Currently, I am fine-tuning the released C4 + HugeNews checkpoint to perform abstractive summarization for speaker 1 (the agent). The following is the input/output format for the encoder/decoder.
I started fine-tuning for 20 epochs with a learning rate of 2e-4. The loss is not decreasing after this point and not converging, or is stuck in a local minimum.
Any thoughts on how I should approach this problem? Any plans to release PEGASUS-base?
Also, as of now, we have fine-tuned the T5 model on the same summarization task with the "summarization" prefix; it converges much faster, in just 5 epochs.
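For reference, the prefixing is just string concatenation at input-preparation time. A sketch (the canonical T5 summarization prefix is "summarize: "; the exact string used in our runs may differ, and `t5_inputs` is a hypothetical helper name):

```python
def t5_inputs(transcripts, prefix="summarize: "):
    """Prepend the T5 task prefix to each raw input string.

    T5 routes tasks via a text prefix on the input, so fine-tuning
    data must carry the same prefix used at inference time.
    """
    return [prefix + t for t in transcripts]
```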