google-research / pegasus


Finetuning Loss not decreasing on Custom Summarization Task [Help wanted] #47

Closed: rohitsroch closed this issue 4 years ago

rohitsroch commented 4 years ago

Hi, first of all, great paper! Lately, I have been working on abstractive summarization separately for an agent and a customer, given a conversation transcript between the two. We have around 700-1000 labeled data points (conversation transcripts) in total.

Currently, I am fine-tuning the released C4 + HugeNews checkpoint to perform abstractive summarization for speaker-1 (the agent). Following is the input/output format fed to the encoder/decoder:

# only the sentences corresponding to speaker-1 (agent); each sentence separated by a full stop
 Input: This is agent sentence-1. This is agent sentence-2. This is agent sentence-3.

# corresponding ground-truth summary
 Output: This is the agent summary
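For reference, here is a minimal sketch of how such input/output pairs could be serialized for fine-tuning, assuming a TFRecord-based pipeline with "inputs"/"targets" string features; the feature names and file path are assumptions and should be matched to whatever your pegasus data registry entry expects.

```python
import tensorflow as tf

def make_example(agent_sentences, summary):
    # Join the speaker-1 (agent) sentences into a single source document
    # and pair it with the ground-truth summary.
    inputs = " ".join(agent_sentences)
    feature = {
        # Assumed feature keys; align them with your dataset registration.
        "inputs": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[inputs.encode("utf-8")])),
        "targets": tf.train.Feature(
            bytes_list=tf.train.BytesList(value=[summary.encode("utf-8")])),
    }
    return tf.train.Example(features=tf.train.Features(feature=feature))

# Hypothetical output path for the training split.
with tf.io.TFRecordWriter("agent_summaries.train.tfrecord") as writer:
    example = make_example(
        ["This is agent sentence-1.", "This is agent sentence-2.",
         "This is agent sentence-3."],
        "This is the agent summary")
    writer.write(example.SerializeToString())
```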
JingqingZ commented 4 years ago

Hi, thanks for the question! Could you elaborate on what you mean by:

The loss is not decreasing after this point and not converging or stuck to local minima.

Does the loss ever decrease in the first 20 epochs? What ROUGE scores have you achieved by fine-tuning PEGASUS and T5?

Any plans to release PEGASUS_base?

Sorry, there is currently no plan to release the base models due to checkpoint incompatibility.

rohitsroch commented 4 years ago

Does the loss ever decrease in the first 20 epochs? What ROUGE scores have you achieved by fine-tuning PEGASUS and T5?

@JingqingZ Apologies for the confusion. Yes, the loss decreases smoothly for the first 15-20 epochs, but it doesn't converge. Below is the training loss plot for reference (learning rate 2e-4).

[Training loss curve plot]

I didn't use the beam search algorithm for decoding; these were my decoding parameters:

  beam_size = 1
  top_p = 0.95
  top_k = 50
  temperature = 0.5

NOTE: The scores below are averaged across 78 data points in the eval set.

PEGASUS_large

           ROUGE-1  ROUGE-2  ROUGE-L
precision  0.493    0.237    0.368
recall     0.532    0.263    0.403
fmeasure   0.486    0.237    0.365

T5_small

           ROUGE-1  ROUGE-2  ROUGE-L
precision  0.507    0.211    0.363
recall     0.443    0.189    0.322
fmeasure   0.455    0.192    0.329
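(For context, a small sketch of how per-example precision/recall/f-measure like the above can be computed and then averaged over the eval set with the rouge_score package; this is an assumption about the evaluation setup, not necessarily the exact script used here.)

```python
import numpy as np
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)

def average_rouge(references, predictions):
    # Score each (reference, prediction) pair and average precision,
    # recall and f-measure per ROUGE variant across the eval set.
    totals = {m: np.zeros(3) for m in ["rouge1", "rouge2", "rougeL"]}
    for ref, pred in zip(references, predictions):
        for metric, score in scorer.score(ref, pred).items():
            totals[metric] += [score.precision, score.recall, score.fmeasure]
    n = len(predictions)
    return {m: dict(zip(["precision", "recall", "fmeasure"], totals[m] / n))
            for m in totals}
```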
JingqingZ commented 4 years ago

Hi, thanks for the information!

I think the overall performance of PEGASUS (given the learning curve and ROUGE scores) looks reasonable, so I don't think anything is wrong there. But it can likely be improved by tuning some hyper-parameters, which requires some empirical experiments.

the loss decreases smoothly for the first 15-20 epochs, but it doesn't converge. Below is the training loss plot for reference (learning rate 2e-4).

It seems the loss is still decreasing, so the fine-tuning may need more steps. In Appendix C of our paper, we provide a full table of the hyper-parameters we used to fine-tune each dataset; most of them use more fine-tuning steps (and possibly a larger batch size) than yours. The learning rate can also be made smaller if the loss keeps fluctuating.
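As a rough illustration of the kind of adjustment this suggests, here is a hedged sketch of fine-tuning overrides; the values are hypothetical, and the parameter names should be checked against the params file for your dataset config and the table in Appendix C.

```python
# Hypothetical fine-tuning overrides, in the spirit of Appendix C:
# train longer than the current 15-20 epochs, use a larger batch if
# memory allows, and lower the learning rate below the current 2e-4.
finetune_overrides = {
    "train_steps": 10000,     # more fine-tuning steps
    "batch_size": 8,          # larger batch size, if it fits in memory
    "learning_rate": 1e-4,    # smaller learning rate if the loss fluctuates
}

# Could be serialized into a comma-separated override string for the
# training script, e.g. --param_overrides=<this string>.
param_overrides = ",".join(f"{k}={v}" for k, v in finetune_overrides.items())
```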

Considering the relatively small eval set of 78 examples, some slight fluctuation of the loss on the eval set is to be expected.

I didn't use the beam search algorithm for decoding

Beam search can actually improve ROUGE quite significantly, by a couple of points.
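As a minimal sketch of that change relative to the decoding settings above (parameter names are illustrative and should be matched to the decoding config actually in use):

```python
# Illustrative decoding overrides: replace beam_size = 1 (single-hypothesis
# sampled decoding) with a small beam and a length penalty, and drop sampling.
decode_overrides = {
    "beam_size": 5,      # beam search instead of single-hypothesis decoding
    "beam_alpha": 0.8,   # assumed length-penalty parameter; tune on the dev set
}
```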

Hope this may answer your questions!

rohitsroch commented 4 years ago

@JingqingZ, thanks a lot for the quick help. I will check Appendix C in the paper :) Closing this issue!