google-research / pegasus


Finetuning errors #62

Open sxu239 opened 4 years ago

sxu239 commented 4 years ago

I want to fine-tune on an existing dataset, so I ran the following command, but I ran into some errors. Could you please take a look? Thanks!

python3 pegasus/bin/train.py --params=aeslc_transformer \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model \
--train_init_checkpoint=ckpt/pegasus_ckpt/model.ckpt-1500000 \
--model_dir=ckpt/pegasus_ckpt/aeslc

(three screenshots of the error output were attached)

sxu239 commented 4 years ago

Also, since there is already a trained model in the aeslc folder, what does fine-tuning do to improve the model?

JingqingZ commented 4 years ago

Hi, I notice there is an HTTP request error at the beginning, which may be causing the subsequent errors.

For each downstream dataset (like aeslc), we fine-tune the model so that it is adapted to that dataset for summarization. You may refer to our paper for details about pre-training and fine-tuning. The trained checkpoint in the aeslc folder is the model that has already been fine-tuned on aeslc.

sxu239 commented 4 years ago

Thank you! If that's the case, I can just directly use the model in the aeslc folder without going through the fine-tuning process, right?

sxu239 commented 4 years ago

Also, is it possible for the pre-trained model to summarize multiple articles, and if so, what would the input look like? Thanks again for the help!

JingqingZ commented 4 years ago

Yes, you can use the checkpoint in the aeslc folder directly for inference; there is no need to fine-tune.

Our model is for single-document summarization. If the input has multiple documents (like multi-news), we simply concatenate the documents into one document and apply the max_input_len (i.e. we only use the first max_input_len tokens of the concatenated document).
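Just to illustrate the idea, here is a rough sketch, not the repo's actual input pipeline; it uses the SentencePiece vocab shipped with the checkpoints, and max_input_len=1024 is only an example value:

import sentencepiece as spm

# Load the same SentencePiece vocab that ships with the PEGASUS checkpoints.
sp = spm.SentencePieceProcessor()
sp.Load("ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model")

def concat_and_truncate(documents, max_input_len=1024):
    # Concatenate the documents into one pseudo-document, then keep
    # only the first max_input_len tokens and discard the rest.
    joined = " ".join(documents)
    ids = sp.EncodeAsIds(joined)[:max_input_len]
    return sp.DecodeIds(ids)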

sxu239 commented 4 years ago

That makes a lot of sense. Since there is a limit on the input, when you trained the model on different datasets, such as pubmed, did you use the whole article for summarization, or just, say, the abstract/intro/results part of it? Thanks!

JingqingZ commented 4 years ago

Since pubmed articles are also very long, as mentioned in appendix C of our paper, the input uses the first 1024 tokens of the main document, and the target is the abstract (256 tokens cover most abstracts).
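If you fine-tune on your own data and want the same lengths, you should be able to set them via param_overrides. A sketch, assuming a pubmed_transformer params set named like aeslc_transformer and the max_input_len/max_output_len hyperparameter names; please check the repo's params files:

python3 pegasus/bin/train.py --params=pubmed_transformer \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,max_input_len=1024,max_output_len=256 \
--train_init_checkpoint=ckpt/pegasus_ckpt/model.ckpt-1500000 \
--model_dir=ckpt/pegasus_ckpt/pubmed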

ryzhik22 commented 4 years ago

Hello! I am trying to apply the fine-tuned arxiv model (your checkpoint); this is my command:

!python3 pegasus/bin/evaluate.py --params=new_params \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=5,beam_alpha=0.6 \
--model_dir=ckpt/pegasus_ckpt/arxiv

But I get an error: "ValueError: Could not find trained model in model_dir: ckpt/pegasus_ckpt/arxiv". Could you please help me figure out what is wrong? I have "model.ckpt-340000.data-00000-of-00001" in this folder.

ryzhik22 commented 4 years ago

I was able to evaluate the arxiv model on my example with:

python3 pegasus/bin/evaluate.py --params=new_params \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=5,beam_alpha=0.6 \
--model_dir=ckpt/pegasus_ckpt/arxiv/model.ckpt-340000

And I got the following summary:

in this paper , we investigate the effect of the position dependent temperature profile on the efficiency of a heat pump . 
 we find that the efficiency of a heat pump can be significantly enhanced by the presence of a position dependent temperature profile . 
 we also find that the efficiency of a heat pump can be significantly enhanced by the presence of a position dependent temperature profile . 
 we also find that the efficiency of a heat pump can be significantly enhanced by the presence of a position dependent temperature profile . 
 we also find that the efficiency of a heat pump can be significantly enhanced by the presence of a position dependent temperature profile . 
 we also find that the efficiency of a heat pump can be significantly enhanced by the presence of a position dependent temperature profile . 
 we also find that the efficiency of a heat pump can be significantly enhanced by the presence of a position dependent temperature profile . 
 we also find that the efficiency of a heat pump can be significantly enhanced by the presence of a position dependent temperature profile . 
 we also find that the efficiency of a heat pump can be significantly enhanced by the presence of a position dependent temperature profile .

What do you think is the reason for this model behavior? The input was the first 1024 tokens of the arxiv article.

JingqingZ commented 4 years ago

Regarding the checkpoint, this may help: https://github.com/google-research/pegasus/issues/3#issuecomment-634034845

For arxiv, we recommend beam_size=8 and beam_alpha=0.8. A complete table of these hyperparameters for different datasets can be found in appendix C of our paper.
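For example, your command above with the recommended arxiv settings would become:

python3 pegasus/bin/evaluate.py --params=new_params \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=8,beam_alpha=0.8 \
--model_dir=ckpt/pegasus_ckpt/arxiv/model.ckpt-340000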

Hope this may help!

ryzhik22 commented 4 years ago

Thank you! And could you tell me about the data preprocessing for scientific articles, please? Do you use the first 1024 tokens of the Introduction section? Do you delete formulae?

JingqingZ commented 4 years ago

The loading and preprocessing of scientific papers are implemented by TensorFlow Datasets: https://www.tensorflow.org/datasets/catalog/scientific_papers. We didn't do any extra processing beyond using the first 1024 tokens (of the entire document, though those are mostly in the Introduction section).
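For reference, a minimal sketch of loading it through TensorFlow Datasets (the arxiv config; pubmed works the same way):

import tensorflow_datasets as tfds

# scientific_papers has 'arxiv' and 'pubmed' configs; each example
# contains the full article, its abstract, and the section names.
ds = tfds.load("scientific_papers/arxiv", split="train")
for example in ds.take(1):
    article = example["article"].numpy().decode("utf-8")
    abstract = example["abstract"].numpy().decode("utf-8")
    print(article[:200], "...")
    print(abstract[:200], "...")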