sxu239 opened this issue 4 years ago
Also, since there is already a trained model in the aeslc folder, what does fine-tuning do to help improve the model?
Hi, I noticed there is an HTTP request error at the beginning, which may cause the errors that follow.
For each downstream dataset (like aeslc), we fine-tune the model so that the model is adapted to the downstream dataset for summarization. You may refer to our paper for details about pre-training and fine-tuning. The trained model checkpoint in the aeslc folder is the model that has already been fine-tuned on aeslc.
Thank you! If that's the case, I can just use the model in the aeslc folder directly, without going through the fine-tuning process, right?
Also, is it possible for the pre-trained model to summarize multiple articles, and if yes, what would the input look like? Thanks again for the help!
Yes, you can use the checkpoint in the aeslc folder directly for inference; there is no need to fine-tune.
Our model is for single-document summarization. If the input has multiple documents (as in multi-news), we simply concatenate the documents into one document and apply the max_input_len (i.e., we only use the first few hundred tokens of the concatenated document).
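To illustrate the idea (a minimal sketch; the function name, separator, and whitespace tokenization are illustrative stand-ins, not our actual preprocessing code):

# Illustrative sketch: merge multiple source documents into a single input
# and keep only the first max_input_len tokens, as described above.
def build_single_input(documents, max_input_len=1024):
    concatenated = " ".join(documents)       # concatenate into one document
    tokens = concatenated.split()            # stand-in for subword tokenization
    return " ".join(tokens[:max_input_len])  # truncate to the input limit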
That makes a lot of sense. Since there is a limit on the input, when you trained the model on different datasets, such as pubmed, did you use the whole article for summarization, or just, say, the abstract/intro/results part of it? Thanks!
Since pubmed articles are also very long, as mentioned in Appendix C of our paper, the input uses the first 1024 tokens of the main document, and the target is the abstract (256 tokens cover most abstracts).
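Roughly, each (input, target) pair is built like this (a hypothetical helper; whitespace tokens stand in for the real subword tokens):

# Illustrative sketch of the truncation described above.
def make_example(article_tokens, abstract_tokens):
    inputs = article_tokens[:1024]    # first 1024 tokens of the main document
    targets = abstract_tokens[:256]   # 256 tokens cover most abstracts
    return inputs, targets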
Hello! I am trying to apply the fine-tuned arxiv model (your checkpoint); this is my code:
!python3 pegasus/bin/evaluate.py --params=new_params \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=5,beam_alpha=0.6 \
--model_dir=ckpt/pegasus_ckpt/arxiv
But I got an error: "ValueError: Could not find trained model in model_dir: ckpt/pegasus_ckpt/arxiv". Could you please help me figure out what is wrong? I have model.ckpt-340000.data-00000-of-00001 in this folder.
I was able to evaluate the arxiv model on my example with
python3 pegasus/bin/evaluate.py --params=new_params \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=5,beam_alpha=0.6 \
--model_dir=ckpt/pegasus_ckpt/arxiv/model.ckpt-340000
And I got the following summary:
in this paper , we investigate the effect of the position dependent temperature profile on the efficiency of a heat pump .
we find that the efficiency of a heat pump can be significantly enhanced by the presence of a position dependent temperature profile .
we also find that the efficiency of a heat pump can be significantly enhanced by the presence of a position dependent temperature profile .
we also find that the efficiency of a heat pump can be significantly enhanced by the presence of a position dependent temperature profile .
we also find that the efficiency of a heat pump can be significantly enhanced by the presence of a position dependent temperature profile .
we also find that the efficiency of a heat pump can be significantly enhanced by the presence of a position dependent temperature profile .
we also find that the efficiency of a heat pump can be significantly enhanced by the presence of a position dependent temperature profile .
we also find that the efficiency of a heat pump can be significantly enhanced by the presence of a position dependent temperature profile .
we also find that the efficiency of a heat pump can be significantly enhanced by the presence of a position dependent temperature profile .
What do you think is the reason for this model behavior? The input was the first 1024 tokens of the arxiv article.
Regarding the checkpoint, this may help: https://github.com/google-research/pegasus/issues/3#issuecomment-634034845
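If you prefer to keep --model_dir pointing at the folder itself: TensorFlow looks for a plain-text state file named checkpoint inside model_dir (standard TensorFlow behavior, not pegasus-specific). A minimal one, assuming the model.ckpt-340000 prefix from your listing, would contain just:

model_checkpoint_path: "model.ckpt-340000"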
For arxiv, we recommend beam_size=8,beam_alpha=0.8. A complete table of these hyperparameters for different datasets can be found in Appendix C of our paper.
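Concretely, with the recommended values plugged into the command you used above:

python3 pegasus/bin/evaluate.py --params=new_params \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model,batch_size=1,beam_size=8,beam_alpha=0.8 \
--model_dir=ckpt/pegasus_ckpt/arxiv/model.ckpt-340000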
Hope this helps!
Thank you! And could you tell me about the data preprocessing for scientific articles, please? Do you use the first 1024 tokens of the Introduction section? Do you delete formulae?
The loading and preprocessing of scientific papers are implemented by https://www.tensorflow.org/datasets/catalog/scientific_papers. We didn't do any extra processing beyond using the first 1024 tokens (of the entire document, though these should fall mostly in the Introduction section).
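For example, the dataset can be inspected directly with tensorflow_datasets (a minimal sketch; assumes TF2 eager execution and that the package is installed):

import tensorflow_datasets as tfds

# 'scientific_papers/arxiv' exposes 'article', 'abstract', and 'section_names'
# features, per the TFDS catalog page linked above.
ds = tfds.load('scientific_papers/arxiv', split='test')
for ex in ds.take(1):
    article = ex['article'].numpy().decode('utf-8')
    abstract = ex['abstract'].numpy().decode('utf-8')
    print(article[:200], '...')
    print(abstract[:200], '...')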
I want to fine-tune on an existing dataset, so I ran the following code, but I ran into some errors. Could you please take a look at it? Thanks!
python3 pegasus/bin/train.py --params=aeslc_transformer \
--param_overrides=vocab_filename=ckpt/pegasus_ckpt/c4.unigram.newline.10pct.96000.model \
--train_init_checkpoint=ckpt/pegasus_ckpt/model.ckpt-1500000 \
--model_dir=ckpt/pegasus_ckpt/aeslc