@foodiehack Can you please copy and paste the error message and tell me which model specifically (the link) you are trying to load? Please note that the arXiv-PubMed links are currently empty since that model has not been trained yet.
I am trying to run inference using the GUI for extractive summarisation as described in the docs. The model file used is https://drive.google.com/uc?id=1-W9VzvVgKyu4d3IfNMw0k2zvXzkqpRw7, but I am getting this error:
Traceback (most recent call last):
File "/gdrive/Kruthika1/longsum/transformersum/src/test.py", line 2, in
Next: I am also trying to run abstractive summarisation as described in the docs. The model used is https://drive.google.com/drive/folders/1DBxRZkOHS7OdU80L8OvnzCa3K6-ho_Dj, and as @foodiehack said, there is no checkpoint; I can only see the .bin file. Please suggest which model to use for abstractive summarisation.
@kruthikakr For the extractive error, which version of transformers are you using? Try v3.0.2 with pip install -U transformers==3.0.2, as discussed at https://github.com/HHousen/TransformerSum/issues/20#issuecomment-703825468.
The abstractive summarization folder that you linked contains three different models. The number at the end of each model's name is the maximum input sequence length it can accept.
@HHousen Yes, the link has 3 folders, and none of the 3 folders contains a model checkpoint; I can only see the .bin files. Which model can be used to run Longformer abstractive summarisation?
@HHousen We have been evaluating different abstractive summarisation models. Since the latest SOTA model is PEGASUS, have you looked into it? In our experiments on general text, T5 and BART give better results than PEGASUS. Please give me your comments on this.
Adding to this, I see distilled models for extractive summarisation. How can they be made for abstractive summarisation, or is there a reference if they already exist?
@kruthikakr These are huggingface/transformers models, so they need to be used with the --model_name_or_path option for further training. Or you can load them directly in transformers using LongformerEncoderDecoderForConditionalGeneration.from_pretrained().
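For reference, here is a minimal loading sketch. It assumes one of the converted checkpoint folders has been downloaded locally (including the BART tokenizer files) and that the longformer package from allenai/longformer, which provides LongformerEncoderDecoderForConditionalGeneration, is installed; the folder path and import path below are illustrative and may differ in your setup:

```python
# Minimal sketch: load a converted LongformerEncoderDecoder model directly.
# Assumes the allenai/longformer package is installed and that "model_path"
# points to one of the downloaded folders (containing the .bin weights).
from transformers import BartTokenizer
from longformer.longformer_encoder_decoder import (
    LongformerEncoderDecoderForConditionalGeneration,
)

model_path = "./longformer-encdec-base-4096"  # placeholder path

tokenizer = BartTokenizer.from_pretrained(model_path)
model = LongformerEncoderDecoderForConditionalGeneration.from_pretrained(model_path)
model.eval()

# Note: as discussed below, these converted models have not yet been fine-tuned
# on a summarization dataset, so they still need training before they will
# produce useful summaries.
```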
You need to use a seq2seq architecture for abstractive summarization. I would recommend distilbart, specifically sshleifer/distilbart-cnn-12-6. The performance of each model depends on the dataset you evaluate on and the dataset the model was trained on. For instance, a model trained to summarize news will not summarize a short story well. Other than that, I'm not sure why BART and T5 outperform PEGASUS.
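As an illustration, here is a short sketch of running that DistilBART model with the standard transformers summarization pipeline; the input text and generation lengths are placeholders:

```python
from transformers import pipeline

# Minimal sketch: summarize a short document with DistilBART via the
# transformers summarization pipeline. BART-based models only accept inputs
# up to 1024 tokens, so this is not suitable for long documents.
summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

article = "Replace this with the text you want to summarize ..."
result = summarizer(article, max_length=142, min_length=56)
print(result[0]["summary_text"])
```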
Thank you very much for the reply. I am referring to https://transformersum.readthedocs.io/en/latest/abstractive/models-results.html#bart-converted-to-longformerencdec, where the models in your library have no checkpoint files. I don't want to do any training; I just want to do the same thing as with the extractive summarisation models using the GUI (python predictions_website.py). How do I load the models for abstractive summarisation?
Sorry, I am new to transformers. Can you please provide details for LongformerEncoderDecoderForConditionalGeneration.from_pretrained()? I will verify. Thank you.
There are currently no pre-trained models that can be used to abstractively summarize long documents. Models listed in the BART Converted to LongformerEncoderDecoder section need to be fine-tuned on a long document summarization dataset, such as arXiv-PubMed, to create a model that can summarize long sequences. The arXiv-PubMed models will be trained as soon as I obtain the resources necessary to train them (2 Tesla V100 GPUs).
I've updated the documentation to reflect this.
Further discussion of this issue will be moved to #38.
I am trying to load a pre-trained abstractive model as described in the docs, but it's giving an error. The model uploaded is not a checkpoint.