HHousen / TransformerSum

Models to perform neural summarization (extractive and abstractive) using machine learning transformers and a tool to convert abstractive summarization datasets to the extractive task.
https://transformersum.rtfd.io
GNU General Public License v3.0

Cannot load Abstractive Longformer Pre-Trained Model #36

Closed tanmayag78 closed 4 years ago

tanmayag78 commented 4 years ago

I am trying to load a pre-trained abstractive model as written in the docs, but it's giving an error. The model uploaded is not a checkpoint.

HHousen commented 4 years ago

@foodiehack Can you please copy-paste the error message and tell me which model specifically (the link) you are trying to load? Please note that the arXiv-PubMed links are currently empty since that model has not been trained yet.

kruthikakr commented 4 years ago

I am trying to run inference using the GUI as given in the doc for extractive summarisation. The model file used is https://drive.google.com/uc?id=1-W9VzvVgKyu4d3IfNMw0k2zvXzkqpRw7, but I am getting this error:

Traceback (most recent call last):
  File "/gdrive/Kruthika1/longsum/transformersum/src/test.py", line 2, in <module>
    model = ExtractiveSummarizer.load_from_checkpoint("/gdrive/Kruthika1/longsum/transformersum/models/epoch=3.ckpt")
  File "/gdrive/Kruthika1/virtualenvironment/Huggingface/lib/python3.6/site-packages/pytorch_lightning/core/saving.py", line 154, in load_from_checkpoint
    model = cls._load_model_state(checkpoint, strict=strict, **kwargs)
  File "/gdrive/Kruthika1/virtualenvironment/Huggingface/lib/python3.6/site-packages/pytorch_lightning/core/saving.py", line 200, in _load_model_state
    model.load_state_dict(checkpoint['state_dict'], strict=strict)
  File "/gdrive/Kruthika1/virtualenvironment/Huggingface/lib/python3.6/site-packages/torch/nn/modules/module.py", line 1052, in load_state_dict
    self.__class__.__name__, "\n\t".join(error_msgs)))
RuntimeError: Error(s) in loading state_dict for ExtractiveSummarizer:
    Missing key(s) in state_dict: "word_embedding_model.embeddings.position_ids".

Next: I am also trying to run abstractive summarisation as given in the doc. The model used is from https://drive.google.com/drive/folders/1DBxRZkOHS7OdU80L8OvnzCa3K6-ho_Dj, and as @foodiehack said, there is no checkpoint; I can only see python.bin. Please suggest the model to run for abstractive summarisation.

HHousen commented 4 years ago

@kruthikakr For the extractive error, what version of transformers are you using? Try v3.0.2 with pip install -U transformers==3.0.2, as discussed at https://github.com/HHousen/TransformerSum/issues/20#issuecomment-703825468.
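For reference, a minimal sketch of reloading the checkpoint after pinning transformers. It assumes TransformerSum's src/ directory is on the Python path and uses an illustrative checkpoint path; the predict() call follows the usage shown in the docs:

```python
# Minimal sketch: run after `pip install -U transformers==3.0.2`.
# Assumes TransformerSum's src/ directory is on the Python path and that the
# checkpoint path below is replaced with your own (illustrative path).
from extractive import ExtractiveSummarizer

model = ExtractiveSummarizer.load_from_checkpoint("models/epoch=3.ckpt")
model.eval()  # disable dropout for inference

# predict() returns the extractive summary of the input text
print(model.predict("Paste the document you want to summarize here."))
```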

The abstractive summarization folder that you linked contains three different models. The number at the end is the length of the input sequence that each model can accept.

kruthikakr commented 4 years ago

@HHousen Yes, the link has 3 folders, and none of the 3 folders contains a model checkpoint; I can only see the .bin files. Which model can be used to run the Longformer abstractive summarisation?

kruthikakr commented 4 years ago

@HHousen We have been evaluating different abstractive summarisation models. Since the latest SOTA model is PEGASUS, have you looked into it? In our experimentation on general text, T5 and BART are giving better results than PEGASUS. Please give me your comments on this.

Adding to this, I see distilled models for extractive summarisation. How can distilled models be made for abstractive summarisation, or is there a reference if they already exist?

HHousen commented 4 years ago

> @HHousen Yes, the link has 3 folders, and none of the 3 folders contains a model checkpoint; I can only see the .bin files. Which model can be used to run the Longformer abstractive summarisation?

@kruthikakr These are huggingface/transformers models, so they need to be used with the --model_name_or_path option for further training. Or you can load them directly in transformers using LongformerEncoderDecoderForConditionalGeneration.from_pretrained().
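For example, a minimal sketch of loading one of those downloaded folders directly. The import path assumes the allenai/longformer fork that provides longformer.longformer_encoder_decoder is installed, and the folder name is illustrative:

```python
# Minimal sketch: load a downloaded longformer-encdec folder with from_pretrained().
# Assumes the allenai/longformer fork providing longformer.longformer_encoder_decoder
# is installed; "./longformer-encdec-base-8192" is an illustrative folder name
# (the trailing number is the maximum input sequence length).
from longformer.longformer_encoder_decoder import (
    LongformerEncoderDecoderForConditionalGeneration,
)

model = LongformerEncoderDecoderForConditionalGeneration.from_pretrained(
    "./longformer-encdec-base-8192"
)
model.eval()  # inference mode
```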

> @HHousen We have been evaluating different abstractive summarisation models. Since the latest SOTA model is PEGASUS, have you looked into it? In our experimentation on general text, T5 and BART are giving better results than PEGASUS. Please give me your comments on this.

> Adding to this, I see distilled models for extractive summarisation. How can distilled models be made for abstractive summarisation, or is there a reference if they already exist?

You need to use a seq2seq architecture for abstractive summarization. I would recommend distilbart, specifically sshleifer/distilbart-cnn-12-6. The performance of each model depends on the dataset you evaluate on and the dataset the model was trained on. For instance, a model trained to summarize news will not summarize a short story well. Other than that, I'm not sure why BART and T5 outperform PEGASUS.
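As a point of comparison, here is a short sketch of running that distilbart checkpoint through the plain huggingface/transformers summarization pipeline (this is standalone transformers usage, not TransformerSum):

```python
# Minimal sketch: summarize with sshleifer/distilbart-cnn-12-6 via the standard
# transformers summarization pipeline.
from transformers import pipeline

summarizer = pipeline("summarization", model="sshleifer/distilbart-cnn-12-6")

article = "Replace this string with the news article you want to summarize."
result = summarizer(article, max_length=142, min_length=56, do_sample=False)
print(result[0]["summary_text"])
```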

kruthikakr commented 4 years ago

Thank you very much for the reply. I am referring to https://transformersum.readthedocs.io/en/latest/abstractive/models-results.html#bart-converted-to-longformerencdec, where the models have no checkpoint files in your library. I don't want to train them myself; similar to the extractive summarisation models, I am just trying to use them through the GUI (python predictions_website.py). How do I load the models for abstractive summarisation?

Sorry, I am new to transformers. Can you please provide details for LongformerEncoderDecoderForConditionalGeneration.from_pretrained()? I will verify. Thank you.

HHousen commented 4 years ago

There are currently no pre-trained models that can be used to abstractively summarize long documents. Models listed in the BART Converted to LongformerEncoderDecoder section need to be fine-tuned on a long document summarization dataset, such as arXiv-PubMed, to create a model that can summarize long sequences. The arXiv-PubMed models will be trained as soon as I obtain the resources necessary to train them (2 Tesla V100 GPUs).

I've updated the documentation to reflect this.

HHousen commented 4 years ago

Further discussion of this issue will be moved to #38.