HHousen / DocSum

A tool to automatically summarize documents abstractively using the BART or PreSumm Machine Learning Model.
https://haydenhousen.com/projects/docsum/
GNU General Public License v3.0

Link to presumm is broken #5

Closed kruthikakr closed 3 years ago

kruthikakr commented 3 years ago

Hi, I am trying to run PreSumm from cmd_summarizer.py but I get the error below. I think the path to the Hugging Face model is broken. Can you please guide me on running PreSumm?

Error:

```
Traceback (most recent call last):
  File "cmd_summarizer.py", line 51, in <module>
    summarizer = presumm.PreSummSummarizer()
  File "/content/docsum/presumm/presumm.py", line 28, in __init__
    model = BertAbs.from_pretrained("bertabs-finetuned-cnndm")
  File "/usr/local/lib/python3.6/dist-packages/transformers/modeling_utils.py", line 877, in from_pretrained
    **kwargs,
  File "/usr/local/lib/python3.6/dist-packages/transformers/configuration_utils.py", line 347, in from_pretrained
    config_dict, kwargs = cls.get_config_dict(pretrained_model_name_or_path, **kwargs)
  File "/usr/local/lib/python3.6/dist-packages/transformers/configuration_utils.py", line 400, in get_config_dict
    raise EnvironmentError(msg)
OSError: Can't load config for 'bertabs-finetuned-cnndm'. Make sure that:
```

HHousen commented 3 years ago

@kruthikakr This same issue was opened at https://github.com/huggingface/transformers/issues/5231, but no one ever responded. For some reason, the run_summarization.py script in huggingface/transformers/examples/seq2seq/bertabs uses the remi/bertabs-finetuned-extractive-abstractive-summarization model rather than the remi/bertabs-finetuned-cnndm-extractive-abstractive-summarization model (note the cnndm). I use the model with cnndm since I believe it is the correct one; in huggingface/transformers/examples/seq2seq/bertabs, the cnndm model is used everywhere except on the aforementioned line.

This issue should now be solved. Let me know if you have any problems!
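For reference, here is a minimal sketch of the corrected load. The variable name `MODEL_ID` is illustrative, and the actual `from_pretrained` call is commented out because it downloads model weights:

```python
# Sketch of the fix: use the full Hugging Face Hub identifier that includes
# "cnndm", rather than the short "bertabs-finetuned-cnndm" name, which the
# library cannot resolve to a hosted config.
MODEL_ID = "remi/bertabs-finetuned-cnndm-extractive-abstractive-summarization"

# Actual load (requires the model download, so it is commented out here):
# from presumm.modeling_bertabs import BertAbs
# model = BertAbs.from_pretrained(MODEL_ID)
```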

kruthikakr commented 3 years ago

@HHousen Thank you. I am still getting an error when I run Docsum.ipynb after switching the model to PreSumm, i.e. `!python cmd_summarizer.py -m presumm --text "some text"`

Error:

```
Traceback (most recent call last):
  File "cmd_summarizer.py", line 54, in <module>
    do_summarize(args.text)
  File "cmd_summarizer.py", line 22, in do_summarize
    transcript_summarized = summarizer.summarize_string(document, min_length=min_length, max_length=max_length)
  File "/docsum/presumm/presumm.py", line 158, in summarize_string
    translations = predictor.translate(batch, -1)
  File "/docsum/presumm/modeling_bertabs.py", line 808, in translate
    batch_data = self.translate_batch(batch)
  File "/docsum/presumm/modeling_bertabs.py", line 823, in translate_batch
    return self._fast_translate_batch(batch, self.max_length, min_length=self.min_length)
  File "/docsum/presumm/modeling_bertabs.py", line 919, in _fast_translate_batch
    alive_seq = torch.cat([alive_seq.index_select(0, select_indices), topk_ids.view(-1, 1)], -1)
RuntimeError: expected scalar type Long but found Float
```

HHousen commented 3 years ago

@kruthikakr I've narrowed down the problem to an issue with the update from PyTorch v1.6.0 to v1.7.0:

```
RuntimeError: Integer division of tensors using div or / is no longer supported, and in a future release div will perform true division as in Python 3. Use true_divide or floor_divide (// in Python) instead.
```
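The semantic change is the same one Python 3 made for plain integers: `/` is true division (always a float) and `//` is floor division (stays an integer). A plain-Python illustration of the distinction, no PyTorch required:

```python
# In PyTorch >= 1.7, dividing integer tensors with `/` yields floats,
# mirroring Python 3's `/` on ints; `//` (floor division) keeps integers.
a, b = 7, 2

true_div = a / b     # 3.5 -- a float, unusable as a tensor index
floor_div = a // b   # 3   -- an int, safe for indexing

print(true_div, floor_div)  # prints: 3.5 3
```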

I've changed .div() to .floor_divide() in commit f3e3e6b. I also pinned huggingface/transformers to version 3.0.2, so make sure to use that version. This should be fixed now.
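To see why the dtype matters here: during beam search, the flattened top-k index has to be split back into a beam index and a token id, and both must stay integers because they feed into index_select and torch.cat. A pure-Python sketch of that arithmetic (the vocabulary size and sample index below are illustrative values, not taken from the traceback):

```python
# Hypothetical sketch of the index math behind _fast_translate_batch.
# topk over a flattened (beams * vocab_size) score tensor yields flat ids;
# floor division recovers which beam each id came from without leaving ints.
vocab_size = 30522   # assumption: BERT's vocabulary size
flat_id = 61045      # assumption: an example flattened top-k index

beam_index = flat_id // vocab_size            # stays an int (Long in torch)
token_id = flat_id - beam_index * vocab_size  # token id within that beam

# With `/` instead of `//`, beam_index would become a float, and the
# downstream index_select / torch.cat would raise
# "RuntimeError: expected scalar type Long but found Float".
```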