dmmiller612 / bert-extractive-summarizer

Easy to use extractive text summarization with BERT
MIT License
1.39k stars 305 forks

ratio confusion #29

Closed dzimmerman-nci closed 4 years ago

dzimmerman-nci commented 4 years ago

My understanding is that the ratio parameter is the fraction of the input sentences that the summarizer returns. It does not seem to do this consistently. When I set ratio = 1.0, I should get back all of the text that I sent to summarize, but this is not always the case. What is the reason for this?

dmmiller612 commented 4 years ago

Yeah, it is probably related to the minimum/maximum sentence length filters. Some of the pretrained BERT models are biased toward really long or really short sentences, so sentences outside the length range are dropped before summarization and the ratio only applies to what is left. By default, I think the minimum character length for a sentence is around 25, and the maximum is around 500.
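To make the effect concrete, here is a minimal sketch of a length filter applied before the ratio. It is not the library's actual code: `filter_by_length` and `expected_count` are hypothetical names, the strict bounds and the `int()` rounding are assumptions, and 25/500 are only the approximate defaults mentioned above.

```python
def filter_by_length(sentences, min_length=25, max_length=500):
    """Hypothetical helper: keep sentences whose character count is inside the range."""
    # Strict comparison on both ends is an assumption, not confirmed library behaviour.
    return [s for s in sentences if min_length < len(s) < max_length]


def expected_count(sentences, ratio, min_length=25, max_length=500):
    """Hypothetical helper: how many sentences a ratio would select after filtering."""
    kept = filter_by_length(sentences, min_length, max_length)
    # The ratio is applied to the surviving sentences, not to the original input.
    return int(len(kept) * ratio)


sentences = [
    "Short one.",
    "This sentence is comfortably longer than twenty-five characters.",
    "Tiny.",
]

# With the assumed defaults only the middle sentence survives the filter,
# so ratio=1.0 yields one sentence out of three.
print(expected_count(sentences, ratio=1.0))
# Opening the range keeps every sentence.
print(expected_count(sentences, ratio=1.0, min_length=0, max_length=10000))
```

Under these assumptions, ratio = 1.0 means "all sentences that passed the length filter", which can be fewer than the sentences in the input.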

dzimmerman-nci commented 4 years ago

Is this also the reason why sometimes it returns an empty summary?

I've seen it return an empty summary when the ratio is set to 1.0

dmmiller612 commented 4 years ago

Yeah, that can happen if all of the sentences fall outside the size range. A way to test this is to open up those ranges on the summarization call. This can be done with a call such as:

summarizer(body='text to summarize here', min_length=0, max_length=10000)
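Before widening the range, it may also help to check how long the sentences in the input actually are. The sketch below is one rough way to do that without the library at all. `length_report` is a hypothetical helper, the regex split is a crude stand-in for whatever sentence splitting the summarizer uses, and the 25/500 bounds are the approximate defaults from this thread.

```python
import re


def length_report(text, min_length=25, max_length=500):
    """Hypothetical diagnostic: count sentences that fall inside the length range."""
    # Crude split after '.', '!' or '?' followed by whitespace (an assumption
    # for illustration; the library's own sentence handling may differ).
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]
    in_range = [s for s in sentences if min_length < len(s) < max_length]
    return {
        "total": len(sentences),
        "in_range": len(in_range),
        "lengths": [len(s) for s in sentences],
    }


report = length_report("Too short. Also short. Tiny!")
print(report)
```

If `in_range` comes back as 0, every sentence would be filtered out under these assumed bounds, which matches the empty-summary case described above.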