Closed michaelroyzen closed 4 years ago
Whoops! Yes, the number of top sentences to select was hardcoded to 2 in the `ExtractiveSummarizer.predict()` function. And yes, the top n sentences are returned. I've updated the library so the `ExtractiveSummarizer.predict()` function has a `num_summary_sentences` argument to specify the number of sentences in the output summary. The default is 3 sentences. Let me know if this works :smile:.
Hi, is there any upper limit for `num_summary_sentences`? I wanted to create a summary of 100 sentences from an article of 200+ sentences using MobileBERT, but it gives only 8 sentences (or fewer if `num_summary_sentences` is smaller), regardless of the `num_summary_sentences` value. Please advise. Thank you.
Yes, there is an upper limit, since the decoder of most BART-like models can only output 512 tokens. Transformers generally cannot handle long input or output sequences. The Longformer would be your best option since it can handle an input of about 8,000/16,000 tokens (depending on the version), but it still only outputs 512 tokens. If you want a summary that long, you should use a standard algorithm like TextRank.
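For reference, here is a minimal sketch of the TextRank idea in pure Python: score sentences by running a PageRank-style iteration over a sentence-similarity graph, then return the top `num_sentences` in document order. This is an illustration only, not the library's implementation; the word-overlap similarity is a simplifying assumption (real implementations typically use TF-IDF or embeddings), and the function name `textrank_summary` is made up for this example.

```python
def textrank_summary(sentences, num_sentences=2, damping=0.85, iters=50):
    """Rank sentences with a TextRank-style algorithm and return the
    top ``num_sentences`` in their original document order.

    Similarity here is plain word overlap, normalized by the combined
    sentence lengths -- a deliberately simple stand-in for TF-IDF or
    embedding-based similarity.
    """
    n = len(sentences)
    words = [set(s.lower().split()) for s in sentences]

    # Build the sentence-similarity matrix (edges of the graph).
    sim = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i != j and words[i] and words[j]:
                overlap = len(words[i] & words[j])
                sim[i][j] = overlap / (len(words[i]) + len(words[j]))

    # Power iteration of the PageRank update over the similarity graph.
    scores = [1.0 / n] * n
    for _ in range(iters):
        new_scores = []
        for i in range(n):
            rank = 0.0
            for j in range(n):
                out_weight = sum(sim[j])
                if sim[j][i] > 0 and out_weight > 0:
                    rank += scores[j] * sim[j][i] / out_weight
            new_scores.append((1 - damping) / n + damping * rank)
        scores = new_scores

    # Pick the top-scoring sentences, preserving document order.
    top = sorted(range(n), key=lambda i: scores[i], reverse=True)[:num_sentences]
    return [sentences[i] for i in sorted(top)]
```

Because the selection step is just a top-k over the scores, this approach has no hard cap on summary length: asking for 100 of 200 sentences works the same way as asking for 2.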
Thank you for the quick response, Housen, and for the direction.
This is a very interesting library, @HHousen, but my extractive summaries are always two sentences regardless of input document length. How can I increase the length? I would have assumed that the extractive decoder would, by design, return the top n sentences over a certain threshold?