IndigoResearch / textteaser

Official version of TextTeaser.
MIT License

Is there a bias toward presenting sentences from the end of the article? #5

Closed · ijkilchenko closed this issue 8 years ago

ijkilchenko commented 8 years ago

If I use the Chrome app, the sentences I get (with the number-of-sentences slider at its default) all seem to come from the end of the article.

Here are two examples:

url: http://www.lrb.co.uk/v38/n08/john-lanchester/when-bitcoin-grows-up

summary:

It’s time for the cryptocurrency to decide what it wants to be when it grows up. Blockchains could become merely a new technique to ensure the continuation of banking hegemony in its current form. That would be one of those final plot twists which leaves everybody thinking that although they enjoyed most of the show, the ending was so disappointing they now wish they hadn’t bothered. Or, along with peer-to-peer lending and mobile payments, they could have an impact as great as the new kind of banking introduced in Renaissance Italy. That would be more fun.

url: https://en.wikipedia.org/wiki/Automatic_summarization#Current_challenges_in_evaluating_summaries_automatically

summary:

Furthermore, for some methods, not only do we need to have human-made summaries available for comparison, but also manual annotation has to be performed in some of them (e.g. SCU in the Pyramid Method). In any case, what the evaluation methods need as an input, is a set of summaries to serve as gold standards and a set of automatic summaries. Moreover, they all perform a quantitative evaluation with regard to different similarity metrics. To overcome these problems, we think that the quantitative evaluation might not be the only way to evaluate summaries, and a qualitative automatic evaluation would be also important.

In both cases, the summaries seem to be built from sentences at the end of the article. Do you think this comes from the Chrome Extension rather than your code base, or the other way around?

MojoJolo commented 8 years ago

Yes, there's a bias toward the "conclusion" part of the article. You can see it in this line: https://github.com/DataTeaser/textteaser/blob/master/textteaser/parser.py#L26

The reference and the reasoning behind it are in the code comment.

ijkilchenko commented 8 years ago

So sentences are scored by their position in the article, according to the hard-coded distribution in that function?

schmamps commented 6 years ago

The distributions are hard-coded into the Parser class. Those values come from a research paper cited in the comments.
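
For anyone landing here later, a minimal sketch of what a hard-coded position distribution like that looks like. The function name and the per-decile weights below are illustrative stand-ins, not the exact values from parser.py (the real ones come from the paper cited in the comments); the shape is what matters: each slice of the article gets a fixed weight, with extra mass on the closing slice.

```python
# Illustrative sketch only -- not the actual values from parser.py.
# One fixed weight per tenth of the article; the extra mass on the
# final decile is the "conclusion" bias discussed above.
DECILE_WEIGHTS = [0.17, 0.23, 0.14, 0.08, 0.05, 0.04, 0.06, 0.04, 0.04, 0.15]

def sentence_position_weight(position, sentence_count):
    """Map a 1-based sentence position to a hard-coded importance weight."""
    normalized = (position - 1) / sentence_count  # relative position in [0, 1)
    decile = min(int(normalized * 10), 9)         # clamp for float edge cases
    return DECILE_WEIGHTS[decile]

if __name__ == "__main__":
    # Weights across a 20-sentence article.
    for pos in range(1, 21):
        print(pos, sentence_position_weight(pos, 20))
```

With these illustrative weights, the opening and closing deciles outscore the middle, so a short summary built from the top-weighted sentences skews toward the ends of the article, consistent with the summaries quoted above.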