edlongman / thescoop

The fastest way to catch up on recent news
thescoop.io
Apache License 2.0

More summarizer bugs #31

Closed jake-patt closed 11 years ago

jake-patt commented 11 years ago

Example of broken summary - http://www.theguardian.com/business/2013/jul/29/eurozone-crisis-germany-greece-austerity-election. Found by 'business' and '2 weeks'.

The first letter of the article isn't always capitalised either.

Statistics break as well: a space gets added after the dot, so 2.2% becomes 2. 2%.

Taiiwo commented 11 years ago

The '.' issue is a problem with OTS. OTS removes the spaces between sentences, so I changed ots.php to add them after '.'s. We can either have sentences with no spaces between them, or this issue. What do you think? EDIT: Just had an idea: I do a str replace to remove all tags. I could replace them with a space instead.

jake-patt commented 11 years ago

Could you not make exceptions to the rule? So if the dot is within a link, don't add a space, and likewise if it's surrounded by numbers. Those seem to be the only instances where it fails.
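
For illustration, here is a rough sketch of those two exceptions. It is written in Python rather than the PHP of ots.php, and it is not the fix that was pushed: the function name `add_sentence_spaces` and the "contains `://` or starts with `www.`" link check are assumptions made for the example.

```python
import re

# A dot immediately followed by a letter, digit or underscore, i.e. a place
# where the sentence space may have been dropped.
_DOT_BEFORE_WORD = re.compile(r"\.(?=\w)")


def _looks_like_link(token):
    """Crude link check (an assumption for this sketch, not OTS behaviour)."""
    return "://" in token or token.startswith("www.")


def _fix_token(token):
    # Exception 1: leave anything that looks like a link untouched.
    if _looks_like_link(token):
        return token

    def repl(match):
        i = match.start()
        before = token[i - 1] if i > 0 else ""
        after = token[i + 1]
        # Exception 2: a dot surrounded by digits is a decimal point.
        if before.isdigit() and after.isdigit():
            return "."
        return ". "

    return _DOT_BEFORE_WORD.sub(repl, token)


def add_sentence_spaces(text):
    """Add a space after sentence-ending dots, skipping links and decimals."""
    # Split on whitespace but keep the whitespace runs so spacing is preserved.
    parts = re.split(r"(\s+)", text)
    return "".join(_fix_token(part) for part in parts)


if __name__ == "__main__":
    print(add_sentence_spaces("Growth hit 2.2% last year.Unemployment fell."))
```

Checking each whitespace-separated token separately keeps the link exception simple, but it means a link glued directly onto the previous sentence with no space would be skipped along with it.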

Taiiwo commented 11 years ago

Just fixed this and pushed it to master. Please confirm and close.