edlongman / thescoop

The fastest way to catch up on recent news
thescoop.io
Apache License 2.0
6 stars 0 forks source link

Summarizer still tries to process video links and picture galleries #27

Open jake-patt opened 11 years ago

Taiiwo commented 11 years ago

Pretty sure this is fixed now. Can you find some broken links? EDIT: Oh, I see what you mean. That's a bug in the Readablilty API. Not sure how to fix that other than to revert back to the old version, or do a really dirty fix like: Only use the old version when 'video' is in the article title.

jake-patt commented 11 years ago

Environment and 2 weeks is an example. How did the old API handle it? Even returning 'cannot summarize article' would be an improvement

Taiiwo commented 11 years ago

The old API looked at all text inside a div with a specific id/class. The current one takes an URL and looks for the largest body of text within the page. Checking for the id before loading the page would take too long to load, so the only thing I can think of would be to use the old method for all URLs with theguardian.com/video/ in the URL.

jake-patt commented 11 years ago

Good idea, what did the old method return? If it still doesn't make sense then returning an error would be a better option

Taiiwo commented 11 years ago

Pushed quick fix to master. Will now error if it starts talking about autoplay. '/video/' is floating in the middle of the URL, and can't be parsed nicely without affecting future links.

edlongman commented 11 years ago

Can't a regular expression sort that? Ok can you pull since I'm on mobile On 12 Aug 2013 18:56, "Taiiwo" notifications@github.com wrote:

Pushed quick fix to master. Will now error if it starts talking about autoplay. '/video/' is floating in the middle of the URL, and can't be parsed nicely without affecting future links.

— Reply to this email directly or view it on GitHubhttps://github.com/edlongman/thescoop/issues/27#issuecomment-22512145 .