Open dividor opened 5 years ago
Love this package, amazingly useful.
I am seeing a few sites that don't parse the full text, for example ...
from newspaper import Article

url = "https://www.nytimes.com/2019/03/06/technology/personaltech/key-duplicating-machine.html"
article = Article(url)
article.download()
article.parse()
No pay-wall or anything like that as far as I can tell, the text just stops at "In the event of a crime, the police could check whether a key was duplicated with KeyMe and track down who had copied it.", which is halfway through the article.
Another NYT example:
https://www.nytimes.com/2019/01/23/travel/pittsburgh-horror-filmmaker-george-romero.html
In both cases the first missing paragraph starts with a quotation mark; I'm not sure whether that matters.
I am on 0.2.8 of newspaper3k (and lxml==4.3.0).
Am I doing something wrong perhaps?
Longstanding issue. See also https://github.com/codelucas/newspaper/issues/645
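To tell whether a given page is affected, one rough check is to compare the extracted text against the paragraph text in the raw HTML. The sketch below is only a diagnostic, not a fix, and it makes some assumptions: the helper names `ParagraphText` and `coverage` are made up for this example, the article body is assumed to live in `<p>` elements, and matching is done on whitespace-separated words, which is crude.

```python
from html.parser import HTMLParser


class ParagraphText(HTMLParser):
    """Collect the text found inside <p> elements of an HTML page."""

    def __init__(self):
        super().__init__()
        self._in_p = False   # True while the parser is inside a <p> element
        self.chunks = []     # text fragments seen inside <p> elements

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._in_p = True

    def handle_endtag(self, tag):
        if tag == "p":
            self._in_p = False

    def handle_data(self, data):
        if self._in_p:
            self.chunks.append(data)


def coverage(html, extracted_text):
    """Return the fraction of <p> words in `html` that also occur in `extracted_text`.

    Hypothetical helper: a value well below 1.0 suggests the extractor
    dropped part of the page's paragraph text.
    """
    parser = ParagraphText()
    parser.feed(html)
    page_words = " ".join(parser.chunks).split()
    if not page_words:
        return 1.0  # nothing to compare against
    extracted_words = set(extracted_text.split())
    found = sum(1 for word in page_words if word in extracted_words)
    return found / len(page_words)
```

With newspaper, this could be called as `coverage(article.html, article.text)` after `article.download()` and `article.parse()`. Pages usually contain some `<p>` text outside the article body (captions, footers), so a perfect score is not expected, but a score near one half on an article like the ones above would be consistent with the truncation described.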