Open AndyTheFactory opened 11 months ago
Comment by Cabu Tue Sep 5 09:28:43 2017
I have a similar problem with the NYTimes, where the beginning of the article is not loaded. The article is split across two DIVs, and the system picks the second (bigger) one...
The article: https://www.nytimes.com/2017/09/04/world/asia/muslims-rohingya-daw-aung-san-suu-kyi-malala-myanmar.html
Comment by ckcollab Thu Aug 4 04:16:06 2022
This still seems like a problem:
>>> from newspaper import Article
>>> url = "https://www.cnn.com/2022/08/03/media/alex-jones-sandy-hook-trial/index.html"
>>> a = Article(url)
>>> a.download()
>>> a.parse()
>>> a.text
"New York (CNN Business) <a paragraph or two>...\n\nRead More"
>>>
Issue by pratik151192 Wed Jul 12 21:00:38 2017 Originally opened as https://github.com/codelucas/newspaper/issues/399
When fetching the article content from Article.text, sometimes only the first few paragraphs are returned, with "Read More" appended at the end. In some cases, even "Read More" doesn't appear.
URL to reproduce error: http://www.cnn.com/2017/01/30/politics/trump-immigration-ban-refugees-trnd/index.html
The demo website link: http://newspaper-demo.herokuapp.com/articles/show?url_to_clean=http%3A%2F%2Fwww.cnn.com%2F2017%2F01%2F30%2Fpolitics%2Ftrump-immigration-ban-refugees-trnd%2Findex.html
The demo website does fetch the content, but fetching the same article through my code doesn't.
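
For anyone trying to reproduce this, here is a minimal sketch of a check for the "Read More" symptom described above. The helper names `looks_truncated` and `fetch_text` are hypothetical and not part of newspaper; only `Article`, `download()`, `parse()`, and `.text` come from the snippets in this thread. The check covers only the case where the text ends with "Read More", so it will miss truncated results that lack the marker.

```python
def looks_truncated(text: str, marker: str = "Read More") -> bool:
    """Return True if the extracted text ends with the teaser marker.

    Hypothetical helper: it only detects the "Read More" symptom and
    cannot tell whether text without the marker is complete.
    """
    return text.rstrip().endswith(marker)


def fetch_text(url: str) -> str:
    """Download and parse one article, returning the extracted body text."""
    # Imported lazily so looks_truncated() works without newspaper installed.
    from newspaper import Article

    article = Article(url)
    article.download()
    article.parse()
    return article.text
```

Calling `looks_truncated(fetch_text(url))` with one of the URLs above should then show whether a given run hit the "Read More" case.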