I can see the article but cannot download it via newspaper3k

codelucas / newspaper

newspaper3k is a news, full-text, and article metadata extraction library in Python 3. Advanced docs:

MIT License

13.9k stars, 2.11k forks

I can see the article at http://www.chicagotribune.com/ct-florida-school-shooter-nikolas-cruz-20180217-story.html when browsing in Firefox. However, newspaper3k gives me this error:

Article `download()` failed with HTTPSConnectionPool(host='www.chicagotribune.com', port=443): Read timed out. (read timeout=7) on URL http://www.chicagotribune.com/ct-florida-school-shooter-nikolas-cruz-20180217-story.html

My code is:

from newspaper import Article
from newspaper import Config

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
config = Config()

config.browser_user_agent = user_agent

url = "https://www.chicagotribune.com/nation-world/ct-florida-school-shooter-nikolas-cruz-20180217-story.html"

page = Article(url, config=config)

page.download()
page.parse()
print(page.text)

from newspaper import Article
from newspaper import Config

user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
config = Config()

config.browser_user_agent = user_agent
config.request_timeout = 10

url = "https://www.chicagotribune.com/nation-world/ct-florida-school-shooter-nikolas-cruz-20180217-story.html"

page = Article(url, config=config)

page.download()
page.parse()
print(page.text)
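
The snippet above raises `config.request_timeout` from the 7 seconds shown in the error to 10. If a single fixed value still times out, one option is to retry with progressively longer timeouts. The following is only a sketch under stated assumptions: `download_with_backoff` and `make_newspaper_article` are hypothetical helper names (not part of newspaper3k), and the timeout values are arbitrary. It uses only the `Config` attributes already shown in this thread.

```python
import time


def download_with_backoff(make_article, timeouts=(7, 15, 30), pause=0.0):
    """Try each timeout in turn; return the first article that parses.

    make_article is a hypothetical callable (timeout in seconds -> object
    with download() and parse() methods, such as a newspaper Article).
    """
    last_error = None
    for timeout in timeouts:
        article = make_article(timeout)
        try:
            article.download()
            article.parse()
            return article
        except Exception as exc:
            # A failed download surfaces as an exception
            # (ArticleException in newspaper3k); remember it and retry.
            last_error = exc
            time.sleep(pause)
    raise RuntimeError(f"all attempts failed: {last_error}")


def make_newspaper_article(url, user_agent):
    """Return a factory that builds a newspaper Article for a given timeout."""
    def factory(timeout):
        # Imported lazily so download_with_backoff can be used and tested
        # without newspaper3k installed.
        from newspaper import Article, Config

        config = Config()
        config.browser_user_agent = user_agent
        config.request_timeout = timeout
        return Article(url, config=config)

    return factory
```

Usage would look like `page = download_with_backoff(make_newspaper_article(url, user_agent))` followed by `print(page.text)`. A longer timeout only helps if the site is slow to respond; it will not help if the server is deliberately stalling requests it considers automated.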

I can see the article but cannot download it via newspaper3k #829