monajalal opened this issue 3 years ago
Your code works fine; something transient at the moment of your request caused the "Read timed out" error. newspaper3k supports a timeout setting in Config() (request_timeout, which defaults to 7 seconds, hence "read timeout=7" in your error), and raising it could help prevent future read timeouts:
from newspaper import Article, Config
# Identify as desktop Firefox instead of newspaper3k's default user agent
user_agent = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'
config = Config()
config.browser_user_agent = user_agent
# Wait up to 10 seconds for the server instead of the default 7
config.request_timeout = 10
url = "https://www.chicagotribune.com/nation-world/ct-florida-school-shooter-nikolas-cruz-20180217-story.html"
page = Article(url, config=config)
page.download()
page.parse()  # raises an exception if download() failed
print(page.text)
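If a longer timeout still fails now and then, one option is to retry the download with progressively longer timeouts. The following is a minimal sketch, assuming newspaper3k is installed. The helper names fetch_with_retries and download_article_text, the 10/20/30 second schedule and the exponential backoff are hypothetical choices for illustration, not part of newspaper3k. The sketch relies on download() recording a failure and parse() then raising the "Article download() failed with ..." exception, which is the error reported in this issue.

```python
import time
from typing import Callable, Optional, Sequence, Tuple, Type, TypeVar

T = TypeVar("T")

# Same user agent string as in the answer above.
USER_AGENT = ("Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) "
              "Gecko/20100101 Firefox/78.0")


def fetch_with_retries(
    fetch: Callable[[float], T],
    timeouts: Sequence[float] = (10, 20, 30),
    backoff: float = 1.0,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
) -> T:
    """Call fetch(timeout) once per timeout until one call succeeds.

    Hypothetical helper (not newspaper3k API). Sleeps backoff * 2**attempt
    seconds between attempts and re-raises the last error if all attempts fail.
    """
    if not timeouts:
        raise ValueError("timeouts must not be empty")
    last_error: Optional[BaseException] = None
    for attempt, timeout in enumerate(timeouts):
        try:
            return fetch(timeout)
        except retry_on as exc:
            last_error = exc
            if attempt < len(timeouts) - 1:
                time.sleep(backoff * 2 ** attempt)
    assert last_error is not None
    raise last_error


def download_article_text(url: str, timeout: float) -> str:
    """Download and parse one article with newspaper3k using the given timeout."""
    # Imported here so fetch_with_retries works without newspaper3k installed.
    from newspaper import Article, Config

    config = Config()
    config.browser_user_agent = USER_AGENT
    config.request_timeout = timeout
    article = Article(url, config=config)
    article.download()  # on a network error, records the failure instead of raising
    article.parse()     # raises if download() failed (e.g. "Read timed out")
    return article.text
```

For example, fetch_with_retries(lambda t: download_article_text(url, t)) tries the Chicago Tribune URL with 10, 20 and 30 second timeouts before giving up. Retrying on any Exception is deliberately broad; passing a narrower retry_on tuple avoids retrying errors that a longer timeout cannot fix.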
I can open http://www.chicagotribune.com/ct-florida-school-shooter-nikolas-cruz-20180217-story.html in Firefox, but newspaper3k gives me this error:
Article `download()` failed with HTTPSConnectionPool(host='www.chicagotribune.com', port=443): Read timed out. (read timeout=7) on URL http://www.chicagotribune.com/ct-florida-school-shooter-nikolas-cruz-20180217-story.html
My code is: