AndyTheFactory / newspaper4k

📰 Newspaper4k: a fork of the beloved Newspaper3k. Extraction of articles, titles, and metadata from news websites.
MIT License
465 stars, 45 forks

download fails #166

Open: AndyTheFactory opened this issue 12 months ago

AndyTheFactory commented 12 months ago

Issue by sirks Mon Jan 8 13:09:52 2018. Originally opened as https://github.com/codelucas/newspaper/issues/501


Please introduce a retry policy for the download method, something like:

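    # Proposed change to Article.download(). Needs `import time` and
    # `import requests` at module level, plus the module's existing
    # `network` and `ArticleDownloadState` imports.
    def download(self, input_html=None, title=None, recursion_counter=0):
        MAX_RECURSIONS = 5  # local, so it is in scope inside the method

        if input_html is None:
            try:
                # Quadratic back-off: wait 0 s, 1 s, 4 s, 9 s, ... before each try
                time.sleep(recursion_counter ** 2)
                html = network.get_html_2XX_only(self.url, self.config)
            except requests.exceptions.RequestException as e:
                self.download_state = ArticleDownloadState.FAILED_RESPONSE
                self.download_exception_msg = str(e)
                print('Download failed on URL %s because of %s' %
                      (self.url, self.download_exception_msg))
                if recursion_counter >= MAX_RECURSIONS:
                    return  # give up after MAX_RECURSIONS retries
                # Retry, passing `title` through so it is not lost
                return self.download(title=title,
                                     recursion_counter=recursion_counter + 1)
        else:
            html = input_html
        # ... rest of download() unchanged, using `html`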
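
The recursive proposal above can also be written as a loop, which keeps the retry count in a local variable instead of a method parameter and re-raises the last error rather than silently returning. A minimal sketch, assuming a hypothetical helper named `call_with_retries` (not part of newspaper) with the same quadratic back-off schedule; `sleep` is injectable so the schedule can be checked without actually waiting.

```python
import time
from typing import Callable, Tuple, Type, TypeVar

T = TypeVar("T")


def call_with_retries(
    fn: Callable[[], T],
    max_retries: int = 5,
    retry_on: Tuple[Type[BaseException], ...] = (Exception,),
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call fn(), retrying up to max_retries extra times on `retry_on` errors.

    Hypothetical helper. Before attempt n (starting at 0) it waits n**2
    seconds: 0, 1, 4, 9, ... Once retries are exhausted, the last exception
    is re-raised. Exceptions not listed in `retry_on` propagate immediately.
    """
    if max_retries < 0:
        raise ValueError("max_retries must be >= 0")
    attempt = 0
    while True:
        sleep(attempt ** 2)
        try:
            return fn()
        except retry_on:
            if attempt >= max_retries:
                raise
            attempt += 1


if __name__ == "__main__":
    calls = []

    def flaky() -> str:
        # Fails twice, then succeeds.
        calls.append(1)
        if len(calls) < 3:
            raise ConnectionError("temporary failure")
        return "<html>ok</html>"

    delays = []
    print(call_with_retries(flaky, sleep=delays.append))  # prints <html>ok</html>
    print(delays)  # prints [0, 1, 4]
```

To retry only network errors, pass e.g. `retry_on=(requests.exceptions.RequestException,)` instead of the catch-all default.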
AndyTheFactory commented 12 months ago

Comment by jessecooper Tue Jan 23 14:03:57 2018


In my opinion, retry logic belongs in the application code that calls the package, not inside the package's Article.download method.
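
Following that suggestion, the retry can live entirely in the calling code. A minimal sketch, assuming only an object with a `download()` method plus a caller-supplied `succeeded` check; the function name `download_with_retries` is hypothetical. Because whether `download()` raises on failure or only sets `download_state` may differ between versions, the sketch treats either outcome as a failed attempt. It uses exponential back-off (1 s, 2 s, 4 s, ...) rather than the quadratic schedule proposed above; either works.

```python
import time
from typing import Callable, Protocol


class Downloadable(Protocol):
    """Anything with a download() method, e.g. a newspaper Article (assumed)."""

    def download(self) -> None: ...


def download_with_retries(
    article: Downloadable,
    succeeded: Callable[[Downloadable], bool],
    attempts: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> bool:
    """Application-level retry; the library method itself is left untouched.

    Calls article.download() up to `attempts` times, doubling the wait
    between tries (base_delay, 2 * base_delay, ...). A raised exception or
    succeeded(article) returning False both count as a failed attempt.
    Returns True on the first success, False if every attempt failed.
    """
    for attempt in range(attempts):
        if attempt:
            sleep(base_delay * 2 ** (attempt - 1))
        try:
            article.download()
        except Exception:
            continue
        if succeeded(article):
            return True
    return False
```

With newspaper, the success check could compare `article.download_state` against `ArticleDownloadState.SUCCESS` (the enum used in the proposal above); treat that pairing as an assumption and check it against the installed version.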