Closed Aliktk closed 2 years ago
The function _scraplinks() is incorrect; build() isn't used that way. See my overview document on how to use build().
Thank you @johnbumgarner, your reply solved it as stated.
I used the chunk of code below and it returns the links of all the articles:
import newspaper
from newspaper import Article, Config

def scrap_links(link):
    USER_AGENT = 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:78.0) Gecko/20100101 Firefox/78.0'

    config = Config()
    config.browser_user_agent = USER_AGENT
    config.request_timeout = 10

    article_urls = set()
    marketwatch = newspaper.build(
        link, config=config, memoize_articles=False, language='en')

    for sub_article in marketwatch.articles:
        article = Article(sub_article.url, config=config,
                          memoize_articles=False, language='en')
        article.download()
        article.parse()
        # article_urls is a set, so duplicate URLs are dropped automatically
        article_urls.add(article.url)
    return article_urls
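As an aside, the link-collection step that newspaper.build() performs boils down to extracting anchor hrefs and deduplicating them, which is why the set above works. A minimal stdlib-only sketch of that idea (the HTML snippet and class name here are hypothetical, for illustration only):

```python
from html.parser import HTMLParser

class LinkCollector(HTMLParser):
    """Collect href values from <a> tags into a set (deduplicated)."""
    def __init__(self):
        super().__init__()
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == 'a':
            for name, value in attrs:
                if name == 'href' and value:
                    self.links.add(value)

# Hypothetical page content for illustration only.
html = (
    '<html><body>'
    '<a href="/story/a">A</a>'
    '<a href="/story/b">B</a>'
    '<a href="/story/a">A again</a>'  # duplicate, collapsed by the set
    '</body></html>'
)

collector = LinkCollector()
collector.feed(html)
print(sorted(collector.links))  # → ['/story/a', '/story/b']
```

newspaper does much more than this (filtering to article-like URLs, fetching each page), but the dedup-with-a-set pattern is the same.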
You're welcome @Aliktk. Happy coding.
Hello everyone, I am scraping a website to get the latest-tab news. I followed the code and at first it worked perfectly, but after cleaning up the code and running it again, it returned nothing at all. Here is the code:
Initial tab link scraping, after which the link is passed to newspaper.build() to get the article links:
It returns:
Is there some limit that prevents refreshing the links for scraping, or is something else wrong? Thank you.