alirezamika / autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python
MIT License
6.16k stars 648 forks

AutoScraper for scraping dynamic websites! #67

Closed Sreetama2001 closed 2 years ago

Sreetama2001 commented 2 years ago

Hello @alirezamika! I wanted to ask what to do if the text in the wanted list is no longer available on the website.

from autoscraper import AutoScraper

scrapper = AutoScraper()

# Each entry pairs a page URL with headlines that must currently appear on that page.
scrape_data1 = [
    ("https://theprint.in", ["Japan envoy holds talks with senior Taliban members in Kabul"]),
    ("https://theprint.in", ["India records 9,119 new Covid cases, active infections lowest in 539 days"]),
    ("https://theprint.in", ["Farm laws debate missed a lot. Neither supporters nor Modi govt identified the real problem"]),
    ("https://theprint.in", ["Punjab’s Dalits are shifting state politics, flocking churches, singing Chamar pride"]),
]

# Learn rules from every example; update=True keeps the rules learned in earlier iterations.
for get_url, data in scrape_data1:
    scrapper.build(url=get_url, wanted_list=data, update=True)

# Query the page once, after all rules have been learned, instead of on every iteration.
Main_news = scrapper.get_result_similar(url="https://theprint.in", grouped=True, group_by_alias=True, unique=True)

print(Main_news)

The code above scrapes a news website, but if I run it a few hours later, after the news has been updated, the scraper returns {} or something else related to the text it found. I want to know how to optimize the code for dynamic websites where the text gets updated.

alirezamika commented 2 years ago

Hey! You should build the scraper once, save it and use it after that. You don't need to build it every time!
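
A minimal sketch of that advice, assuming the scraper is built once while the example headlines are still on the page and that its rules are then stored with AutoScraper's `save` and restored with `load`. The file name `theprint_scraper.json` and the two helper functions are hypothetical names chosen for illustration, not part of the library. The import sits inside the functions only so the module can be loaded before the package is installed.

```python
from pathlib import Path

MODEL_PATH = Path("theprint_scraper.json")  # hypothetical file name for the saved rules
URL = "https://theprint.in"


def build_and_save(wanted_list, url=URL, path=MODEL_PATH):
    """Learn scraping rules from headlines that are on the page right now, then save them."""
    from autoscraper import AutoScraper  # third-party: pip install autoscraper

    scraper = AutoScraper()
    scraper.build(url=url, wanted_list=wanted_list)
    scraper.save(str(path))
    return scraper


def scrape_with_saved_rules(url=URL, path=MODEL_PATH):
    """Reuse the saved rules; no wanted_list is passed, so stale example headlines do not matter."""
    from autoscraper import AutoScraper  # third-party: pip install autoscraper

    scraper = AutoScraper()
    scraper.load(str(path))
    return scraper.get_result_similar(url=url, unique=True)
```

Call `build_and_save([...])` once with a headline that is currently visible on the page, then call `scrape_with_saved_rules()` on every later run. The saved rules describe where headlines sit in the page structure rather than their text, so they should keep working after the news changes, as long as the site layout stays the same.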