Stability Enhancement - Replace selection by class name

GeminidSystems / GoogleNewsScraper

A Python package that scrapes Google News article data while remaining undetected by Google. Our scraper can scrape page data up until the last page and never trigger a CAPTCHA (download stats: https://pepy.tech/project/GoogleNewsScraper)

MIT License

11 stars, 5 forks

@abnoviello23

For stability reasons, we want to replace the use of find_elements_by_class_name and div[contains(@class)]. Google changes the class names regularly, and that breaks our script.

We should be able to select what we need using one of the following:

select by id (the preferred method, as ids are unlikely to change)
select by tag name (for example, <img/> for the image_url and <a/> for the url can certainly be used)
select by tag position (for example, we know the text content we want is under a > div > div > [div, div, div], where the 3 divs contain the source, title, and description respectively)
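
To make the three options concrete, here is a minimal sketch that extracts one article without touching any class name. It runs against a made-up, simplified snippet using only the standard library, so the `search` id, the `SAMPLE` markup, and the `parse_results` helper are all assumptions for illustration; the real Google News markup will differ and has to be checked in the browser. In the scraper itself the same ideas would presumably map to Selenium locators such as `By.ID`, `By.TAG_NAME`, and a relative `By.XPATH` like `./div/div/div`.

```python
import xml.etree.ElementTree as ET

# Hypothetical, simplified result card. This is NOT the real Google News
# markup; it only mirrors the a > div > div > [div, div, div] shape
# described above.
SAMPLE = """
<body>
  <div id="search">
    <a href="https://example.com/story">
      <div>
        <img src="https://example.com/thumb.jpg"/>
        <div>
          <div>Example Source</div>
          <div>Example Title</div>
          <div>Example description.</div>
        </div>
      </div>
    </a>
  </div>
</body>
"""


def parse_results(root):
    """Extract article fields using only an id, tag names, and tag position."""
    articles = []
    # Select by id: anchor on a container whose id is assumed to be stable.
    container = root.find(".//div[@id='search']")
    if container is None:
        return articles
    # Select by tag name: every <a/> is a candidate result link.
    for link in container.findall(".//a"):
        img = link.find(".//img")
        # Select by position: a > div > div > [div, div, div].
        texts = [d.text for d in link.findall("./div/div/div")]
        if len(texts) != 3:
            continue  # Layout did not match; skip instead of guessing.
        source, title, description = texts
        articles.append({
            "url": link.get("href"),
            "image_url": img.get("src") if img is not None else None,
            "source": source,
            "title": title,
            "description": description,
        })
    return articles


articles = parse_results(ET.fromstring(SAMPLE))
```

Skipping a link when the three-div pattern does not match means a layout change yields fewer results rather than wrong fields, which is easier to notice and debug than silently mislabeled data.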

Stability Enhancement - Replace selection by class name #2