alirezamika / autoscraper

A Smart, Automatic, Fast and Lightweight Web Scraper for Python

HTML Parameter #42

Closed by j3vr0n 3 years ago

j3vr0n commented 3 years ago

I read in a previous post that the HTML parameter lets me render a JS application with another tool (such as Selenium) and pass the resulting HTML to AutoScraper to parse. Does anyone have steps or documentation on how to use this parameter?

go-delicious commented 3 years ago

The first example in the README says you can pass the HTML instead of the URL: https://github.com/alirezamika/autoscraper#getting-exact-result

Just load the page with Selenium or a similar tool, grab the rendered HTML, and pass it in through the html parameter instead of the URL.
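A minimal sketch of that flow, assuming Selenium with headless Chrome does the rendering. The helper names render_html and scrape_html are made up for illustration and are not part of AutoScraper; the only AutoScraper call used is build() with its html and wanted_list arguments.

```python
from typing import List


def render_html(url: str) -> str:
    """Return the page source after JavaScript has run.

    Assumption: Selenium and a Chrome driver are installed; any other
    rendering tool that returns an HTML string would work the same way.
    """
    # Imported here so Selenium is only required when rendering is used.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument("--headless")
    driver = webdriver.Chrome(options=options)
    try:
        driver.get(url)
        return driver.page_source
    finally:
        # Always release the browser, even if loading the page fails.
        driver.quit()


def scrape_html(html: str, wanted_list: List[str]) -> List[str]:
    """Learn scraping rules from already-rendered HTML instead of a URL."""
    from autoscraper import AutoScraper

    scraper = AutoScraper()
    # html= replaces AutoScraper's own download step; wanted_list holds
    # sample values that are visible in the rendered page.
    return scraper.build(html=html, wanted_list=wanted_list)
```

Usage would look like scrape_html(render_html(page_url), ["some text visible on the page"]), where page_url and the sample text are placeholders for your own site.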

j3vr0n commented 3 years ago

Awesome, thanks. I actively use a third-party scraping tool and want to embed this Python code in my jobs so that they have better "self-healing" behavior when a website's HTML changes.