Leibniz-HBI / newsfeedback

Tool for extracting and saving news article metadata (and optionally content) at regular intervals.
MIT License

Bypass "Pur Abo" Barriers by clicking the accept button #9

Closed rwinterschlaf closed 1 year ago

rwinterschlaf commented 1 year ago

A few pages do not load content until the visitor has granted permission to use their data in exchange for free site usage ("Pur Abo"). Use Selenium to click the accept button and get past this restriction.

Idea: locate the button via driver.find_element(By.LINK_TEXT, "AKZEPTIEREN UND WEITER"). This needs to be tested thoroughly and adjusted for different cases (literally upper- and lowercase, but also different URLs). The former can surely be handled with a regex; the latter will need adjusting per site.
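
A rough sketch of how the case-insensitive variant might look. By.LINK_TEXT compares the text exactly, so this version collects candidate elements and checks their text against a regex in Python instead. The names ACCEPT_PATTERN, is_accept_label and click_accept_button are hypothetical, as is the "a, button" selector; the actual label and markup would have to be checked for each site. If a site renders its banner inside an iframe, the driver would presumably need to switch into that frame first, which is not handled here.

```python
import re

# Assumed label; sites may word or capitalise the button differently.
ACCEPT_PATTERN = re.compile(r"^\s*akzeptieren\s+und\s+weiter\s*$", re.IGNORECASE)


def is_accept_label(text):
    """Return True if the text looks like the accept button label."""
    return bool(ACCEPT_PATTERN.match(text or ""))


def click_accept_button(driver, timeout=10):
    """Click the first visible link or button whose text matches.

    Returns True if a button was clicked, False if none appeared in time.
    """
    # Imported here so the label check above works without Selenium installed.
    from selenium.common.exceptions import (
        StaleElementReferenceException,
        TimeoutException,
    )
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait

    def find_candidate(drv):
        # Assumption: the accept control is an <a> or <button> element.
        for element in drv.find_elements(By.CSS_SELECTOR, "a, button"):
            if is_accept_label(element.text) and element.is_displayed():
                return element
        return False

    wait = WebDriverWait(
        driver, timeout, ignored_exceptions=(StaleElementReferenceException,)
    )
    try:
        button = wait.until(find_candidate)
    except TimeoutException:
        return False
    button.click()
    return True
```

Calling click_accept_button(driver) right after driver.get(url) would return False on pages without such a barrier, so the same code path could be used for all URLs.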