afuetterer closed this issue 2 years ago
I tried to use the ignore_regex setting without success.
I wanted to ignore URL patterns like these:
https://www.spiegel.de/dienste/
https://www.spiegel.de/extra/
I tried the following values:
"ignore_regex" : "dienste"
"ignore_regex" : "extra"
"ignore_regex" : "(\/dienste\/)|(\/extra\/)"
The URLs that match the regex are still being crawled.
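As a sanity check outside of news-please, the three patterns can be tested against the two example URLs with Python's `re` module. This is only a sketch: the helper name `matching_modes` is made up for illustration, and it is an assumption that the crawler applies `ignore_regex` with either of these two semantics; the report does not say which one is used. The check shows that all three patterns find the intended URLs with `re.search`, while none of them would match with `re.match`, which only matches at the start of the string.

```python
import re

# Example URLs from the report.
urls = [
    "https://www.spiegel.de/dienste/",
    "https://www.spiegel.de/extra/",
]

# The three values that were tried for ignore_regex.
patterns = ["dienste", "extra", r"(\/dienste\/)|(\/extra\/)"]


def matching_modes(pattern, url):
    """Report whether re.search and re.match each find the pattern in the URL.

    Hypothetical helper for illustration; not part of news-please.
    """
    return {
        "search": re.search(pattern, url) is not None,  # anywhere in the URL
        "match": re.match(pattern, url) is not None,    # only at the start
    }


for pattern in patterns:
    for url in urls:
        print(pattern, url, matching_modes(pattern, url))
```

If the patterns behave as expected here, the regex itself is probably not the problem, and the question becomes whether the setting is read from the right place and how it is applied.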
sitelist.hjson contains "spiegel.de" to apply several settings per site.
config.cfg is just a copy of the original file that is delivered with news-please.

Related issues: #5, #9