alephdata / memorious

Lightweight web scraping toolkit for documents and structured data.
https://docs.alephdata.org/developers/memorious
MIT License

normalize_url: False does not disable URL normalization #87

Closed moreymat closed 4 years ago

moreymat commented 4 years ago

memorious.logic.http calls normalizer.normalize_url, which drops query arguments that have no value from URLs. Setting normalize_url: False in the YAML configuration file prevents only the first application of normalize_url, in ContextHttp.request(); it does not prevent the second application, in ContextHttpResponse.url(). As a result, URLs are still normalized even when the option is disabled, which breaks crawling on a number of websites.
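To illustrate the behaviour described above, here is a minimal, self-contained sketch. It does not use memorious or the real normalizer: `drop_blank_query_args` is a hypothetical stand-in built on the standard library that only mimics the dropping of valueless query arguments, and `SketchResponse` is a hypothetical wrapper, not the actual ContextHttpResponse. It shows why the flag has to be checked at every place where the URL is normalized, including the response's URL accessor.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def drop_blank_query_args(url: str) -> str:
    """Hypothetical stand-in for a URL normalizer.

    It only reproduces the one effect discussed in this issue:
    query arguments with no value are discarded.
    """
    parts = urlsplit(url)
    # keep_blank_values=False makes parse_qsl skip pairs such as "page=".
    pairs = parse_qsl(parts.query, keep_blank_values=False)
    return urlunsplit(parts._replace(query=urlencode(pairs)))


class SketchResponse:
    """Hypothetical response wrapper (an assumption, not memorious code)."""

    def __init__(self, raw_url: str, normalize: bool = True):
        self.raw_url = raw_url
        self.normalize = normalize

    @property
    def url(self) -> str:
        # Check the flag here as well, so that disabling normalization
        # on the request is not undone when the response URL is read.
        if self.normalize:
            return drop_blank_query_args(self.raw_url)
        return self.raw_url


if __name__ == "__main__":
    target = "https://example.org/search?q=test&page="
    print(SketchResponse(target, normalize=True).url)
    print(SketchResponse(target, normalize=False).url)
```

With the flag disabled, the accessor returns the URL unchanged, so a site that requires the empty `page=` argument would still receive it on follow-up requests.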

pudo commented 4 years ago

Fixed now.