Open owenson opened 5 years ago
Why do parameters need to be reordered?
/test?a=1&b=2 and /test?b=1&a=2 point to the same thing
@owenson this depends on the web app: it can distinguish these cases and return different results for the two URLs.
Also, the first URL contains a=1 and b=2, but the second has a=2 and b=1. I assume this is just a typo.
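As a rough illustration of the point about ordering, here is a minimal Go sketch using only the standard library `net/url` package. The helper name `compareQueries` is made up for this example. It shows that the raw query string preserves parameter order (so a server could, in principle, tell the two URLs apart), while the parsed parameter map does not.

```go
package main

import (
	"fmt"
	"net/url"
	"reflect"
)

// compareQueries is a hypothetical helper for illustration only.
// It reports whether the raw query strings are identical and whether
// the parsed parameter maps are identical.
func compareQueries(rawA, rawB string) (rawEqual, parsedEqual bool, err error) {
	a, err := url.Parse(rawA)
	if err != nil {
		return false, false, err
	}
	b, err := url.Parse(rawB)
	if err != nil {
		return false, false, err
	}
	// RawQuery keeps the parameters exactly as written, order included.
	rawEqual = a.RawQuery == b.RawQuery
	// Query() returns a map keyed by parameter name, so order is lost.
	parsedEqual = reflect.DeepEqual(a.Query(), b.Query())
	return rawEqual, parsedEqual, nil
}

func main() {
	rawEqual, parsedEqual, err := compareQueries("/test?a=1&b=2", "/test?b=2&a=1")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println("raw query strings equal:", rawEqual)
	fmt.Println("parsed parameters equal:", parsedEqual)
}
```

Whether a given web app actually inspects the raw query string is a separate question; the sketch only shows that the information is available to it.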
Yes, it was a typo. I've yet to see a web app that cares about parameter order, or indeed any web framework that lets you get that info. Most crawlers normalise the query parameters in this way: Nutch does it, Scrapy does it, etc.
I'm open to adding an option to normalize URLs. Would you like to work on this?
I've used this in the past.
Great, thanks!
Colly does not currently appear to do any URL normalization. For example, query string parameters should be sorted alphabetically, the host lowercased, etc.
See https://github.com/PuerkitoBio/purell
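For reference, the two normalizations mentioned above can be sketched with just the Go standard library. This is only an illustration under stated assumptions, not Colly's API or purell's implementation: `normalizeURL` is a hypothetical helper name, and the example URLs are made up.

```go
package main

import (
	"fmt"
	"net/url"
	"strings"
)

// normalizeURL is a hypothetical helper, not part of Colly or purell.
// It lowercases the scheme and host and sorts query parameters by key.
func normalizeURL(raw string) (string, error) {
	u, err := url.Parse(raw)
	if err != nil {
		return "", err
	}
	u.Scheme = strings.ToLower(u.Scheme)
	u.Host = strings.ToLower(u.Host)
	// url.Values.Encode emits parameters sorted by key, so re-encoding
	// the parsed query yields a canonical parameter order.
	u.RawQuery = u.Query().Encode()
	return u.String(), nil
}

func main() {
	// Example inputs are assumptions chosen for illustration.
	for _, raw := range []string{
		"http://Example.COM/test?b=2&a=1",
		"http://example.com/test?a=1&b=2",
	} {
		n, err := normalizeURL(raw)
		if err != nil {
			fmt.Println("error:", err)
			continue
		}
		fmt.Println(n)
	}
}
```

Both inputs normalize to the same string, so a visited-URL check would treat them as one page. Note that `Encode` sorts by key only (repeated keys keep their relative order) and re-escapes values, which may not suit every site; that is one reason to make normalization optional.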