Url detect error while using cache

bda-research / node-webcrawler

Crawler is a web spider written in Node.js. It gives you the full power of jQuery on the server to parse a large number of pages asynchronously as they are downloaded.

MIT License


Url detect error while using cache #1

Closed mike442144 closed 9 years ago

mike442144 commented 9 years ago

These URLs should be treated as the same: http://www.google.com/q?x=1&y=2, http://www.google.com/q?y=2&x=1, and http://www.google.com:80?x=1&y=2. However, the crawler treats them as three different URLs, so the cache does not recognize them as duplicates.
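One way to get this kind of equivalence is to normalize each URL before using it as a cache key. The sketch below is only an illustration using Node's built-in WHATWG `URL` class; `normalizeUrl` is a hypothetical helper and not part of the crawler's API, and it is not necessarily how the fix was implemented. It also assumes the third URL was meant to include the `/q` path, since the path is part of what identifies a resource.

```javascript
// Hypothetical helper (not part of node-webcrawler): builds a canonical cache key.
function normalizeUrl(rawUrl) {
  // The WHATWG URL parser already drops a scheme's default port (e.g. :80 for http).
  const url = new URL(rawUrl);
  // Sort query parameters by name so that ?y=2&x=1 and ?x=1&y=2 serialize identically.
  url.searchParams.sort();
  return url.href;
}

// Assumed inputs: the third URL includes /q here, unlike in the report above.
const candidates = [
  'http://www.google.com/q?x=1&y=2',
  'http://www.google.com/q?y=2&x=1',
  'http://www.google.com:80/q?x=1&y=2',
];

// A Set of normalized keys stands in for the crawler's cache.
const seen = new Set(candidates.map(normalizeUrl));
console.log(seen.size); // 1
```

With keys normalized this way, a cache lookup would match all three variants to a single entry.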

mike442144 commented 9 years ago

Fixed in 0.5.0.