Scraping a web page that contains multiple links to specific sections of another page (e.g. "/#section") results in duplicate downloads of that page, because each URL differs only in its fragment. According to the fragment method docs, this portion of the URL isn't typically sent to the server, and as I understand it, it would only affect client-side updates that wouldn't be tracked here anyway.
Should the fragment be stripped from the URL before it is checked for duplicates? If so, I can put in a PR for that.
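To make the proposal concrete, here is a minimal sketch of the idea in Python, using only the standard library. It is not the project's actual code: `normalize_url` and `dedupe_links` are hypothetical names, and the example.com URLs are placeholders. It assumes links are compared as plain strings once the fragment has been stripped.

```python
from urllib.parse import urldefrag


def normalize_url(url: str) -> str:
    """Return the URL with any '#fragment' part removed (hypothetical helper)."""
    return urldefrag(url).url


def dedupe_links(links):
    """Keep the first occurrence of each page, ignoring fragments (hypothetical helper)."""
    seen = set()
    unique = []
    for link in links:
        key = normalize_url(link)
        if key not in seen:
            seen.add(key)
            unique.append(key)
    return unique


if __name__ == "__main__":
    # Placeholder links: two sections of the same page, plus one other page.
    links = [
        "https://example.com/docs#install",
        "https://example.com/docs#usage",
        "https://example.com/about",
    ]
    print(dedupe_links(links))
```

With this normalization, the two section links collapse into a single download of `https://example.com/docs`. The query string is left untouched, since it is sent to the server and can change the response.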