c4software / python-sitemap

Mini website crawler to make sitemap from a website.
GNU General Public License v3.0
362 stars, 110 forks

Double entry if 2 slashes in the url #38

Closed: ghost closed this issue 7 years ago

ghost commented 7 years ago

Command:

python3 ~/sitemap/python-sitemap-master/main.py --domain https://www.2globalnomads.info//web-design-websites/ --image --output sitemap.xml --report

This location appears twice in the sitemap because of the double slash:

<loc>https://www.2globalnomads.info//web-design-websites/</loc>

c4software commented 7 years ago

I think the best way to resolve this issue is to check the domain before running the command.
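A minimal sketch of such a pre-check, assuming it is done in Python with the standard library before the crawler is started. The function name `has_repeated_slashes` is hypothetical and is not part of python-sitemap.

```python
from urllib.parse import urlsplit


def has_repeated_slashes(url: str) -> bool:
    """Return True if the path part of `url` contains a double slash.

    Hypothetical helper, not part of python-sitemap. Only the path is
    inspected, so the '//' that follows the protocol is ignored.
    """
    return "//" in urlsplit(url).path


if __name__ == "__main__":
    domain = "https://www.2globalnomads.info//web-design-websites/"
    if has_repeated_slashes(domain):
        print("Warning: the path contains a double slash:", domain)
```

Only `urlsplit(url).path` is checked, so a well-formed URL such as `https://www.2globalnomads.info/web-design-websites/` passes without a warning.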

ghost commented 7 years ago

I believe that, to function properly, it should treat every double slash as a single slash, except the one in the protocol.

ghost commented 7 years ago

I checked it. The correct solution is to search the path for double slashes and replace them with single slashes, repeating until no double slashes are left. A slash is an empty, non-existent directory which is like null,
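The replace-until-none-left approach described in this comment could be sketched as follows. This is an illustration using only the Python standard library, not the fix that was applied to python-sitemap; the function name `collapse_path_slashes` is hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit


def collapse_path_slashes(url: str) -> str:
    """Collapse runs of slashes in the path of `url` into single slashes.

    Hypothetical helper, not part of python-sitemap. The scheme, host,
    query string and fragment are left untouched, so the '//' after the
    protocol is preserved.
    """
    parts = urlsplit(url)
    path = parts.path
    # Repeat the replacement until no double slash is left in the path.
    while "//" in path:
        path = path.replace("//", "/")
    return urlunsplit(parts._replace(path=path))


if __name__ == "__main__":
    print(collapse_path_slashes(
        "https://www.2globalnomads.info//web-design-websites/"
    ))
```

Normalizing each URL this way before it is added to the set of crawled or reported locations would make both spellings map to the same entry. Whether a given server actually treats the two forms as the same resource is an assumption worth verifying.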