c4software / python-sitemap

Mini website crawler to make sitemap from a website.
GNU General Public License v3.0

Slash missing in URL #25

Closed: ghost closed this issue 7 years ago

ghost commented 7 years ago

Running: python3 main.py --domain https://www.2globalnomads.info --output sitemap.xml --images --report --parserobots

Output: <image:loc>https://www.2globalnomads.infopaivi-santeri-kannisto/subscribe.png</image:loc></image:image><image:image><image:loc>https://www.2globalnomads.infopaivi-santeri-kannisto/logo.png</image:loc>

There should be a "/" in the URL before the path, between "info" and "paivi", like this: "info/paivi".

The same issue happens with all local URLs. Remote URLs are all OK.
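For illustration, here is a minimal sketch of how a domain and a local image path could be joined so the separating slash is never lost. It assumes Python's standard `urllib.parse.urljoin`; the helper name `resolve_image_url` is hypothetical and this is not necessarily how the project builds its URLs.

```python
from urllib.parse import urljoin


def resolve_image_url(base_url, src):
    """Resolve an image src against the page or domain URL.

    Hypothetical helper, not taken from python-sitemap. urljoin inserts
    the "/" between host and path when the base has no path, and leaves
    already-absolute (remote) URLs unchanged.
    """
    return urljoin(base_url, src)


base = "https://www.2globalnomads.info"

# Plain string concatenation reproduces the reported bug.
broken = base + "paivi-santeri-kannisto/logo.png"

# Relative path: the missing slash is added.
fixed = resolve_image_url(base, "paivi-santeri-kannisto/logo.png")

# Root-relative path: resolved against the host.
rooted = resolve_image_url(base, "/paivi-santeri-kannisto/logo.png")

# Remote URL (example.com is a placeholder): returned as-is.
remote = resolve_image_url(base, "https://cdn.example.com/logo.png")
```

With this approach, local and remote image URLs go through the same code path, so only the relative ones are rewritten.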

c4software commented 7 years ago

Nice catch! I will fix it tonight

c4software commented 7 years ago

Fixed. I also noticed some glitches while parsing your website (the //analytics… and the mailto link); those are fixed as well.
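As a rough sketch of how links like these could be handled (not the actual fix in the repository), a crawler might resolve every href against the page URL and then keep only http(s) results. The function name `normalize_link` and the example URLs below are hypothetical.

```python
from urllib.parse import urljoin, urlsplit


def normalize_link(page_url, href):
    """Return an absolute http(s) URL for href, or None if it is not crawlable.

    Hypothetical helper for illustration only.
    """
    # urljoin gives protocol-relative links ("//host/path") the page's scheme
    # and returns links with their own scheme (such as mailto:) unchanged.
    absolute = urljoin(page_url, href)
    if urlsplit(absolute).scheme not in ("http", "https"):
        return None  # e.g. mailto: links are skipped
    return absolute


page = "https://www.example.com/index.html"  # placeholder page URL

protocol_relative = normalize_link(page, "//cdn.example.com/script.js")
mail_link = normalize_link(page, "mailto:someone@example.com")
local_link = normalize_link(page, "about/team.html")
```

Filtering on the scheme after resolution keeps the check in one place, rather than special-casing each link type before joining.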

ghost commented 7 years ago

Works, great! One issue remains: data URI images should be excluded. I also have an improvement proposal. I will open a new ticket for those.