Closed ghost closed 7 years ago
Same issue with Scandinavian letters here:
python3 ~/sitemap/python-sitemap-master/main.py --domain https://www.xetnet.fi --image --output sitemap.xml --verbose
For the first website, the cause is the ebook: the crawler opens the « target » URI, but the content is not really navigable, so it fails.
A fix will be included in the next commit.
For « xetnet.fi », I am starting the crawler now and will check this case when the error appears.
OK. Those are likely due to Scandinavian special letters such as ö and ä. To handle them, URLs need to be treated as Unicode with a consistent character encoding.
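To illustrate the point about ö and ä in URLs, here is a minimal sketch of percent-encoding the path and query as UTF-8 before a request is made. This is not the crawler's actual code: `encode_url` is a hypothetical helper, the example.fi URL is made up, and the host is assumed to be plain ASCII already.

```python
from urllib.parse import quote, urlsplit, urlunsplit


def encode_url(url: str) -> str:
    """Percent-encode non-ASCII characters in the path and query as UTF-8.

    Hypothetical helper for illustration; assumes the host part is ASCII.
    """
    parts = urlsplit(url)
    # Keep "%" safe so URLs that are already encoded are not encoded twice.
    path = quote(parts.path, safe="/%")
    query = quote(parts.query, safe="=&%")
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


if __name__ == "__main__":
    # Made-up URL containing Scandinavian letters.
    print(encode_url("https://www.example.fi/tietoa/sähköposti"))
    # → https://www.example.fi/tietoa/s%C3%A4hk%C3%B6posti
```

Because `%` is kept in the safe set, running the helper on an already encoded URL leaves it unchanged.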
I no longer have the issue with the « xetnet.fi » website.
I also tested with French letters « é, à, ô, … » and it seems OK.
Command
python3 ~/sitemap/python-sitemap-master/main.py --domain https://www.books.2globalnomads.info --image --output sitemap.xml
Output
UnicodeDecodeError: 'utf-8' codec can't decode byte 0x92 in position 1: invalid start byte
With multiple errors: HTTP Error 404: Not Found
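For reference, byte 0x92 is the right single quotation mark (’) in Windows-1252, so one plausible cause of the UnicodeDecodeError above is a page served in that encoding and decoded as UTF-8. This is an assumption, not something confirmed in this thread. A minimal sketch of a tolerant decoder follows; `decode_body` is a hypothetical helper and not part of the crawler.

```python
from typing import Optional


def decode_body(raw: bytes, declared: Optional[str] = None) -> str:
    """Decode a response body, trying several encodings in turn.

    Hypothetical helper for illustration. Order: the encoding declared by
    the server (if any), then UTF-8, then Windows-1252.
    """
    for encoding in (declared, "utf-8", "cp1252"):
        if not encoding:
            continue
        try:
            return raw.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            # Wrong or unknown encoding: try the next candidate.
            continue
    # Last resort: keep going, marking undecodable bytes with U+FFFD.
    return raw.decode("utf-8", errors="replace")


if __name__ == "__main__":
    # 0x92 at position 1, as in the reported error.
    print(decode_body(b"I\x92m"))
```

The last-resort branch trades accuracy for robustness: the crawl continues, but undecodable bytes show up as replacement characters.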