c4software / python-sitemap

Mini website crawler to make sitemap from a website.
GNU General Public License v3.0

UnicodeDecodeError possibly with Scandinavian letters #35

Open ghost opened 7 years ago

ghost commented 7 years ago

Command:

python3 ~/sitemap/python-sitemap-master/main.py --domain https://www.xetnet.fi --image --output sitemap.xml --verbose

Output:

INFO:root:Start the crawling process
INFO:root:Crawling #1: https://www.xetnet.fi
INFO:root:Crawling #2: https://www.xetnet.fi/category/ror/
INFO:root:Crawling #3: https://www.xetnet.fi/wordpress-asennus-webhotelliin-2/
INFO:root:Crawling #4: https://www.xetnet.fi/category/ruby/
INFO:root:Crawling #5: https://www.xetnet.fi/asiakaspalvelu/reilua-palvelua/
INFO:root:Crawling #6: https://www.xetnet.fi/webhotelli/wordpress-webhotelli/
INFO:root:Crawling #7: https://www.xetnet.fi/wordpress/
INFO:root:Crawling #8: https://www.xetnet.fi/palvelupaketin-vaihtaminen-suurempaan-tai-pienempaan/
Traceback (most recent call last):
  File "/home/paivisanteri/sitemap/python-sitemap-master/main.py", line 53, in <module>
    crawl.run()
  File "/home/paivisanteri/sitemap/python-sitemap-master/crawler.py", line 101, in run
    self.crawling()
  File "/home/paivisanteri/sitemap/python-sitemap-master/crawler.py", line 205, in crawling
    print (""+self.htmlspecialchars(url.geturl())+"" + lastmod + image_list + "", file=self.output_file)
UnicodeEncodeError: 'ascii' codec can't encode character '\xe4' in position 745: ordinal not in range(128)

Could this be a local problem, or something in my Python settings? I am not familiar with Python.
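
For anyone reading along, here is a minimal sketch of what the traceback seems to describe. It assumes the output file ends up with an ASCII text encoding, which is what Python's open() picks when no encoding is given and the locale's preferred encoding is ASCII (for example under LANG=C). How crawler.py actually opens its output file is not shown in this report, so that part is an assumption, and the URL and the write_line helper are made up for illustration.

```python
import io

# Hypothetical sitemap line containing 'ä' (U+00E4), the character named in the error.
text = "https://example.invalid/p\u00e4iv\u00e4/"


def write_line(raw, encoding):
    """Print `text` into `raw` through a text layer that uses `encoding`."""
    # TextIOWrapper is the same text layer open() puts around a file in text mode.
    wrapper = io.TextIOWrapper(raw, encoding=encoding, newline="\n")
    print(text, file=wrapper)
    wrapper.flush()
    wrapper.detach()  # keep `raw` usable after the wrapper is gone
    return raw.getvalue()


# With an ASCII text layer, printing 'ä' raises the same kind of error as in the report.
try:
    write_line(io.BytesIO(), "ascii")
    ascii_failed = False
except UnicodeEncodeError as exc:
    ascii_failed = True
    print("ascii:", exc)

# With UTF-8 the same line is written without complaint.
utf8_bytes = write_line(io.BytesIO(), "utf-8")
print("utf-8:", utf8_bytes)
```

If that assumption holds, running the command under a UTF-8 locale, or opening the output file with an explicit encoding="utf-8", would be the two things to try first.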