MKuranowski / pyroutelib3

Simple routing over OpenStreetMap data
https://pyroutelib3.readthedocs.io/
GNU General Public License v3.0

UnicodeDecodeError on Windows with non-UTF8 preferred encoding #1

Closed by deck34 7 years ago

deck34 commented 7 years ago

Hello. I use Python 3.6.2 on Windows 10 (with non-English localization).

    status, route = router.doRoute(start, end)  # Find the route - a list of OSM nodes
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\pyroutelib3\__init__.py", line 341, in doRoute
    self._addToQueue(x,i,nextItem, weight)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\pyroutelib3\__init__.py", line 356, in _addToQueue
    self.data.getArea(end_pos[0], end_pos[1])
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\pyroutelib3\__init__.py", line 83, in getArea
    return(self.loadOsm(filename))
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\pyroutelib3\__init__.py", line 166, in loadOsm
    data = self.parseOsmFile(filename)
  File "C:\Program Files (x86)\Python36-32\lib\site-packages\pyroutelib3\__init__.py", line 128, in parseOsmFile
    for event, elem in etree.iterparse(f): # events=['end']
  File "C:\Program Files (x86)\Python36-32\lib\xml\etree\ElementTree.py", line 1223, in iterator
    data = source.read(16 * 1024)
  File "C:\Program Files (x86)\Python36-32\lib\encodings\cp1251.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x98 in position 13229: character maps to <undefined>

The code fails because Windows reports cp1251 as the locale's preferred encoding, while my file is UTF-8. ElementTree.parse uses source = open(source, "rb"); rewriting the open() call as codecs.open(source, 'rb', 'utf-8') might solve this issue.
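The decode failure can be reproduced without any OSM file. This is a minimal sketch, assuming the offending byte came from a Cyrillic character: "И" (U+0418) is one example whose UTF-8 encoding contains the byte 0x98, which cp1251 leaves undefined. The helper name decodes_cleanly is hypothetical and used only for illustration.

```python
# "И" (U+0418) is encoded in UTF-8 as the two bytes D0 98.
# Assumption: the real file contained some character like this one.
utf8_bytes = "И".encode("utf-8")


def decodes_cleanly(data: bytes, encoding: str) -> bool:
    """Return True if `data` can be decoded with `encoding` (hypothetical helper)."""
    try:
        data.decode(encoding)
        return True
    except UnicodeDecodeError:
        return False


if __name__ == "__main__":
    print(utf8_bytes)                             # the raw UTF-8 bytes
    print(decodes_cleanly(utf8_bytes, "utf-8"))   # correct codec
    print(decodes_cleanly(utf8_bytes, "cp1251"))  # codec from the traceback
```

Decoding the same bytes as cp1251 fails on 0x98, which matches the byte named in the traceback.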

MKuranowski commented 7 years ago

Even though I can't reproduce it, I think I see a possible solution.

In pyroutelib3/__init__.py, in the function parseOsmFile, the file's actual open() call is one line above the line that raises the error. I will add the encoding="utf-8" parameter later today; hopefully that solves the problem.
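A rough sketch of what passing encoding="utf-8" to open() looks like in front of etree.iterparse. This is not the actual parseOsmFile code: node_names, write_sample_osm and the sample XML are hypothetical, made up for illustration.

```python
import os
import tempfile
import xml.etree.ElementTree as etree


def node_names(path):
    """Collect the value of every <tag k="name"> in an OSM XML file (hypothetical helper)."""
    names = []
    # Explicit encoding: without it, text-mode open() may fall back to the
    # locale's preferred encoding (cp1251 in the report above).
    with open(path, "r", encoding="utf-8") as f:
        for event, elem in etree.iterparse(f):  # default events: "end" only
            if elem.tag == "tag" and elem.get("k") == "name":
                names.append(elem.get("v"))
    return names


def write_sample_osm():
    """Write a tiny UTF-8 OSM-like file and return its path (made-up sample data)."""
    xml = (
        '<osm version="0.6">'
        '<node id="1" lat="55.75" lon="37.61">'
        '<tag k="name" v="Москва"/>'
        "</node>"
        "</osm>"
    )
    fd, path = tempfile.mkstemp(suffix=".osm")
    with os.fdopen(fd, "w", encoding="utf-8") as f:
        f.write(xml)
    return path


if __name__ == "__main__":
    sample = write_sample_osm()
    try:
        print(node_names(sample))
    finally:
        os.remove(sample)
```

Opening the file in binary mode ("rb") and letting the XML parser detect the encoding would be another option; the sketch sticks to the text-mode change described above.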

MKuranowski commented 7 years ago

Should be fixed with dc6d515, in version 0.4. Can you upgrade and check if this works now?