dsoprea / PySvn

Lightweight Subversion library for Python.
GNU General Public License v2.0

PySvn decodes unicode in log messages incorrectly on Windows #154

Open GardenTools opened 4 years ago


PySvn decodes the text of log messages incorrectly on Windows, producing either garbled characters or an exception.

For example, create an svn commit whose message is “some words”, where the quotes are U+201C (LEFT DOUBLE QUOTATION MARK) and U+201D (RIGHT DOUBLE QUOTATION MARK).
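The bytes involved can be checked directly in Python. This sketch assumes, as the traceback below indicates, that svn emits the message as UTF-8 and that Windows decodes it with cp1252. It shows why the opening quote turns into junk characters while the closing quote raises an exception.

```python
# Assumption (from this report): svn emits the log message as UTF-8,
# while Python on Windows decodes it with the cp1252 locale encoding.
message = "\u201csome words\u201d"  # “some words”
raw = message.encode("utf-8")
print(raw)  # b'\xe2\x80\x9csome words\xe2\x80\x9d'

# U+201C is e2 80 9c; all three bytes are defined in cp1252,
# so decoding "succeeds" but yields junk characters.
junk = raw[:3].decode("cp1252")
print(junk)  # â€œ

# U+201D is e2 80 9d; 0x9d is undefined in cp1252, so decoding fails.
error = None
try:
    raw.decode("cp1252")
except UnicodeDecodeError as exc:
    error = exc  # keep a reference; `exc` is cleared after the except block
print(error)
```

This is why the same commit message can either display as mojibake or crash, depending on which characters it contains.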

The result is an exception:

Traceback (most recent call last):
  File "F:\Program Files\Python37\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "F:\Program Files\Python37\lib\subprocess.py", line 1238, in _readerthread
    buffer.append(fh.read())
  File "F:\Program Files\Python37\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 161: character maps to <undefined>
Exception in thread Thread-13:
Traceback (most recent call last):
  File "F:\Program Files\Python37\lib\threading.py", line 926, in _bootstrap_inner
    self.run()
  File "F:\Program Files\Python37\lib\threading.py", line 870, in run
    self._target(*self._args, **self._kwargs)
  File "F:\Program Files\Python37\lib\subprocess.py", line 1238, in _readerthread
    buffer.append(fh.read())
  File "F:\Program Files\Python37\lib\encodings\cp1252.py", line 23, in decode
    return codecs.charmap_decode(input,self.errors,decoding_table)[0]
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 161: character maps to <undefined>

Traceback (most recent call last):
  File "F:\Program Files\Python37\lib\subprocess.py", line 939, in communicate
    stdout, stderr = self._communicate(input, endtime, timeout)
  File "F:\Program Files\Python37\lib\subprocess.py", line 1288, in _communicate
    stdout = stdout[0]
IndexError: list index out of range

In subprocess.Popen.__init__() there is the following:

    self.stdout = io.open(c2pread, 'rb', bufsize)
    if self.text_mode:
        self.stdout = io.TextIOWrapper(self.stdout,
                encoding=encoding, errors=errors)
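The wrapping above can be reproduced without spawning svn. In this sketch an in-memory io.BytesIO stands in for the pipe opened with io.open(c2pread, 'rb', bufsize), and the helper read_pipe is hypothetical. The encoding is passed explicitly so the result does not depend on the machine running it; on the reporter's Python 3.7 setup, encoding=None would resolve to cp1252.

```python
import io
import locale

# UTF-8 bytes as svn writes them to its stdout pipe (per this report).
PIPE_BYTES = "\u201csome words\u201d\n".encode("utf-8")

def read_pipe(encoding):
    """Mimic Popen in text mode: wrap a binary stream in a TextIOWrapper.

    Hypothetical helper. With encoding=None, Python 3.7 falls back to
    locale.getpreferredencoding(False), e.g. 'cp1252' on Western Windows.
    """
    raw = io.BytesIO(PIPE_BYTES)  # stands in for io.open(c2pread, 'rb', bufsize)
    return io.TextIOWrapper(raw, encoding=encoding).read()

print("locale encoding:", locale.getpreferredencoding(False))
print(read_pipe("utf-8"))  # decodes cleanly

try:
    read_pipe("cp1252")  # what encoding=None resolves to on the reporter's machine
except UnicodeDecodeError as exc:
    print("cp1252 fails:", exc)
```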

Normally, for PySvn, text_mode is True and encoding is None. In that case TextIOWrapper calls locale.getpreferredencoding(False), which returns the system encoding (for me, 'cp1252'). Note that this is not the same as sys.getdefaultencoding(), which is 'utf-8'. communicate() therefore returns the text from svn decoded as cp1252 rather than UTF-8. The closing quote U+201D is the UTF-8 byte sequence b'\xe2\x80\x9d', and 0x9d is undefined in cp1252, which raises the UnicodeDecodeError in the reader thread.
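One possible workaround, sketched under the assumption that svn's output is UTF-8 (as the byte sequence above shows), is to pass encoding='utf-8' to subprocess (supported since Python 3.6) so the pipe is not decoded with the locale encoding. The helper run_utf8 is hypothetical and not part of PySvn, and a child Python process stands in for svn so the sketch runs anywhere.

```python
import subprocess
import sys

def run_utf8(cmd):
    """Run cmd and decode its output as UTF-8 regardless of the Windows locale.

    Hypothetical helper. Passing encoding= makes Popen wrap the pipes in a
    TextIOWrapper using that codec instead of locale.getpreferredencoding(False).
    """
    completed = subprocess.run(
        cmd,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        encoding="utf-8",
        check=True,  # raise CalledProcessError on a non-zero exit status
    )
    return completed.stdout

# Stand-in for svn: a child Python process that writes the UTF-8 bytes
# of “some words” directly to its binary stdout.
child = [
    sys.executable,
    "-c",
    "import sys; sys.stdout.buffer.write('\\u201csome words\\u201d'.encode('utf-8'))",
]
print(run_utf8(child))
```

Decoding explicitly with a known codec avoids depending on the console or ANSI code page of whichever machine runs the library.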