curran / google-diff-match-patch

Automatically exported from code.google.com/p/google-diff-match-patch
Apache License 2.0

Python UnicodeDecodeError when using Cyrillic (not ascii?) chars #10

Closed GoogleCodeExporter closed 9 years ago

GoogleCodeExporter commented 9 years ago
[code]
# -*- coding: utf-8 -*-

from diff_match_patch import diff_match_patch

dmp = diff_match_patch()

str1 = """Привет!"""
str2 = """Привет and Welcome!"""

patches = dmp.patch_make(str1, str2)
#print dmp.patch_toText(patches)

print dmp.patch_apply(patches, str1)[0]
[/code]

$ python dmp.py
Traceback (most recent call last):
  File "dmp.py", line 14, in <module>
    print dmp.patch_apply(patches, str1)[0]
  File "/data/Coding/Python/diff_match_patch.py", line 1401, in patch_apply
    text = nullPadding + text + nullPadding
UnicodeDecodeError: 'ascii' codec can't decode byte 0xd0 in position 0: ordinal not in range(128)

Original issue reported on code.google.com by sashul...@gmail.com on 15 May 2008 at 6:44

GoogleCodeExporter commented 9 years ago
Thank you for taking the time to explain what's going on. The root issue is that you are using high-ascii non-Unicode strings, something Python does not like. I have managed to fix this particular incompatibility in patch_apply. However there are two other incompatibilities in str(patch) and diff_toDelta() which I can't fix because they would break the Unicode support.

Just so you know, the proper way to do this is:
str1 = u"""Привет!"""
str2 = u"""Привет and Welcome!"""

Adding the extra 'u' makes it a Unicode string in Python. That works perfectly and is portable across all systems. Leaving out the 'u' means it is using high ascii, which is ambiguous -- Python doesn't know what the characters are supposed to represent.
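
A minimal sketch of why the traceback mentions byte 0xd0, assuming the script is saved as UTF-8. Without the 'u', the literal holds raw UTF-8 bytes, and combining it with a Unicode string makes Python 2 decode those bytes as ASCII. The variable names here are illustrative, and the snippet does not call diff_match_patch at all; it only reproduces the decoding step by hand.

```python
# -*- coding: utf-8 -*-
# Illustrative only: names are made up, diff_match_patch is not used.

# Assumption: the script is saved as UTF-8, so a Python 2 literal without
# the 'u' prefix would hold these raw bytes rather than characters.
raw_bytes = u"Привет!".encode("utf-8")

# Mixing such bytes with a Unicode string makes Python 2 decode them as
# ASCII behind the scenes; doing that step explicitly shows the failure.
try:
    raw_bytes.decode("ascii")
    error_message = None
except UnicodeDecodeError as err:
    error_message = str(err)

# Decoding with the real encoding (or using a u"" literal) gives characters.
text = raw_bytes.decode("utf-8")

print(error_message)
print(len(raw_bytes), len(text))
```

The byte count is larger than the character count because each Cyrillic letter takes two bytes in UTF-8, and the first of those bytes is the 0xd0 that the ASCII codec rejects.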

Hope this helps. I've uploaded a new version with the fix to patch_apply, as well as a couple of other unrelated fixes which I found while looking at your issue.

Original comment by neil.fra...@gmail.com on 20 May 2008 at 4:34

GoogleCodeExporter commented 9 years ago
It works! Thanks a lot!

Original comment by sashul...@gmail.com on 25 May 2008 at 6:31