frej / fast-export

A mercurial to git converter using git-fast-import
http://repo.or.cz/w/fast-export.git

Resolve unicode escape sequences not being processed correctly #293

Closed by chrisjbillington 1 year ago

chrisjbillington commented 1 year ago

In process_unicode_escape_sequences(), any backslash escape sequences already present in the original string have their backslash escaped by the first .encode('unicode-escape'). As a result they survive the .encode('unicode-escape').decode('unicode-escape') round trip unchanged, still as literal escape sequences rather than the characters they represent.
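To illustrate the problem, here is a minimal sketch of that round trip on a text string. The sample value is made up for the example, and the real function may operate on UTF-8 bytestrings rather than text, so treat this as a demonstration of the codec behavior only.

```python
# Hypothetical sample: a literal escape sequence (the six characters
# backslash, u, 0, 0, e, 9) followed by a real non-ASCII character.
original = u'caf\\u00e9 \u00fc'

# 'unicode-escape' escapes the backslash itself as well as the
# non-ASCII character.
escaped = original.encode('unicode-escape')

# Decoding undoes both escapes, so the literal sequence comes back
# as-is instead of being turned into the character it names.
round_tripped = escaped.decode('unicode-escape')

print(escaped)
print(round_tripped == original)
```

The doubled backslash in `escaped` is what prevents `.decode('unicode-escape')` from ever seeing `\u00e9` as an escape sequence.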

That is not what we want: these sequences should pass through the .encode() step unchanged, so that .decode() converts them to the characters they represent.

This patch changes the .encode() step to pass through all ASCII characters unchanged and escape only non-ASCII characters. This ensures that any existing backslash escape sequences are interpreted as the characters they represent upon .decode().
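One way to get this behavior, shown here as a sketch and not necessarily the exact code in the patch, is to encode to ASCII with the `backslashreplace` error handler. The helper name `resolve_escapes` and the sample string are assumptions made for the example.

```python
def resolve_escapes(text):
    # Hypothetical helper, not the function from the patch.
    # ASCII characters, including existing backslashes, pass through
    # unchanged; only non-ASCII characters are turned into escapes.
    ascii_bytes = text.encode('ascii', 'backslashreplace')
    # Every escape sequence, old or new, is now resolved to the
    # character it represents.
    return ascii_bytes.decode('unicode-escape')


# Literal escape sequence followed by a real non-ASCII character.
original = u'caf\\u00e9 \u00fc'
result = resolve_escapes(original)

print(result == u'caf\u00e9 \u00fc')
```

Note that with this approach any backslash in the input is treated as the start of an escape sequence, so input such as a Windows path containing `\t` would be changed as well.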

Tested on Python 2.7, 3.6, and 3.10.

frej commented 1 year ago

You are an ideal contributor @chrisjbillington, thanks a lot!