For most messages, the conversion just silently garbles the commit message, e.g. hg commit ä => git commit ä. For some characters in the commit message (“/”), however, the process crashes as follows:
Traceback (most recent call last):
  File "/<REDACTED>/hg-fast-export.py", line 737, in <module>
  File "/<REDACTED>/hg-fast-export.py", line 583, in hg2git
    plugins)
  File "/<REDACTED>/hg-fast-export.py", line 297, in export_commit
    (revnode,_,user,(time,timezone),files,desc,branch,extra)=get_changeset(ui,repo,revision,authors,encoding)
  File "/<REDACTED>/hg2git.py", line 97, in get_changeset
    desc=desc.decode(encoding).encode('utf8')
  File "/usr/lib/python2.7/encodings/cp1252.py", line 15, in decode
    return codecs.charmap_decode(input,errors,decoding_table)
UnicodeDecodeError: 'charmap' codec can't decode byte 0x9d in position 65: character maps to <undefined>
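Both symptoms can be reproduced outside of hg-fast-export with a minimal sketch. It assumes the commit message is stored as UTF-8 and that the crashing character is a right double quotation mark (U+201D), whose UTF-8 encoding ends in byte 0x9d; the report does not confirm which character it was. The helper name recode is hypothetical and only mirrors the decode/encode step shown in the traceback.

```python
# -*- coding: utf-8 -*-
# Sketch only: mirrors the desc.decode(encoding).encode('utf8') step from the
# traceback. `recode` is a hypothetical helper, not part of hg-fast-export.

def recode(raw, encoding):
    """Decode raw bytes with `encoding`, then re-encode them as UTF-8."""
    return raw.decode(encoding).encode('utf8')

utf8_a_umlaut = b'\xc3\xa4'          # UTF-8 bytes of "ä"
utf8_right_quote = b'\xe2\x80\x9d'   # UTF-8 bytes of U+201D (assumed culprit)

# Decoding UTF-8 bytes as cp1252 succeeds but yields two wrong characters,
# so the message is silently garbled.
garbled = recode(utf8_a_umlaut, 'cp1252')

# Decoding with the correct encoding round-trips the bytes unchanged.
intact = recode(utf8_a_umlaut, 'utf8')

# Byte 0x9d has no mapping in Python's cp1252 table, so decoding raises.
try:
    recode(utf8_right_quote, 'cp1252')
    crashed = False
except UnicodeDecodeError:
    crashed = True

print(repr(garbled), repr(intact), crashed)
```

If this matches the real failure, the crash is only the visible symptom: every non-ASCII message is being decoded with the wrong codec, and most of them simply fail silently.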
To inspect the encoding, I added the debug message shown in the diff below. With -e utf8 it prints Encoding: cp1252, whereas I expected Encoding: utf8. Without the -e parameter it again prints Encoding: cp1252, whereas I expected an empty value (Encoding: ) in this case.
diff --git a/hg-fast-export.py b/hg-fast-export.py
index 93f35bf..4324c6c 100755
--- a/hg-fast-export.py
+++ b/hg-fast-export.py
@@ -695,6 +695,7 @@ if __name__=='__main__':
   encoding=''
   if options.encoding!=None:
     encoding=options.encoding
+  stderr_buffer.write(b"Encoding: %s\n" % encoding)
   fn_encoding=encoding
   if options.fn_encoding!=None:
Environment: Windows 10 WSL (Ubuntu 20.04).
Commit messages are decoded as cp1252 when calling with the following parameters: