Clonkex / win32lfn-python3

Mercurial error on Update #1

Open Ellerbrok opened 8 months ago

Ellerbrok commented 8 months ago

Hi there,

There seems to be a UTF-8 issue in here. After installing the extension in Mercurial, I get the following error message when I try to "Update" to a newer revision of my repository.

** Mercurial version (6.4.2).  TortoiseHg version (6.4.2)
** Command: 
** CWD: C:\Program Files\TortoiseHg
** Encoding: cp1252
** Extensions loaded: mercurial_keyring unknown, rebase, strip, tortoisehg.util.configitems, win32lfn
** Python version: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
** Windows version: sys.getwindowsversion(major=6, minor=2, build=9200, platform=2, service_pack='')
** Processor architecture: x64
** Qt-5.15.2 PyQt-5.15.7 QScintilla-2.13.3
Traceback (most recent call last):
  File "tortoisehg\hgqt\cmdui.pyc", line 649, in runCommand
  File "tortoisehg\hgqt\update.pyc", line 398, in runCommand
  File "tortoisehg\hgqt\update.pyc", line 342, in isclean
  File "mercurial\context.pyc", line 1460, in modified
  File "mercurial\util.pyc", line 1760, in __get__
  File "mercurial\context.pyc", line 1425, in _status
  File "mercurial\localrepo.pyc", line 3388, in status
  File "mercurial\context.pyc", line 432, in status
  File "mercurial\context.pyc", line 2001, in _buildstatus
  File "mercurial\context.pyc", line 1906, in _dirstatestatus
  File "mercurial\dirstate.pyc", line 1681, in status
  File "mercurial\dirstate.pyc", line 1505, in walk
  File "mercurial\windows.pyc", line 599, in statfiles
  File "C:/Program Files/TortoiseHg/win32lfn.py", line 116, in fn
    path = stringtobytes(uncabspath(args[0]))
  File "C:/Program Files/TortoiseHg/win32lfn.py", line 97, in uncabspath
    path = bytestostring(path)
  File "C:/Program Files/TortoiseHg/win32lfn.py", line 377, in bytestostring
    string = string.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdc in position 43: invalid continuation byte

Ellerbrok commented 8 months ago

I also tried the most recent Mercurial version:

** Mercurial version (6.5.1).  TortoiseHg version (6.5.1)
** Command: 
** CWD: C:\Program Files\TortoiseHg
** Encoding: cp1252
** Extensions loaded: mercurial_keyring unknown, rebase, strip, tortoisehg.util.configitems, win32lfn
** Python version: 3.9.13 (tags/v3.9.13:6de2ca5, May 17 2022, 16:36:42) [MSC v.1929 64 bit (AMD64)]
** Windows version: sys.getwindowsversion(major=6, minor=2, build=9200, platform=2, service_pack='')
** Processor architecture: x64
** Qt-5.15.2 PyQt-5.15.7 QScintilla-2.13.3
Traceback (most recent call last):
  File "tortoisehg\hgqt\cmdui.pyc", line 649, in runCommand
  File "tortoisehg\hgqt\update.pyc", line 398, in runCommand
  File "tortoisehg\hgqt\update.pyc", line 342, in isclean
  File "mercurial\context.pyc", line 1460, in modified
  File "mercurial\util.pyc", line 1760, in __get__
  File "mercurial\context.pyc", line 1425, in _status
  File "mercurial\localrepo.pyc", line 3408, in status
  File "mercurial\context.pyc", line 432, in status
  File "mercurial\context.pyc", line 2001, in _buildstatus
  File "mercurial\context.pyc", line 1906, in _dirstatestatus
  File "mercurial\dirstate.pyc", line 1681, in status
  File "mercurial\dirstate.pyc", line 1505, in walk
  File "mercurial\windows.pyc", line 599, in statfiles
  File "C:/Program Files/TortoiseHg/win32lfn.py", line 116, in fn
    path = stringtobytes(uncabspath(args[0]))
  File "C:/Program Files/TortoiseHg/win32lfn.py", line 97, in uncabspath
    path = bytestostring(path)
  File "C:/Program Files/TortoiseHg/win32lfn.py", line 377, in bytestostring
    string = string.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xdc in position 43: invalid continuation byte

Clonkex commented 8 months ago

Hmm. I guess something isn't encoded as utf-8 in your repo that was in mine 😕 I wonder if it could be related to being on Windows 8 🤔 What happens if you change line 377 from string = string.decode('utf-8') to string = string.decode('latin-1')? It's been a while since I worked on this and I never fully understood it to begin with so I can't say whether that's likely to work, but it's worth a shot.

If that works, or if that at least changes the error, we might need to change that part to try decoding as utf-8 and if that fails decode as something else. Or maybe Python has a way to properly detect the encoding of a string, if such a thing is possible. I'm not actually sure what data is being passed to that function, so it's a bit tricky to know what it should be doing exactly.
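The failing byte can be reproduced in isolation. This is a minimal sketch, assuming the offending path contains a `Ü` stored in the cp1252 encoding that the log header reports; the file name and the `decode_with_fallback` helper are made up for illustration and are not part of win32lfn. As far as the standard library goes, Python has no general-purpose encoding detector, so trying UTF-8 first and falling back is the usual approach.

```python
# Hypothetical path bytes: "Übersicht.txt" in cp1252, where 'Ü' is the single byte 0xDC.
raw = b"\xdcbersicht.txt"


def decode_with_fallback(data, fallback="cp1252"):
    """Try strict UTF-8 first; fall back to a legacy code page if that fails."""
    try:
        return data.decode("utf-8")
    except UnicodeDecodeError:
        return data.decode(fallback)


# 0xDC announces a two-byte UTF-8 sequence, but the next byte ('b') is not a
# continuation byte, so strict UTF-8 decoding raises UnicodeDecodeError.
try:
    raw.decode("utf-8")
    utf8_failed = False
except UnicodeDecodeError as exc:
    utf8_failed = True
    print(exc)

print(decode_with_fallback(raw))
```

Valid UTF-8 input is unaffected by the fallback, because the first `decode` call succeeds and returns immediately.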

Or, if you have a Windows 10 box it might be worth testing whether your repo and this extension work there. My suspicion is that Windows 10 may be handling things as Unicode where Windows 8 still returned directory listings in older encodings, or something along those lines.

Ellerbrok commented 8 months ago

Hi, in fact this is Windows 11. Maybe something in the repository is UTF-16?

I found something that might help, but I have not tested this in the py file because I have no experience with Python.

import os

def force_decode(string, codecs=['utf8', 'cp1252', 'latin-1', 'utf16']):
    for i in codecs:
        try:
            return string.decode(i)
        except UnicodeDecodeError:
            pass

for item in os.listdir(rootPath):
    # Convert to Unicode (in Python 3 only bytes objects need decoding)
    if isinstance(item, bytes):
        item = force_decode(item)
    print(item)

Clonkex commented 8 months ago

How strange! The log is reporting Windows 8 (version 6.2 is Windows 8, as is build 9200).
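A possible explanation, offered as an assumption rather than something confirmed in this thread: `sys.getwindowsversion()` reports the version that Windows presents to the running process, and a host executable without a Windows 10/11 compatibility manifest is told 6.2, build 9200. The `platform_version` field (Python 3.6+) is documented as returning the actual operating system version instead. The helper name `windows_versions` below is hypothetical.

```python
import sys


def windows_versions():
    """Return (reported, platform) version tuples, or None when not on Windows."""
    getver = getattr(sys, "getwindowsversion", None)  # only defined on Windows
    if getver is None:
        return None
    info = getver()
    # Version as presented to this process; may be capped at 6.2 without a manifest.
    reported = (info.major, info.minor, info.build)
    # Version of the operating system itself, intended for logging.
    return reported, tuple(info.platform_version)


print(windows_versions())
```

If the two tuples differ when this runs inside TortoiseHg, the "Windows 8" in the log header would be an artifact of how the version is queried, not the real OS version.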

Ok, try changing this bit at line 375:

def bytestostring(string):
    if isinstance(string, bytes):
        string = string.decode('utf-8')
    return string

...to this:

def bytestostring(string):
    if isinstance(string, bytes):
        string = force_decode(string)
    return string

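def force_decode(string, codecs=('utf8', 'cp1252', 'latin-1', 'utf16')):
    # Try each codec in order and return the first result that decodes cleanly.
    # Note: 'latin-1' accepts any byte sequence, so the loop always returns
    # by the third codec at the latest.
    for i in codecs:
        try:
            return string.decode(i)
        except UnicodeDecodeError:
            pass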

...and see if that helps. I have no real experience in Python except for the occasional Blender script so I can't say if this is right, but I think it should work.
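One caveat with that codec list: `latin-1` maps every possible byte, so `utf16` is never tried. A narrower sketch, assuming UTF-8 first and then the encoding the log header reports (`cp1252`); the `fallback_encoding` parameter is hypothetical and not part of win32lfn's actual API.

```python
def bytestostring(value, fallback_encoding="cp1252"):
    """Return str for bytes or str input.

    fallback_encoding is an assumed parameter for this sketch; cp1252 is
    taken from the "Encoding:" line of the bug report.
    """
    if not isinstance(value, bytes):
        return value
    for codec in ("utf-8", fallback_encoding):
        try:
            return value.decode(codec)
        except UnicodeDecodeError:
            continue
    # Last resort: latin-1 accepts any byte sequence, so this cannot raise.
    return value.decode("latin-1")


print(bytestostring(b"\xdc"))
```

Whichever codec ends up decoding a path, the matching `stringtobytes` step presumably has to encode with the same one. Otherwise the bytes handed back to Mercurial would differ from the ones it passed in. That function is not shown in this thread, so this is only a guess at what else may need changing.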