ahupp / python-magic

A python wrapper for libmagic

UnicodeDecodeError when filename includes non ASCII characters #287

Open davuses opened 1 year ago

davuses commented 1 year ago

I'm trying to read a file whose name contains non-ASCII characters:

magic.from_file("説明.txt")

This gives me the following error:

Traceback (most recent call last):
  File "G:\BaiduNet\unarchive.py", line 64, in <module>
    magic.from_file("説明.txt")
  File "C:\Users\davuses\AppData\Local\Programs\Python\Python311\Lib\site-packages\magic\magic.py", line 135, in from_file
    return m.from_file(filename)
           ^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\davuses\AppData\Local\Programs\Python\Python311\Lib\site-packages\magic\magic.py", line 89, in from_file
    return maybe_decode(magic_file(self.cookie, filename))
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\davuses\AppData\Local\Programs\Python\Python311\Lib\site-packages\magic\magic.py", line 214, in maybe_decode
    return s.decode('utf-8')
           ^^^^^^^^^^^^^^^^^
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe6 in position 16: invalid continuation byte
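
For readers unfamiliar with the message: "invalid continuation byte" means that a byte such as 0xe6 announced a multi-byte UTF-8 sequence, but the byte after it did not continue that sequence. The sketch below reproduces the same kind of failure in isolation. The byte string is made up for the illustration and is not libmagic's actual output.

```python
# Illustration only: these bytes are invented, not what libmagic returned.
raw = b"name: \xe6(rest"

try:
    raw.decode("utf-8")
    strict_error = None
except UnicodeDecodeError as exc:
    # 0xe6 starts a three-byte sequence, but "(" is not a continuation byte.
    strict_error = exc

# errors="replace" substitutes U+FFFD for the undecodable byte instead of raising.
lenient = raw.decode("utf-8", errors="replace")

print(strict_error)
print(repr(lenient))
```

This only shows how the strict and lenient decoders behave on malformed input. It does not explain why libmagic returned such bytes for this file name.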

If I rename the file to an ASCII name, say file.txt, the problem disappears.

Also, if I use .from_buffer(), there's no issue:

magic.from_buffer(open("説明.txt", "rb").read(2048), mime=True)
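
Building on the from_buffer() observation, a possible workaround is a small wrapper that tries from_file() first and, only when it raises UnicodeDecodeError, reads the first bytes with Python's own open() and passes them to from_buffer(). This is a sketch, not a fix in python-magic itself. The helper name `identify`, the `backend` parameter, and the 2048-byte default are assumptions made for the example.

```python
def identify(path, mime=True, head_size=2048, backend=None):
    """Try from_file(); on UnicodeDecodeError, identify from the first bytes instead."""
    if backend is None:
        import magic  # python-magic, imported lazily so a stub can be passed in tests
        backend = magic
    try:
        return backend.from_file(path, mime=mime)
    except UnicodeDecodeError:
        # Python's open() handles the Unicode file name, so only the file's
        # contents (never its path) are handed to libmagic.
        with open(path, "rb") as fh:
            return backend.from_buffer(fh.read(head_size), mime=mime)
```

With python-magic installed, `identify("説明.txt")` would be called in place of `magic.from_file("説明.txt", mime=True)`. Reading only the head of the file may give a less precise answer for formats whose signature lies further in.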

Weird. I'm not sure whether this is related to issue #205.

The package was installed with pip install python-magic-bin on Windows 11, Python 3.11.

silente commented 1 year ago

Hi, I have the same problem.

My code is:

magic.from_file(file_path, mime=True)

My error is:

  File "C:\Program Files\Python38\lib\site-packages\magic\magic.py", line 135, in from_file
    return m.from_file(filename)
  File "C:\Program Files\Python38\lib\site-packages\magic\magic.py", line 89, in from_file
    return maybe_decode(magic_file(self.cookie, filename))
  File "C:\Program Files\Python38\lib\site-packages\magic\magic.py", line 214, in maybe_decode
    return s.decode('utf-8')
UnicodeDecodeError: 'utf-8' codec can't decode byte 0xe2 in position 57: invalid continuation byte

I tried editing line 214 of "C:\Program Files\Python38\lib\site-packages\magic\magic.py", changing return s.decode('utf-8') to return s.decode('utf-8', errors='ignore') or return s.decode('utf-8', errors='replace'), but I still encounter the problem.
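
When an edit to an installed file seems to have no effect, one thing worth ruling out is that the interpreter is loading a different copy of the module than the one that was edited. This is a guess at a diagnostic step, not a confirmed cause of the behaviour above. The helper name `module_origin` is hypothetical. It uses only the standard library's importlib.

```python
import importlib.util


def module_origin(name):
    """Return the file path the import system would load for `name`, or None if not found."""
    try:
        spec = importlib.util.find_spec(name)
    except ModuleNotFoundError:
        # Raised when the parent package of a dotted name does not exist.
        return None
    return None if spec is None else spec.origin


# Demonstrated on a standard-library module; pass "magic" to check python-magic.
print(module_origin("json.decoder"))
```

If `module_origin("magic")` points to a directory other than the one containing the edited magic.py, the change was made to a copy that the running interpreter does not import.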