ahupp / python-magic

A python wrapper for libmagic

magic.from_file() fails for files with German umlauts in their name although Windows 10 permits such filenames #289

Closed (schwabts closed this issue 1 year ago)

schwabts commented 1 year ago

As a workaround in my current use case, I simply replace ü with ue (the standard transliteration), so my app keeps running without issues for now.
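A minimal sketch of that transliteration workaround follows. The helper name `transliterate_umlauts` and the mapping table are illustrative assumptions, not part of python-magic; the file would also have to be renamed or copied under the new name before it is passed to libmagic.

```python
import unicodedata

# Hypothetical mapping: the conventional German transliteration to ASCII digraphs.
UMLAUT_MAP = str.maketrans({
    "ä": "ae", "ö": "oe", "ü": "ue",
    "Ä": "Ae", "Ö": "Oe", "Ü": "Ue",
    "ß": "ss",
})

def transliterate_umlauts(name: str) -> str:
    """Return `name` with German umlauts and ß replaced by ASCII digraphs."""
    # Normalize first so that "u" + combining diaeresis is also caught.
    composed = unicodedata.normalize("NFC", name)
    return composed.translate(UMLAUT_MAP)

print(transliterate_umlauts("2023-05-10 Q12345678 schon wieder zu früh.csv"))
# 2023-05-10 Q12345678 schon wieder zu frueh.csv
```

This only covers German characters; other non-ASCII letters pass through unchanged.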

For

import magic

testfile = "2023-05-10 Q12345678 schon wieder zu früh.csv"
mime_type = magic.from_file(testfile)  # here this yields libmagic's "cannot open" text
self.log.info(f'Found mime_type="{mime_type}" of {testfile=}')

I got

2023-05-10 15:32:45,501 - MyClass - INFO - Found mime_type="cannot open `C:\long\path\to\project\data\2023-05-10, Mi(19), Q12345678\2023-05-10 Q12345678 schon wieder zu fr\303\274h.csv' (No such file or directory)" of testfile='C:\\long\\path\\to\\project\\data\\2023-05-10, Mi(19), Q12345678\\2023-05-10 Q12345678 schon wieder zu früh.csv'

Writing this down made it clear that the umlaut in the file name is what breaks processing: in the error message the ü appears as \303\274, which are the octal escapes of its UTF-8 bytes 0xC3 0xBC.
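The escape sequence in the error message can be reproduced in a few lines. This is only a sketch to show where `\303\274` comes from; the helper `escape_non_ascii` is hypothetical, and the comparison with cp1252 assumes a Western European Windows ANSI code page, which this thread does not confirm.

```python
def escape_non_ascii(name: str, encoding: str = "utf-8") -> str:
    """Encode `name` and render every non-ASCII byte as a three-digit octal escape."""
    return "".join(
        f"\\{byte:03o}" if byte > 0x7F else chr(byte)
        for byte in name.encode(encoding)
    )

print(escape_non_ascii("früh"))            # fr\303\274h  (UTF-8: two bytes for ü)
print(escape_non_ascii("früh", "cp1252"))  # fr\374h      (cp1252: one byte for ü)
```

The UTF-8 form matches the path printed in the log, which suggests the name reached libmagic as UTF-8 bytes.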

Test file attached: 2023-05-10 Q12345678 schon wieder zu früh.csv, although GitHub seems to have already replaced the umlaut with a plain u.

I have not tested characters like œ, ő, å, š, or 最好名字, though.
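One way to sidestep file-name handling for any of these characters might be to let Python open the file and hand only the leading bytes to libmagic through `magic.from_buffer`. The sketch below is untested against this bug; the helper names and the 2048-byte read size are assumptions, and `mime=True` assumes a MIME type is what is wanted.

```python
def read_head(path, size=2048):
    """Read up to `size` leading bytes using Python's own file API."""
    with open(path, "rb") as handle:
        return handle.read(size)

def mime_from_content(path, size=2048):
    """Detect the MIME type from file content, so the path never reaches libmagic."""
    import magic  # python-magic; imported lazily so read_head works without it
    return magic.from_buffer(read_head(path, size), mime=True)
```

Usage would be `mime_from_content(testfile)` in place of `magic.from_file(testfile)`.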

Environment: Python 3.8.8, Windows 10

ahupp commented 1 year ago

Dupe of https://github.com/ahupp/python-magic/issues/287