ahupp / python-magic

A python wrapper for libmagic

Segmentation fault when attempting to load `msys-magic-1.dll` from Git SCM #288

Closed jdknight closed 1 year ago

jdknight commented 1 year ago

When running Python from within a Windows Git SCM shell (MINGW64, which includes file support), importing the magic module always fails. The typical error is a segmentation fault; however, the failure mode sometimes varies (either hanging forever or reporting a different error). The script consists of just the import:

import magic

Tweaked the magic library to see where it was failing:

    print('lib', lib, os.path.isfile(lib), flush=True)
    try:
      return ctypes.CDLL(lib)
    except OSError:
      pass
    print('failed', flush=True)

Segmentation fault:

```
$ python ./my-script
lib ./libmagic.dll False
failed
lib ./magic1.dll False
failed
lib ./cygmagic-1.dll False
failed
lib ./libmagic-1.dll False
failed
lib ./msys-magic-1.dll False
failed
lib C:\Program Files\Git\usr\bin\msys-magic-1.dll True
Segmentation fault
```

Internal error:

```
$ python ./my-script
lib ./libmagic.dll False
failed
lib ./magic1.dll False
failed
lib ./cygmagic-1.dll False
failed
lib ./libmagic-1.dll False
failed
lib ./msys-magic-1.dll False
failed
lib C:\Program Files\Git\usr\bin\msys-magic-1.dll True
0 [main] python 1822 C:\Program Files\Python311\python.exe: *** fatal error - Internal error: TP_NUM_C_BUFS too small: 50
479 [main] python 1822 cygwin_exception::open_stackdumpfile: Dumping stack trace to python.exe.stackdump
```

After installing the python-magic-bin package, this module worked as expected.

I also loaded up a real MinGW-w64 environment, installed file (manually deleting its broken magic.py), and installed python-magic. There, the magic module worked as expected.


Originally, I assumed the msys-magic-1.dll library provided by Git SCM was built in a way that the Python interpreter could not load. To rule out a problem with a specific interpreter version, I wrote a simple DLL load script and tested it on several interpreters (3.7, 3.8, 3.10 and 3.11):

import ctypes
lib = r'C:\Program Files\Git\usr\bin\msys-magic-1.dll'
print('lib', lib, flush=True)
print(ctypes.CDLL(lib))

From these checks, it looked like all interpreters were able to load the DLL:

lib C:\Program Files\Git\usr\bin\msys-magic-1.dll
<CDLL 'C:\Program Files\Git\usr\bin\msys-magic-1.dll', handle 4b3930000 at 0x271234aa8d0>
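
Because the failures described above either crash or hang the interpreter, it can help to run each check in a child process so the parent survives and can report the outcome. The following is a minimal sketch of that idea, not something taken from python-magic; the helper name `run_in_child` and the 30-second timeout are assumptions chosen for illustration.

```python
import subprocess
import sys


def run_in_child(snippet, *args, timeout=30):
    """Run a Python snippet in a separate interpreter.

    Returns the child's exit code, or None if it did not finish within
    `timeout` seconds (the "hanging forever" case). A crash such as a
    segmentation fault shows up as a non-zero (or negative) exit code
    instead of taking down the calling process.
    """
    try:
        result = subprocess.run(
            [sys.executable, "-c", snippet, *args],
            capture_output=True,
            timeout=timeout,
        )
    except subprocess.TimeoutExpired:
        return None
    return result.returncode


# Example checks one might run (path taken from the report above):
#   run_in_child("import ctypes, sys; ctypes.CDLL(sys.argv[1])",
#                r"C:\Program Files\Git\usr\bin\msys-magic-1.dll")
#   run_in_child("import magic")
```

An exit code of 0 means the snippet completed; anything else (or `None`) points at the step that failed, without the noise of a crashed parent shell.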

After some hacking around, I was able to use the magic module in this environment (i.e. without python-magic-bin) with two changes:

  1. Commenting out _add_compat(globals()) from python-magic's __init__.py script; and,
  2. Manually loading the DLL before attempting to load the magic module:

    import ctypes
    ctypes.CDLL(r'C:\Program Files\Git\usr\bin\msys-magic-1.dll')
    import magic
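
A slightly more defensive variant of the preload step, as a sketch only: it loads the DLL only when the file is actually present, so the same script still runs on machines without Git SCM. The helper name `preload_library` is hypothetical, and this does not replace the `_add_compat` change from step 1.

```python
import ctypes
import os


def preload_library(path):
    """Load `path` with ctypes if the file exists.

    Returns the CDLL handle, or None when the file is missing, so callers
    can fall back to the normal lookup.
    """
    if not os.path.isfile(path):
        return None
    return ctypes.CDLL(path)


# Path as reported in this issue; adjust for other installations.
GIT_MAGIC_DLL = r"C:\Program Files\Git\usr\bin\msys-magic-1.dll"

# Keep a reference to the handle so the library stays loaded.
handle = preload_library(GIT_MAGIC_DLL)

# `import magic` would follow here, as in the snippet above.
```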

Granted, these changes are not a good long-term solution. I had to stop investigating at this point. I don't know if anyone has any suggestions or comments on this. I'll try to find time later to continue the investigation (unless someone else beats me to it).


(python-magic 0.4.27, Git SCM 2.40.1, Windows 22H2)

ahupp commented 1 year ago

Can you try explicitly setting PATH to control where the DLL is searched for? I'm not sure what the right solution is here other than avoiding loading the library entirely.
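
One way to try the PATH suggestion from inside the script is to strip the Git SCM directory from PATH before importing magic. This is a rough sketch under the assumption that the DLL lookup consults PATH (the debug output above suggests so, since the Git copy was found there); the helper name `drop_path_entries` is hypothetical, and whether this avoids the crash is untested.

```python
import ntpath


def drop_path_entries(path_value, unwanted_dirs, sep=";"):
    """Return `path_value` without the entries that match `unwanted_dirs`.

    Comparison uses Windows path rules (case-insensitive, either slash
    style, trailing separators ignored). Other entries keep their order
    and original spelling.
    """
    def norm(p):
        return ntpath.normcase(ntpath.normpath(p))

    unwanted = {norm(d) for d in unwanted_dirs}
    kept = [entry for entry in path_value.split(sep) if norm(entry) not in unwanted]
    return sep.join(kept)


# Usage on Windows, before the import:
#   import os
#   os.environ["PATH"] = drop_path_entries(
#       os.environ["PATH"], [r"C:\Program Files\Git\usr\bin"])
#   import magic
```

With the Git directory removed, the import should either pick up a different libmagic (for example the one shipped by python-magic-bin) or fail with an ordinary ImportError rather than crashing.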

ahupp commented 1 year ago

Merging into https://github.com/ahupp/python-magic/issues/293