MilesCranmer / PySR

High-Performance Symbolic Regression in Python and Julia
https://ai.damtp.cam.ac.uk/pysr
Apache License 2.0

[BUG]: Hard crash on import from MacOS System Integrity Protection (SIP) #682

Closed by ev-watson 3 months ago

ev-watson commented 4 months ago

What happened?

After pip-installing PySR into a virtual environment, making sure my PATH variable includes the bin directory, exporting LD_LIBRARY_PATH as specified in the GitHub README, and even removing the quarantine status for the environment, importing pysr still results in Python quitting.

The Julia version supports aarch64 (Apple silicon).

Version

Any version of PySR

Operating System

macOS

Package Manager

pip

Interface

Jupyter Notebook

Relevant log output

-------------------------------------
Translated Report (Full Report Below)
-------------------------------------

Process:               Python [40891]
Path:                  /Library/Developer/CommandLineTools/Library/Frameworks/Python3.framework/Versions/3.9/Resources/Python.app/Contents/MacOS/Python
Identifier:            com.apple.python3
Version:               3.9.6 (3.9.6)
Build Info:            python3-141000000000000~1415
Code Type:             ARM-64 (Native)
Parent Process:        python [40765]
Responsible:           pycharm [40727]
User ID:               501

Date/Time:             2024-07-27 21:46:46.1280 -0700
OS Version:            macOS 14.5 (23F79)
Report Version:        12
Anonymous UUID:        6F31D97B-2A3B-8D95-FA9E-B1FE5CB86DF1

Sleep/Wake UUID:       404515B4-A7B3-4531-A2F4-F7C17B16EC40

Time Awake Since Boot: 240000 seconds
Time Since Wake:       27250 seconds

System Integrity Protection: enabled

Crashed Thread:        0  Dispatch queue: com.apple.main-thread

Exception Type:        EXC_GUARD (SIGKILL)
Exception Codes:       GUARD_TYPE_MACH_PORT
Exception Codes:       0x0000000000012740, 0x0000000000000000

Termination Reason:    Namespace GUARD, Code 2305843035917854528

Extra Info

I tried all sorts of PySR and Julia versions, and this seems to be independent of that. I'd prefer a solution that doesn't involve booting into RecoveryOS and disabling SIP, although this is what I have done in the meantime.

ev-watson commented 4 months ago

Just to be clear, booting into RecoveryOS and disabling SIP DOES solve this issue (I was able to fully regress an equation), so surely there is a fix that can be made. Maybe permissions can be set somewhere.

MilesCranmer commented 4 months ago
  1. Can you launch Julia by itself?
  2. Then, can you import juliacall by itself, or does that trigger the error too?
  3. Do you have any corporate security tools that might be causing this? Note that PySR attempts to install Julia the first time you import it.

If (1) is okay, but (2) is not, perhaps you could try forcing the version of Julia it is loading? You can do this with the environment variables here: https://juliapy.github.io/PythonCall.jl/stable/juliacall/#julia-config
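
Step (1) above can also be checked from a script. The following is a minimal sketch, not part of PySR or juliacall; the helper name `julia_version` is hypothetical, and it only looks at a `julia` found on PATH, which may differ from the Julia that juliacall ends up loading.

```python
import shutil
import subprocess
from typing import Optional


def julia_version(exe: str = "julia") -> Optional[str]:
    """Return the output of `<exe> --version`, or None if it cannot be launched."""
    path = shutil.which(exe)
    if path is None:
        return None  # no such executable on PATH
    try:
        result = subprocess.run(
            [path, "--version"], capture_output=True, text=True, timeout=60
        )
    except (OSError, subprocess.TimeoutExpired):
        return None  # could not start, or did not finish in time
    if result.returncode != 0:
        return None  # started but exited abnormally (e.g. killed by a signal)
    return result.stdout.strip()
```

Calling `julia_version()` returns a version string when Julia launches normally and `None` otherwise, which separates "Julia itself is blocked" from "only the embedded Julia inside Python is blocked".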

ev-watson commented 4 months ago

I can launch Julia just fine, and I can also import juliacall by itself. Once I did this, I then imported pysr and everything worked just fine. If I restart my Jupyter server and import pysr alone, it crashes, but if I import juliacall before pysr, everything works and I can regress an equation. For clarity, this works:

import juliacall
import pysr

But this causes the security protocol to be triggered:

import pysr

I do not have any corporate security tools; this was just Apple's own System Integrity Protection force-killing Python. So obviously it is because PySR can't install Julia, or at least right now it does so in a way that triggers SIP.

Thanks for the help,

MilesCranmer commented 4 months ago

"but if I import juliacall before pysr everything works and I can regress an equation"

Interesting. So the only things that this could change are the environment variables set within pysr/julia_import.py. If you set those manually, does any environment variable in particular trigger the error? Perhaps the multithreaded setting is causing it?

e.g., https://github.com/MilesCranmer/PySR/blob/3aee19e38ceb3e0e1617d357a831400e01204658/pysr/julia_import.py#L32-L37

Basically if you import juliacall beforehand, these environment variables in this file will not be used. So maybe one of the variables is causing the issue.
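
The behaviour described above can be sketched roughly as follows. This is an illustration only, not PySR's actual code: the function name `apply_defaults` is hypothetical, and the default values are the ones quoted later in this thread.

```python
# Defaults as quoted in this thread; treat them as an assumption about the
# PySR version under discussion.
DEFAULTS = {
    "PYTHON_JULIACALL_HANDLE_SIGNALS": "yes",
    "PYTHON_JULIACALL_THREADS": "auto",
    "PYTHON_JULIACALL_OPTLEVEL": "3",
}


def apply_defaults(environ, loaded_modules, defaults=DEFAULTS):
    """Fill in missing variables unless juliacall is already imported.

    Returns the list of variable names that were newly set.
    """
    if "juliacall" in loaded_modules:
        # juliacall read its configuration when it was imported, so changing
        # the environment afterwards would not affect the running Julia.
        return []
    applied = []
    for key, value in defaults.items():
        if key not in environ:  # a value set by the user takes precedence
            environ[key] = value
            applied.append(key)
    return applied
```

In a real session this would be called as `apply_defaults(os.environ, sys.modules)`. It shows why `import juliacall` followed by `import pysr` behaves differently from `import pysr` alone: in the first case none of the defaults are applied.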

ev-watson commented 4 months ago

Setting them manually does not trigger an error. I ran the code linked above without importing pysr and it ran, but then upon importing PySR, even with the environment variables set, it still triggered the security protocol.

MilesCranmer commented 3 months ago

Can you set the environment variables, and then import juliacall? (i.e., no PySR)

MilesCranmer commented 3 months ago

Another thing to try is using the trace module to see exactly where it gets killed:

python -m trace -t myscript.py 

It should print every line that gets executed, so the last line printed is the one that triggered System Integrity Protection.
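
Because the trace output can be very long, it may help to keep only its tail. Below is a minimal sketch under stated assumptions: the helper name `trace_tail` is hypothetical, and `-u` is passed so the child's output is unbuffered and the final lines are not lost if the process is killed.

```python
import subprocess
import sys
from collections import deque


def trace_tail(script, keep=20):
    """Run `python -u -m trace -t script`; return (returncode, last `keep` lines)."""
    proc = subprocess.Popen(
        [sys.executable, "-u", "-m", "trace", "-t", script],
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # merge stderr so ordering is preserved
        text=True,
    )
    tail = deque(maxlen=keep)  # only the most recent lines are retained
    for line in proc.stdout:
        tail.append(line.rstrip("\n"))
    returncode = proc.wait()
    return returncode, list(tail)
```

On POSIX systems a negative return code -N means the child was terminated by signal N, so a forced kill shows up as -9 alongside the last traced lines.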

ev-watson commented 3 months ago

Setting the environment variables and then importing juliacall did cause a crash; the culprit is the PYTHON_JULIACALL_HANDLE_SIGNALS variable.

So this works (first entry commented out):

import os

for k, default in (
        #("PYTHON_JULIACALL_HANDLE_SIGNALS", "yes"),
        ("PYTHON_JULIACALL_THREADS", "auto"),
        ("PYTHON_JULIACALL_OPTLEVEL", "3"),
):
    os.environ[k] = os.environ.get(k, default)

import juliacall

Whereas this triggers the security protocol (first entry uncommented):

import os

for k, default in (
        ("PYTHON_JULIACALL_HANDLE_SIGNALS", "yes"),
        ("PYTHON_JULIACALL_THREADS", "auto"),
        ("PYTHON_JULIACALL_OPTLEVEL", "3"),
):
    os.environ[k] = os.environ.get(k, default)

import juliacall

Here is the trace output leading up to the crash:

 --- modulename: __init__, funcname: __init__
__init__.py(336):         self._name = name
__init__.py(337):         flags = self._func_flags_
__init__.py(338):         if use_errno:
__init__.py(340):         if use_last_error:
__init__.py(342):         if _sys.platform.startswith("aix"):
__init__.py(350):         if _os.name == "nt":
__init__.py(360):         class _FuncPtr(_CFuncPtr):
 --- modulename: __init__, funcname: _FuncPtr
__init__.py(360):         class _FuncPtr(_CFuncPtr):
__init__.py(361):             _flags_ = flags
__init__.py(362):             _restype_ = self._func_restype_
__init__.py(363):         self._FuncPtr = _FuncPtr
__init__.py(365):         if handle is None:
__init__.py(366):             self._handle = _dlopen(self._name, mode)
__init__.py(183):     argc, argv = args_from_config()
 --- modulename: __init__, funcname: args_from_config
__init__.py(112):         argv = [CONFIG['exepath']]
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(115):                 if val is None:
__init__.py(116):                     if opt == 'opt_handle_signals':
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(115):                 if val is None:
__init__.py(116):                     if opt == 'opt_handle_signals':
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(115):                 if val is None:
__init__.py(116):                     if opt == 'opt_handle_signals':
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(115):                 if val is None:
__init__.py(116):                     if opt == 'opt_handle_signals':
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(115):                 if val is None:
__init__.py(116):                     if opt == 'opt_handle_signals':
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(115):                 if val is None:
__init__.py(116):                     if opt == 'opt_handle_signals':
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(115):                 if val is None:
__init__.py(116):                     if opt == 'opt_handle_signals':
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(115):                 if val is None:
__init__.py(116):                     if opt == 'opt_handle_signals':
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(115):                 if val is None:
__init__.py(116):                     if opt == 'opt_handle_signals':
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(115):                 if val is None:
__init__.py(116):                     if opt == 'opt_handle_signals':
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(115):                 if val is None:
__init__.py(120):                 argv.append('--' + opt[4:].replace('_', '-') + '=' + val)
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(115):                 if val is None:
__init__.py(116):                     if opt == 'opt_handle_signals':
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(115):                 if val is None:
__init__.py(120):                 argv.append('--' + opt[4:].replace('_', '-') + '=' + val)
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(115):                 if val is None:
__init__.py(116):                     if opt == 'opt_handle_signals':
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(114):             if opt.startswith('opt_'):
__init__.py(113):         for opt, val in CONFIG.items():
__init__.py(121):         argv = [s.encode("utf-8") for s in argv]
 --- modulename: __init__, funcname: <listcomp>
__init__.py(121):         argv = [s.encode("utf-8") for s in argv]
__init__.py(121):         argv = [s.encode("utf-8") for s in argv]
__init__.py(121):         argv = [s.encode("utf-8") for s in argv]
__init__.py(121):         argv = [s.encode("utf-8") for s in argv]
__init__.py(123):         argc = len(argv)
__init__.py(124):         argc = c.c_int(argc)
__init__.py(125):         argv = c.POINTER(c.c_char_p)((c.c_char_p * len(argv))(*argv))
__init__.py(126):         return argc, argv
__init__.py(184):     jl_parse_opts = lib.jl_parse_opts
 --- modulename: __init__, funcname: __getattr__
__init__.py(377):         if name.startswith('__') and name.endswith('__'):
__init__.py(379):         func = self.__getitem__(name)
 --- modulename: __init__, funcname: __getitem__
__init__.py(384):         func = self._FuncPtr((name_or_ordinal, self))
__init__.py(385):         if not isinstance(name_or_ordinal, int):
__init__.py(386):             func.__name__ = name_or_ordinal
__init__.py(387):         return func
__init__.py(380):         setattr(self, name, func)
__init__.py(381):         return func
__init__.py(185):     jl_parse_opts.argtypes = [c.c_void_p, c.c_void_p]
__init__.py(186):     jl_parse_opts.restype = None
__init__.py(187):     jl_parse_opts(c.pointer(argc), c.pointer(argv))
__init__.py(188):     assert argc.value == 0
__init__.py(191):     try:
__init__.py(192):         jl_init = lib.jl_init_with_image__threading
 --- modulename: __init__, funcname: __getattr__
__init__.py(377):         if name.startswith('__') and name.endswith('__'):
__init__.py(379):         func = self.__getitem__(name)
 --- modulename: __init__, funcname: __getitem__
__init__.py(384):         func = self._FuncPtr((name_or_ordinal, self))
__init__.py(385):         if not isinstance(name_or_ordinal, int):
__init__.py(386):             func.__name__ = name_or_ordinal
__init__.py(387):         return func
__init__.py(380):         setattr(self, name, func)
__init__.py(381):         return func
__init__.py(195):     jl_init.argtypes = [c.c_char_p, c.c_char_p]
__init__.py(196):     jl_init.restype = None
__init__.py(197):     jl_init(
__init__.py(198):         (default_bindir if bindir is None else bindir).encode('utf8'),
__init__.py(199):         None if sysimg is None else sysimg.encode('utf8'),
__init__.py(197):     jl_init(
zsh: killed     python -m trace -t testing.py

MilesCranmer commented 3 months ago

Thanks, that is super useful! So it sounds like that one environment variable is the issue here. (You can ignore the other two, they seem to not affect it).

Would it be possible for you to make an issue on PythonCall.jl (same thing as juliacall)? Here: https://github.com/JuliaPy/PythonCall.jl/issues

It sounds like the issue is larger than just PySR, so we can solve it over there instead. I would just copy the minimal example you mentioned to an issue there (without the other two env variables), and explain how it is triggering System Integrity Protection.
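
One way to package such a minimal example is to run the import in a child process, so that a forced kill does not take down the notebook or shell that launches it. This is a sketch only; the helper name `import_in_subprocess` is hypothetical, and it assumes juliacall is installed in the same interpreter.

```python
import os
import subprocess
import sys

VARIABLE = "PYTHON_JULIACALL_HANDLE_SIGNALS"


def import_in_subprocess(module, overrides):
    """Import `module` in a fresh interpreter with a modified environment.

    `overrides` maps variable names to values; a value of None removes the
    variable. Returns the child's return code; on POSIX a negative value -N
    means the child was terminated by signal N (for example -9 for SIGKILL).
    """
    env = dict(os.environ)
    for key, value in overrides.items():
        if value is None:
            env.pop(key, None)
        else:
            env[key] = value
    result = subprocess.run([sys.executable, "-c", f"import {module}"], env=env)
    return result.returncode
```

Comparing `import_in_subprocess("juliacall", {VARIABLE: None})` with `import_in_subprocess("juliacall", {VARIABLE: "yes"})` mirrors the two snippets above: per this thread, the first is expected to exit normally and the second to be killed on the affected machine.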

ev-watson commented 3 months ago

Done, thanks for all the help