FlorianRhiem / pyGLFW

Python bindings for GLFW
MIT License
232 stars 36 forks source link

OpenGL version check crashes for frozen executables like PyInstaller/cxFreeze #71

Closed BenBE closed 1 year ago

BenBE commented 1 year ago

When running the following script under PyInstaller, trying to import glfw causes an infinite loop while trying to determine the imported OpenGL library version:

Context:

Please rework the check in a way that is compatible with PyInstaller/cxFreeze so programs using GLFW can be packaged properly.

AFAICS, it should be possible to rewrite this code transparently using the multiprocessing module (and a fork context), which also should simplify returning the retrieved values back to the caller.

FlorianRhiem commented 1 year ago

Hey @BenBE, thank you for reporting this and the hint on how to fix it. I've implemented the fix you suggested in the branch feature-multiprocessing. Could you please give that a try to make sure it fixes it for your use case as well as my simple test?

BenBE commented 1 year ago

Thank you very much for your patch.

I did a quick test of it on both Linux (Ubuntu 22.04 LTS amd64) and Windows (Win11 x64) and the problem seems to get resolved by this.

BUT, the patch as it is now has some potential pitfalls you should be aware of, which are somewhat documented in the library's overview page:

Thank you again for your quick response and the patch for easy evaluation.

[1] I'm aware of why the code is there in the first place though.

FlorianRhiem commented 1 year ago

Thanks for the feedback!

Using a context definitely seems better than setting the multiprocessing-wide start method. That code only runs if the platform isn't windows, so fork should be fine and has the advantage of not requiring properly name == "__main__" wrapped code (which spawn does, afaik), which currently works fine.

You can use the PYGLFW_LIBRARY environment variable to set the library path. A possible alternative to using multiprocessing would be to detect being frozen and require PYGLFW_LIBRARY being set in that case. The variable could be set right before importing glfw, so it could be set using some logic, etc. Do you think that would be preferable?

BenBE commented 1 year ago

As the multiprocessing code is not triggered on Windows, we should be fine on that front. IIRC cygwin and msys may behave a bit different, so if you have a chance to test this I'd recommended running a few tests there (I won't have the time for this though).

The special handling for frozen binaries is likely not necessary, as I missed the check for the environment variable when I first looked at the code (related, but independent note: setting the variable on Windows may be skipping to load the MS VC Runtime; not sure if this is intentional).

Thus with the current implementation using multiprocessing on non-Windows platforms and the way to override using the environment variable, we should be fine with the implementation as-is, without the need for additional checks for frozen executables.

FlorianRhiem commented 1 year ago

The MSVC runtime is only loaded for bundled DLLs, for which I know which runtime to load.

I've released the changes as v2.6.0, thank you again for your help!

FlorianRhiem commented 1 year ago

Sadly have to reopen this, as the changes seem to cause issues on arm64 macOS systems. I've reverted the changes and for now replaced them with a check for frozen executables, that way frozen executables won't have infinite recursion and arm64 macs will still work. I've also added a section to the README about frozen executables.

Can you please confirm that you can work around this by specifying the GLFW library path using the environment variable?

BenBE commented 1 year ago

Okay, had a quick look at the issue and I could mostly automate finding the correct library even for frozen libraries by hacking together the following pre-loader:

import os
import sys
if getattr(sys, "frozen", False):
    import glob

    def _find_library_candidates(library_names,
                                library_file_extensions,
                                library_search_paths):
        """
        Finds and returns filenames which might be the library you are looking for.
        """
        candidates = set()
        for library_name in library_names:
            for search_path in library_search_paths:
                glob_query = os.path.join(search_path, '*'+library_name+'*')
                for filename in glob.iglob(glob_query):
                    filename = os.path.realpath(filename)
                    if filename in candidates:
                        continue
                    basename = os.path.basename(filename)
                    if basename.startswith('lib'+library_name):
                        basename_end = basename[len('lib'+library_name):]
                    elif basename.startswith(library_name):
                        basename_end = basename[len(library_name):]
                    else:
                        continue
                    for file_extension in library_file_extensions:
                        if basename_end.startswith(file_extension):
                            if basename_end[len(file_extension):][:1] in ('', '.'):
                                candidates.add(filename)
                        if basename_end.endswith(file_extension):
                            basename_middle = basename_end[:-len(file_extension)]
                            if all(c in '0123456789.' for c in basename_middle):
                                candidates.add(filename)
        return candidates

    def _load_library(library_names, library_file_extensions,
                    library_search_paths, version_check_callback):
        """
        Finds, loads and returns the most recent version of the library.
        """
        candidates = _find_library_candidates(library_names,
                                            library_file_extensions,
                                            library_search_paths)
        for filename in candidates:
            version = version_check_callback(filename)
            if version is not None and version >= (3, 0, 0):
                return filename

        return ''

    def _glfw_get_version(filename):
        """
        Queries and returns the library version tuple or None by using a
        subprocess.
        """
        return (3, 0, 0)

    def _get_library_search_paths():
        """
        Returns a list of library search paths, considering of the current working
        directory, default paths and paths from environment variables.
        """
        package_path = os.path.join(os.path.abspath(os.path.dirname(__file__)), 'glfw')
        search_paths = [
            '',
            package_path,
            sys.prefix + '/lib',
            '/usr/lib64',
            '/usr/local/lib64',
            '/usr/lib', '/usr/local/lib',
            '/opt/homebrew/lib',
            '/run/current-system/sw/lib',
            '/usr/lib/x86_64-linux-gnu/',
            '/usr/lib/aarch64-linux-gnu/',
            '/usr/lib/arm-linux-gnueabihf',
        ]

        if sys.platform != 'darwin':
            # manylinux2014 wheels contain libraries built for X11 and Wayland
            if os.environ.get('PYGLFW_LIBRARY_VARIANT', '').lower() in ['wayland', 'x11']:
                search_paths.insert(1, os.path.join(package_path, os.environ['PYGLFW_LIBRARY_VARIANT'].lower()))
            elif os.environ.get('XDG_SESSION_TYPE') == 'wayland':
                search_paths.insert(1, os.path.join(package_path, 'wayland'))
            else:
                # X11 is the default, even if XDG_SESSION_TYPE is not set
                search_paths.insert(1, os.path.join(package_path, 'x11'))

        if sys.platform == 'darwin':
            path_environment_variable = 'DYLD_LIBRARY_PATH'
        else:
            path_environment_variable = 'LD_LIBRARY_PATH'
        if path_environment_variable in os.environ:
            search_paths.extend(os.environ[path_environment_variable].split(':'))
        return search_paths

    os.environ['PYGLFW_LIBRARY'] = _load_library(['glfw', 'glfw3'], ['.so', '.dylib'], _get_library_search_paths(), _glfw_get_version)

print("glfw=", os.environ.get('PYGLFW_LIBRARY', ''))
print("1")
import glfw
print("2")

What it does is basically re-using your code to find the binaries that PyInstaller will collect using --collect-binaries glfw and replace the loader check by a variant of your check that takes the first library that exists (which is the internal one).

Thus if you update _glfw_get_version to return a "compatible" version for any library if frozen and make _load_library load the first library it tests (when frozen), this should be fine in the context of PyInstaller at least (only tested on Linux right now, don't have MacOS).

Also: You can tell PyInstaller to collect libglfw.so by providing a hook hook-glfw.py like this:

from PyInstaller.utils.hooks import collect_data_files

datas = collect_data_files('glfw')

With the test case above I get a reasonable output when building my test case:

LC_ALL=C.UTF-8 pyinstaller -n test --noconfirm --noupx --onedir --collect-binaries glfw --clean test.py

The most annoying part currently is, that I have to copy most of the library search logic, as the file including that logic also triggers the library loading. Making the library loading lazy would be a definite plus here, as I then could avoid duplicating all the internal logic for search path handling.

P.S.: You might take a look at providing a PyInstaller hook for your library.

FlorianRhiem commented 1 year ago

_find_library_candidates uses a set, as the order in which libraries are found isn't important if the version check and sorting is still in place. With all versions set to (3, 0, 0), there's no reason for the code to always load the packaged library, as sets don't guarantee ordering. So if there's an old GLFW library found in the system, that might be used for some runs of the program.

If you only want to use the packaged library, you can probably simplify this code quite a bit, basically reducing it to a few relative paths in some ifs (at least one for wayland vs. x11), and picking the first file that exists from that list, however I can't provide a packaged library for every platform/architecture combination. For those combinations that I don't provide wheels for, you could provide your own GLFW library and just add the path to the list.

BenBE commented 1 year ago

_find_library_candidates uses a set, as the order in which libraries are found isn't important if the version check and sorting is still in place. With all versions set to (3, 0, 0), there's no reason for the code to always load the packaged library, as sets don't guarantee ordering. So if there's an old GLFW library found in the system, that might be used for some runs of the program.

The intention with my quick hack was to stick with the version that PyInstaller collects from the system, thus when the user opts to skip collecting binaries when building the PyInstaller it should just load the first one from the list of search paths as a fallback. (That's at least reasonable IMHO).

If you only want to use the packaged library, you can probably simplify this code quite a bit, basically reducing it to a few relative paths in some ifs (at least one for wayland vs. x11), and picking the first file that exists from that list, As PyInstaller seems to collect the correct binaries from the unpackaged run correctly, when told to do so, we could assume that if we're running frozen, those libraries are packaged in one of the paths as mentioned in the setup.py and bail, if these are missing.

however I can't provide a packaged library for every platform/architecture combination. For those combinations that I don't provide wheels for, you could provide your own GLFW library and just add the path to the list. PyInstaller uses the information provided by the package and as your setup.py specifies to include those library files, it will take them from the local installation. Thus when building the package on a platform from source, there's no need to do anything special, as running setup.py should handle all the prerequisites like building those libraries. The only thing a user needs to take care of is when building pyGLFW is to place the built libraries as part of the package, so that PyInstaller can pick them up.

import os
import sys
if True or getattr(sys, "frozen", False):
    import glob

    def _find_library_candidates(library_names,
                                library_file_extensions,
                                library_search_paths):
        """
        Finds and returns filenames which might be the library you are looking for.
        """
        candidates = []
        for library_name in library_names:
            for search_path in library_search_paths:
                glob_query = os.path.join(search_path, '*'+library_name+'*')
                for filename in glob.iglob(glob_query):
                    filename = os.path.realpath(filename)
                    if filename in candidates:
                        continue
                    basename = os.path.basename(filename)
                    if basename.startswith('lib'+library_name):
                        basename_end = basename[len('lib'+library_name):]
                    elif basename.startswith(library_name):
                        basename_end = basename[len(library_name):]
                    else:
                        continue
                    for file_extension in library_file_extensions:
                        if basename_end.startswith(file_extension):
                            if basename_end[len(file_extension):][:1] in ('', '.'):
                                candidates.append(filename)
                        elif basename_end.endswith(file_extension):
                            basename_middle = basename_end[:-len(file_extension)]
                            if all(c in '0123456789.' for c in basename_middle):
                                candidates.append(filename)
        return candidates

    def _load_library(library_names, library_file_extensions, library_search_paths):
        """
        Finds, loads and returns the most recent version of the library.
        """
        candidates = _find_library_candidates(library_names, library_file_extensions, library_search_paths)
        for filename in candidates:
            return filename

        return ''

    def _get_library_search_paths():
        """
        Returns a list of library search paths, considering of the current working
        directory, default paths and paths from environment variables.
        """
        package_path = os.path.join(os.path.abspath(os.path.dirname(__file__)), 'glfw')
        search_paths = [
            '',
            package_path,
        ]

        if sys.platform != 'darwin':
            # manylinux2014 wheels contain libraries built for X11 and Wayland
            if os.environ.get('PYGLFW_LIBRARY_VARIANT', '').lower() in ['wayland', 'x11']:
                search_paths.insert(1, os.path.join(package_path, os.environ['PYGLFW_LIBRARY_VARIANT'].lower()))
            elif os.environ.get('XDG_SESSION_TYPE') == 'wayland':
                search_paths.insert(1, os.path.join(package_path, 'wayland'))
            else:
                # X11 is the default, even if XDG_SESSION_TYPE is not set
                search_paths.insert(1, os.path.join(package_path, 'x11'))

        if sys.platform == 'darwin':
            path_environment_variable = 'DYLD_LIBRARY_PATH'
        else:
            path_environment_variable = 'LD_LIBRARY_PATH'
        if path_environment_variable in os.environ:
            search_paths.extend(os.environ[path_environment_variable].split(':'))

        return search_paths

    os.environ['PYGLFW_LIBRARY'] = _load_library(['glfw', 'glfw3'], ['.so', '.dylib'], _get_library_search_paths())

print("glfw=", os.environ.get('PYGLFW_LIBRARY', ''))
print("1")
import glfw
print("2")

NB: Your implementation in _find_library_candidates duplicates the check for the extension, when the filename only contains a plain extension. In that case it's sufficient to elif the endswith check, as you either can start or end in an extension. Also changed the code above to use a list instead of a set to preserve ordering.

FlorianRhiem commented 1 year ago

_find_library_candidates uses a set, as the order in which libraries are found isn't important if the version check and sorting is still in place. With all versions set to (3, 0, 0), there's no reason for the code to always load the packaged library, as sets don't guarantee ordering. So if there's an old GLFW library found in the system, that might be used for some runs of the program.

The intention with my quick hack was to stick with the version that PyInstaller collects from the system, thus when the user opts to skip collecting binaries when building the PyInstaller it should just load the first one from the list of search paths as a fallback. (That's at least reasonable IMHO).

I can see why you'd prefer it that way, but I personally think it's better to require an explicit library path than to potentially use an old version of the GLFW library leading to crashes later on, especially in a situation where you're already using a tool to bundle dependencies for a specific system, so that you have full control over which files are included.

If you only want to use the packaged library, you can probably simplify this code quite a bit, basically reducing it to a few relative paths in some ifs (at least one for wayland vs. x11), and picking the first file that exists from that list, As PyInstaller seems to collect the correct binaries from the unpackaged run correctly, when told to do so, we could assume that if we're running frozen, those libraries are packaged in one of the paths as mentioned in the setup.py and bail, if these are missing.

Yes, and for platforms that this package doesn't provide wheels for that include the GLFW library, you could still include your own build of the GLFW library.

however I can't provide a packaged library for every platform/architecture combination. For those combinations that I don't provide wheels for, you could provide your own GLFW library and just add the path to the list. PyInstaller uses the information provided by the package and as your setup.py specifies to include those library files, it will take them from the local installation. Thus when building the package on a platform from source, there's no need to do anything special, as running setup.py should handle all the prerequisites like building those libraries. The only thing a user needs to take care of is when building pyGLFW is to place the built libraries as part of the package, so that PyInstaller can pick them up.

I'm a bit confused by this, as you seem to contradict yourself. I agree that a user could place their own library in the part and it'd become part of the package, but setup.py has no way to build the libraries themselves, and I don't plan on changing that, as that'd mean setup.py would also have to provide for a compiler, CMake, the system relevant libraries, etc, while on many systems building the library yourself would be contraproductive, because the library is managed by a package manager.

What is the intention for the code you posted? You're searching user provided library search paths, but not standard system paths? If you wish to use the packaged version, why not just provide a fixed list? If you want to fall back on system versions, why only the ones from (DY)LD_LIBRARY_PATH? In any case, code for Windows seems missing.

BenBE commented 1 year ago

_find_library_candidates uses a set, as the order in which libraries are found isn't important if the version check and sorting is still in place. With all versions set to (3, 0, 0), there's no reason for the code to always load the packaged library, as sets don't guarantee ordering. So if there's an old GLFW library found in the system, that might be used for some runs of the program.

The intention with my quick hack was to stick with the version that PyInstaller collects from the system, thus when the user opts to skip collecting binaries when building the PyInstaller it should just load the first one from the list of search paths as a fallback. (That's at least reasonable IMHO).

I can see why you'd prefer it that way, but I personally think it's better to require an explicit library path than to potentially use an old version of the GLFW library leading to crashes later on, especially in a situation where you're already using a tool to bundle dependencies for a specific system, so that you have full control over which files are included.

What do you think of checking locations for frozen executables in the following order:

  1. Whatever PYGLFW_LIBRARY is, if non-empty
  2. libglfw.so / libglfw.dylib / glfw.dll in same directory as the executable (to allow the user an easy override (possibly with x11/wayland distinction)
  3. x11/wayland specific libs libglfw.so from within the bundle based on session type
  4. libglfw.so / libglfw.dylib / glfw.dll from within the bundle
  5. fail to load

however I can't provide a packaged library for every platform/architecture combination. For those combinations that I don't provide wheels for, you could provide your own GLFW library and just add the path to the list. PyInstaller uses the information provided by the package and as your setup.py specifies to include those library files, it will take them from the local installation. Thus when building the package on a platform from source, there's no need to do anything special, as running setup.py should handle all the prerequisites like building those libraries. The only thing a user needs to take care of is when building pyGLFW is to place the built libraries as part of the package, so that PyInstaller can pick them up.

I'm a bit confused by this, as you seem to contradict yourself. I agree that a user could place their own library in the part and it'd become part of the package, but setup.py has no way to build the libraries themselves, and I don't plan on changing that, as that'd mean setup.py would also have to provide for a compiler, CMake, the system relevant libraries, etc, while on many systems building the library yourself would be contraproductive, because the library is managed by a package manager.

Ah, sorry. Confusion on my side.

What is the intention for the code you posted? You're searching user provided library search paths, but not standard system paths? If you wish to use the packaged version, why not just provide a fixed list? If you want to fall back on system versions, why only the ones from (DY)LD_LIBRARY_PATH? In any case, code for Windows seems missing.

That code was based on your implementation in pyGLFW. I tried to keep the implementation changes minimal, thus did not remove that part. But yes, if going with the list of locations as outlined above, this part should be dropped.

FlorianRhiem commented 1 year ago

Ah, now I think we're on the same page. Yes, that order seems reasonable and shouldn't risk pulling in an old GLFW2 library from somewhere in the system. It should be possible to implement this list (points 2. to 4.) fairly concisely and without a version check; if a file from the list exists and can be loaded, it is used, in order from top to bottom. The current working directory might be another possible directory for a user override (you already have it in your example code too), in addition to the executable directory, if those two aren't the same.

Would you like to create a pull request for this or should I implement it?

BenBE commented 1 year ago

Would you like to create a pull request for this or should I implement it?

Please go ahead.

FlorianRhiem commented 1 year ago

Could you please review the implementation in branch feature-frozen-library?

BenBE commented 1 year ago

While the implementation looks sane AFAICT, I somehow get a failure trying to run that version under PyInstaller. Currently debugging the cause.

Edit: Issue found. When I built the module locally I forgot to place the GLFW libraries at the correct location for PyInstaller to pick them up. Once they were placed as part of the installed package, loading succeeded without any trickery needed.

BenBE commented 1 year ago

With the build now tested locally, I'd like to share some remarks for the implementation:

Let's start with a small note regarding load order: Please put the current directory last, i.e. [executable_path, package_path_variant, package_path, current_path]. Especially with frozen executables the library should be as close to the executable as possible, thus having it found in the current directory over the directory of the executable potentially opens security concerns regarding library injection (the path of the executable potentially is write-restricted, which is usually not the case for the cwd).

Furthermore I've got some remarks on _find_library_candidates as you touched the code of that function as part of that commit. What about implementing it like this?

def _find_library_candidates(library_names,
                             library_file_extensions,
                             library_search_paths):
    """
    Finds and returns filenames which might be the library you are looking for.
    """
    candidates = []
    for search_path in library_search_paths:
        if not search_path:
            continue
        for library_name in library_names:
            glob_query = os.path.join(search_path, '*'+library_name+'.*')
            for filename in glob.iglob(glob_query):
                filename = os.path.realpath(filename)
                if filename in candidates:
                    continue
                basename = os.path.basename(filename)
                if basename.startswith('lib'+library_name):
                    basename_end = basename[len('lib'+library_name):]
                elif basename.startswith(library_name):
                    basename_end = basename[len(library_name):]
                else:
                    continue
                for file_extension in library_file_extensions:
                    if basename_end.startswith(file_extension):
                        if basename_end[len(file_extension):][:1] in ('', '.'):
                            candidates.append(filename)
                    elif basename_end.endswith(file_extension):
                        basename_middle = basename_end[:-len(file_extension)]
                        if all(c in '0123456789.' for c in basename_middle):
                            candidates.append(filename)
    return candidates

The first change is related to allowing None values in the search paths. By negating the expression you can avoid one level of indentation on most of the function. The second change is regarding the version number check in the extension part of the basename: The second if should be an elif to avoid duplicating the candidate if the filename does not contain any ABI version spec (i.e. plain extension). And finally another small nit: The basename should probably be checked for an . directly after the literal given to ensure filename lookup is restricted in a way as is likely expected by users (note the additional . in the glob pattern which should do the trick). As the code is now (also in the existing release) you can have libglfw42.so found by glfw, which is IMHO unexpected.

FlorianRhiem commented 1 year ago

I've updated the branch with your suggested changes. Please let me know whether there's anything I missed or should improve, otherwise I'll create a new release from that branch.

Thank you again for the help!

BenBE commented 1 year ago

LGTM.

FlorianRhiem commented 1 year ago

Thank you, I've released the changes as 2.6.2.