adang1345 / delvewheel

Self-contained Python wheels for Windows
MIT License
116 stars 12 forks source link

Option to patch all dlls in wheel ? #47

Closed ChristosT closed 2 months ago

ChristosT commented 4 months ago

Hi ! I noticed that delvewheel will inspect only *.pyd files for dependencies while auditwheel goes through all the *.so files in the wheel.

Looking at the delvewheel code it looks like a matter of few lines to extend the search to include also dll files and I can create a proper MR that also introduces a cli switch if this makes it easier for you.

diff --git a/delvewheel/_wheel_repair.py b/delvewheel/_wheel_repair.py
index 58e4410..cdb625e 100644
--- a/delvewheel/_wheel_repair.py
+++ b/delvewheel/_wheel_repair.py
@@ -502,7 +502,7 @@ class WheelRepair:
                     item_lower = item.lower()
                     if item_lower.endswith('.py') and item_lower != '__init__.py':
                         self._patch_py_file(item_path, libs_dir, load_order_filename, depth)
-                    elif item_lower.endswith('.pyd'):
+                    elif item_lower.endswith('.pyd') or item_lower.endswith('.dll'):
                         namespace_root_ext_modules.add(item_path)
                 elif os.path.isdir(item_path) and \
                         (item not in self._root_level_module_names(package_dir) or self._get_init(item_path)):
@@ -626,7 +626,7 @@ class WheelRepair:
             if root == self._data_dir:
                 dirnames[:] = set(dirnames) & {'platlib', 'purelib'}
             for filename in filenames:
-                if filename.lower().endswith('.pyd'):
+                if filename.lower().endswith('.pyd') or filename.lower().endswith('.dll'):
                     extension_module_path = os.path.join(root, filename)
                     extension_module_paths.append(extension_module_path)
                     discovered, _, ignored, not_found = _dll_utils.get_all_needed(extension_module_path, self._no_dlls, self._wheel_dirs, 'ignore', False, False, self._verbose)
@@ -731,7 +731,7 @@ class WheelRepair:
             if root == self._data_dir:
                 dirnames[:] = set(dirnames) & {'platlib', 'purelib'}
             for filename in filenames:
-                if filename.lower().endswith('.pyd'):
+                if filename.lower().endswith('.pyd') or filename.lower().endswith('.dll'):
                     extension_module_path = os.path.join(root, filename)
                     dll_arch = _dll_utils.get_arch(extension_module_path)
                     if dll_arch != self._arch:

Do you see any issues enabling this feature ?

Usecase: We want to bundle C++ libraries that are loaded upon runtime via python. This results in dlls in the wheel that are not linked by any pyd file and thus they are not discovered by the method that delvewheel currently employs.

Using --add-dll does not really help here because we need to add a big number of dependencies manually and we loose the mangling that delvewheel offers which we require since we are going to bundle 3rd-party libraries.

Thank you !

adang1345 commented 4 months ago

Support for DLLs that are loaded at runtime is currently very limited. The difficulty is that each project can have its own requirements for how and when DLLs are loaded at runtime, so there is no one "right way" for delvewheel to handle them in a way that works with all projects. Things that are unknown would include

Also, if a wheel already has a DLL file, then there is no general way to know what to do with it. Should delvewheel search for its dependencies? What if some of those dependencies are already in the wheel? Which dependencies should be name-mangled?

I feel that to better support DLLs that are loaded at runtime, we would need some way for the user to specify the answers to these questions via the command line. Or at minimum, I would need to come up with reasonable defaults for all of these that would work for the most common use cases.

Let me think about your use case and see whether there's a good way to support it.

kpenev commented 3 months ago

We are running into the same problem. It looks to me at least that the answer to the above questions is so not so complicated. A DLL that is not required by a .pyd file, yet is included in the wheel, would be safe assume will be loaded at runtime. So how is that different from a .pyd file? It should have by default at least have its dependencies included and name mangled by default. If anything, there is a mechanism to disable name mangling, but there is no mechanism to enable it, so if the default does not work for some application it can be fixed. Alternatively, it could be handled by adding an option to explicitly list .dll files that should be processed with the same scheme used for .pyd. Right now we work around this limitation by changing the extension .dll -> .pyd before running delvewheel and then renaming it back. That is bound to break in future delvewheel releases, but works for now.