MacPython / openblas-libs

BSD 2-Clause "Simplified" License

Shrink the size of OpenBLAS DLLs on Windows #175

Open carlkl opened 3 months ago

carlkl commented 3 months ago

The size of the OpenBLAS binary can be reduced in several ways:

  1. Use the same OpenBLAS DLL for numpy as well as for scipy
  2. Use DYNAMIC_LIST to reduce the number of targets
  3. Strip the DLL

(1) has the greatest impact on the overall size of a python installation as well as on the memory consumption of a python process. There is no good reason to keep two dedicated OpenBLAS binaries in a process.
There are two ways to accomplish this: a) scipy could use the OpenBLAS DLL from numpy, or b) numpy and scipy could both depend on a dedicated OpenBLAS wheel with an OpenBLAS.dll included. (b) has the advantage of allowing easy monkey-patching of the OpenBLAS DLL, e.g. swapping in a build with fewer or more threads enabled if needed.
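To see how much duplication (1) would remove in a given environment, one can list the OpenBLAS shared libraries bundled under a directory tree. The following is a minimal sketch, not part of openblas-libs: the function name `find_openblas_libs` is hypothetical, and it assumes that vendored copies can be recognised by `openblas` appearing in the file name.

```python
from pathlib import Path

# Assumption: a bundled OpenBLAS copy has "openblas" in its file name and
# one of these shared-library suffixes. Versioned names such as
# "libopenblas.so.0" are deliberately not handled in this sketch.
LIB_SUFFIXES = (".dll", ".so", ".dylib")


def find_openblas_libs(root):
    """Return a sorted list of (path, size_in_bytes) for OpenBLAS-like files under root."""
    hits = []
    for path in Path(root).rglob("*"):
        name = path.name.lower()
        if path.is_file() and "openblas" in name and name.endswith(LIB_SUFFIXES):
            hits.append((path, path.stat().st_size))
    return sorted(hits)


def total_size_mb(libs):
    """Sum the sizes returned by find_openblas_libs, in MB (10**6 bytes)."""
    return sum(size for _, size in libs) / 1e6
```

Calling `find_openblas_libs(sysconfig.get_paths()["platlib"])` scans the current site-packages. Two hits, one under numpy and one under scipy, would be the duplication described in (1).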

(2) would be in a similar vein to https://github.com/MacPython/openblas-libs/pull/166

(3) included in https://github.com/MacPython/openblas-libs/pull/85 with the help of -Wl,-gc-sections -Wl,-s in the linking stage.

rgommers commented 3 months ago

Re (1), the relevant issue is https://github.com/scipy/scipy/issues/15129. This isn't happening soon, it's blocked for at least two reasons (ILP64 vs. LP64, and Python packaging standards forbidding us from having an extra dependency in some wheels only).

(2) is being done.

(3) is always a good idea - if stripping isn't optimal yet, that's great to fix.

mattip commented 3 months ago

The scipy-openblas-0.3.27.44.3 wheels, without #85, are here. The win_amd64 one is 10.7 MB.

The scipy-openblas-0.3.27.44.4 wheels, with #85, are here. The win_amd64 one is 10.0 MB. Adding #85 and backing out a windows threading issue saved ~0.7MB.

It seems #177 will shrink the wheel to 6.7 MB. :tada:
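For reference, the savings implied by these numbers can be worked out as below. This is only a sketch of the arithmetic using the three win_amd64 sizes quoted above; the helper name `saving` is made up for illustration.

```python
def saving(before_mb, after_mb):
    """Return (absolute saving in MB, relative saving in percent), rounded to one decimal."""
    delta = before_mb - after_mb
    return round(delta, 1), round(100 * delta / before_mb, 1)


# win_amd64 wheel sizes in MB, as quoted in this thread.
WITHOUT_85 = 10.7   # scipy-openblas 0.3.27.44.3
WITH_85 = 10.0      # scipy-openblas 0.3.27.44.4
WITH_177 = 6.7      # expected size once #177 lands

if __name__ == "__main__":
    print("#85 step:  ", saving(WITHOUT_85, WITH_85))
    print("#177 step: ", saving(WITH_85, WITH_177))
    print("combined:  ", saving(WITHOUT_85, WITH_177))
```

The first step is about 0.7 MB (roughly 6.5%), the #177 step about 3.3 MB (33%), and the combined reduction about 4.0 MB (roughly 37%).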

rgommers commented 3 months ago

That is excellent, thanks Matti. Also quite useful to have two tagged versions with the only change being the size change due to the dropped architectures - that's going to help in case we get some issue that may possibly be related.

mattip commented 3 months ago

Yes, although it might be difficult to untangle windows performance bug reports. 0.3.27.44.4 reverts windows threading improvements which obviously impacts performance, and then 0.3.27.44.5 removes some kernels. We can use linux as a control platform, since it will only have the kernel removals.

carlkl commented 3 months ago

Adding the flag -fno-ident will take some noise out of the binary as well.

mattip commented 3 months ago

According to the documentation

-fno-ident
    Ignore the #ident directive.

Is there much use of that directive in gcc and/or OpenBLAS?

carlkl commented 3 months ago

It puts a string constant into the binary - for each individual function! You can identify the gcc version used for the build process.

If you are not sure whether anything else relies on it, one could compile a single function with -fident, which is enough for the OpenBLAS binary.
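One way to check whether a given binary still carries these strings is to scan it for them. The sketch below assumes that GCC's default ident string has the form `GCC: (<vendor>) <version>` and is stored NUL-terminated; the function name `gcc_ident_strings` is hypothetical.

```python
import re
from collections import Counter
from pathlib import Path

# Assumption: ident strings look like "GCC: (GNU) 13.2.0" and end at a NUL byte.
IDENT_PATTERN = re.compile(rb"GCC: \([^)\x00]*\)[^\x00]*")


def gcc_ident_strings(path):
    """Return every GCC ident string found in the file, in order of appearance."""
    data = Path(path).read_bytes()
    return [m.group(0).decode("ascii", errors="replace")
            for m in IDENT_PATTERN.finditer(data)]


def summarize(path):
    """Count how often each distinct ident string occurs in the file."""
    return Counter(gcc_ident_strings(path))
```

Running `summarize` on the DLL before and after a flag change shows whether the strings are actually present and how many copies there are.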

mattip commented 3 months ago

Looking at the artifacts in the CI runs from #178 before and after adding -fno-ident, it does not seem to change the size of the shared object.

carlkl commented 3 months ago

Hm, I forgot that -Wl,-gc-sections removes all ident strings, so -fno-ident has no effect anymore.