Open carlkl opened 3 months ago
Re (1), the relevant issue is https://github.com/scipy/scipy/issues/15129. This isn't happening soon, it's blocked for at least two reasons (ILP64 vs. LP64, and Python packaging standards forbidding us from having an extra dependency in some wheels only).
(2) is being done.
(3) is always a good idea - if stripping isn't optimal yet, that's great to fix.
That is excellent, thanks Matti. Also quite useful to have two tagged versions with the only change being the size change due to the dropped architectures - that's going to help in case we get some issue that may possibly be related.
Yes, although it might be difficult to untangle Windows performance bug reports. 0.3.27.44.4 reverts Windows threading improvements, which obviously impacts performance, and then 0.3.27.44.5 removes some kernels. We can use Linux as a control platform, since it will only have the kernel removals.
Adding the flag -fno-ident will take some noise out of the binary as well.
According to the documentation:
-fno-ident: Ignore the #ident directive.
Is there much use of that directive in gcc and/or OpenBLAS?
It puts a string constant into the binary, one for each individual function! The string identifies the gcc version used for the build process.
If you are not sure whether it is used elsewhere, one could compile a single function with -fident, which is enough for the OpenBLAS binary.
Looking at the artifacts in the CI runs from #178 before and after adding -fno-ident, it does not seem to change the size of the shared object.
Hm, I forgot that -Wl,-gc-sections removes all ident strings, so -fno-ident has no effect anymore.
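One way to check this on a built artifact is to count the compiler ident strings that survive in the file. The following is a minimal sketch, assuming the strings have gcc's default form (for example "GCC: (GNU) 12.2.0") and are NUL-terminated in the binary; the function name and the file name in the usage comment are hypothetical.

```python
import re
from pathlib import Path

# Matches gcc's default ident text, e.g. "GCC: (GNU) 12.2.0",
# up to the terminating NUL byte (assumed format).
IDENT_RE = re.compile(rb"GCC: \([^)\x00]*\)[^\x00]*")


def gcc_ident_strings(path):
    """Return {ident string: number of occurrences} for a binary file."""
    data = Path(path).read_bytes()
    counts = {}
    for match in IDENT_RE.finditer(data):
        text = match.group().decode("ascii", errors="replace")
        counts[text] = counts.get(text, 0) + 1
    return counts


# Hypothetical usage on a build artifact:
#   print(gcc_ident_strings("libopenblas.dll"))
```

An empty result for the final shared object would be consistent with the observation that the ident strings are already gone after linking with -Wl,-gc-sections.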
The size of the OpenBLAS binary can be reduced in several ways:
(1) has the greatest impact on the overall size of a Python installation as well as on the memory consumption of a Python process. There is no good reason to keep two dedicated OpenBLAS binaries in a process.
There are two ways to accomplish this: a) scipy could use the OpenBLAS DLL from numpy, or b) numpy and scipy could both depend on a dedicated OpenBLAS wheel that includes an OpenBLAS.dll. (b) has the advantage of allowing easy monkey-patching of the OpenBLAS DLL, e.g. with fewer or more threads enabled if needed.
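To see how many OpenBLAS copies a process actually loads, the third-party threadpoolctl package can list them. The sketch below assumes threadpoolctl, numpy and scipy are installed; the helper `openblas_copies` is a hypothetical name and only filters dicts shaped like the output of `threadpoolctl.threadpool_info()`.

```python
def openblas_copies(pool_info):
    """Return (filepath, num_threads) for every loaded OpenBLAS library.

    pool_info: list of dicts as returned by threadpoolctl.threadpool_info();
    only the keys internal_api, filepath and num_threads are used.
    """
    return [
        (entry.get("filepath"), entry.get("num_threads"))
        for entry in pool_info
        if entry.get("internal_api") == "openblas"
    ]


if __name__ == "__main__":
    try:
        # Importing both packages loads whatever BLAS each wheel bundles.
        import numpy  # noqa: F401
        import scipy.linalg  # noqa: F401
        from threadpoolctl import threadpool_info
    except ImportError:
        print("numpy, scipy and threadpoolctl are needed for this check")
    else:
        for path, threads in openblas_copies(threadpool_info()):
            print(f"{threads} threads: {path}")
```

With the current wheels this would be expected to print two entries, one per bundled library; with either option (a) or (b) there would be only one.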
(2) in a similar vein as https://github.com/MacPython/openblas-libs/pull/166
(3) included in https://github.com/MacPython/openblas-libs/pull/85 with the help of -Wl,-gc-sections -Wl,-s in the linking stage.
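For illustration, the link step for (3) could be assembled roughly as follows. This is a sketch with hypothetical object and output names, not the actual openblas-libs build script. It also assumes the objects were compiled with -ffunction-sections and -fdata-sections, which gives the linker separate sections that -gc-sections can discard.

```python
# Compile-time flags that place each function and data item in its own section
# (an assumption about the build, not taken from the PR).
COMPILE_FLAGS = ["-ffunction-sections", "-fdata-sections"]


def shared_lib_link_command(objects, output, cc="gcc", gc_sections=True, strip=True):
    """Build the argv list for linking a shared library with size-reducing flags."""
    cmd = [cc, "-shared", "-o", output, *objects]
    if gc_sections:
        cmd.append("-Wl,-gc-sections")  # drop unreferenced sections
    if strip:
        cmd.append("-Wl,-s")  # strip symbol information
    return cmd


# Hypothetical usage:
#   import subprocess
#   subprocess.run(shared_lib_link_command(["kernel.o"], "libdemo.so"), check=True)
```

Building the command as a list keeps it easy to toggle each flag and compare the resulting file sizes.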