Closed: @answerquest closed this issue 6 years ago.
First, we don't currently build numpy against MKL; only defaults does that currently, though they have a `nomkl` package that can be installed to opt out. We build against OpenBLAS, which is BSD 3-Clause. That said, it's possible that in the future we ship both options, OpenBLAS and MKL, letting users choose one, much like defaults. MKL is actually Open License (not Open Source), which means we can link to it and share it freely should we wish to.
> MKL is actually Open License (not Open Source), which means we can link to it and share it freely should we wish to.
That's not quite complete. There is a potential issue here, especially when using PyInstaller or a similar tool: it's possible that distributing an executable containing both MKL and a GPL component is a GPL violation. The NumPy team has talked to Intel about this (answer: Intel will not give definitive legal advice) and gotten good independent advice (answer: a GPL violation is potentially possible here, but the likelihood is case-specific).
To add to @answerquest's answer: MKL or another BLAS package is definitely necessary for numpy. You're getting MKL because you installed the Anaconda default numpy. If you use `conda install -c conda-forge numpy`, you will get the conda-forge package, which uses OpenBLAS instead of MKL.
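One quick way to check which variant an environment ended up with is to look for the `mkl` Python service package, which ships alongside Anaconda's MKL-linked numpy builds. This is a heuristic sketch, not an official API, and the helper name `blas_flavor` is made up here; `numpy.show_config()` prints the linked BLAS/LAPACK details directly if you want a definitive answer.

```python
import importlib.util

def blas_flavor():
    """Guess whether this environment carries Anaconda's MKL stack.

    Heuristic (an assumption, not an official check): the `mkl`
    service package is installed alongside the MKL-linked numpy
    builds from the defaults channel, so its absence suggests an
    OpenBLAS build (e.g. from conda-forge or a pip wheel).
    """
    if importlib.util.find_spec("mkl") is not None:
        return "mkl"
    return "not-mkl (likely OpenBLAS)"

print(blas_flavor())
```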
Thanks for the clarification. Anyway, as you can see in the support links posted, the programs work perfectly fine without `mkl` installed, so I'm going ahead with not using conda to install the numpy and pandas packages for the time being, and that will be my recommendation in the support forums when questions about the too-large size pop up again. [Edit] It's better to use `conda install -c conda-forge numpy` to install numpy: it replaces `mkl` with OpenBLAS.
What would help here is a list of numpy/pandas commands that actually need `mkl`; then people would have an objective way of determining whether their programs need it or not. The difference is a whopping 600 MB in program size, which is significant for any program creator (my program's binary is just 30 MB when I go the no-conda way, and none of its functions are failing; it doesn't make sense for me to include `mkl` out of a sense of formality/loyalty), so it's well worth the disambiguation.
Also, if a conda install had a way to manually exclude a specific dependency, that would be a good workaround, as the other benefits of conda over pip are still there and I still want to use conda.
@answerquest that's not the best recommendation, unfortunately. It works in that case, but installing `numpy` with `pip` inside a conda env is not a good idea. `numpy` is special-cased by `conda`, so it's about the only thing that you really shouldn't install with `pip`. Two better alternatives:

1. `conda install -c conda-forge numpy` (will give you the same OpenBLAS dependency as the official numpy wheel that `pip` grabs).
2. Don't use `conda`, but create a clean `virtualenv` and install with `pip` into that.

@rgommers my bad, sorry, I had not read the OpenBLAS line correctly. If `-c conda-forge` helps to exclude `mkl`, then that's a good solution indeed. I'm guessing OpenBLAS is not 600 MB in size?
Definitely using a virtual environment to create the binary.
Indeed, should be <10 MB.
For anyone trying to do as @rgommers suggests (option 1; it worked in the end!), the following might save you an hour of puzzling: stackoverflow thread. I was having difficulty installing pyinstaller AND numpy-with-openblas just now, because my `conda install -c conda-forge pyinstaller` command resulted in numpy being "upgraded" to an MKL-linked one. The link explained a great deal, and pyinstaller now turns my `import numpy` .py into an exe (on Windows) of <14 MB :)
Still, it's scary to be so dependent on which version is available/downloaded via conda. It would be a shame not to be able to make small executables that use numpy. Should I be worried?
FWIW, what I typically do is `conda install conda-forge::blas=*=openblas`. This ensures you will get OpenBLAS-backed NumPy and friends.
Yes, or add `blas=*=openblas` to your condarc: https://conda.io/docs/user-guide/configuration/use-condarc.html#always-add-packages-by-default-create-default-packages
@msarahan How do I add this? I tried adding it at the bottom of the `.condarc` in my environment, but then I get the following error:

`LoadError: Load Error: in C:\Users\filip\Anaconda3\envs\sci\.condarc on line 3, column 15. Invalid YAML`
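An "Invalid YAML" error like this usually means the line was pasted under an existing key with the wrong indentation. `.condarc` is YAML, so, going by the `create-default-packages` section of the docs linked above, the spec likely needs to go in a list under that key. This is a sketch based on that linked documentation, not a verified fix for this exact file:

```yaml
# Sketch of a .condarc entry (assumption based on the linked
# create-default-packages docs); YAML indentation matters here.
create_default_packages:
  - blas=*=openblas
```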
Hi, just FYI (not replying to any earlier post here): I've since had no problems using just `pip` to install the numpy and pandas modules for my application. If they're leaving anything out, my program isn't using it anyway, and I haven't experienced any problems from it. The pyinstaller-generated .exe (single-file) is only around 30 MB, and that's without UPX compression. (Note: don't use UPX compression when making a single-file exe with pyinstaller, as UPX breaks one of the DLLs.)

Earlier, pip was having a problem with pandas, which was why I was using conda, but that got resolved just days after I posted here. This update isn't relevant for this repo, but seeing that there's activity here and I was the OP, I feel obliged to disclose how I finally solved the problem on my end: I went with pip and it worked out fine. No hard feelings for the conda folks; I hope you don't mind this update.
> Hi, just FYI (not replying to any earlier post here), I've since had no problems in using just pip to install numpy and pandas modules for my application. […]
This issue is fixed in newer versions, so installing numpy from conda-forge should get you an OpenBLAS version. But there is no OpenBLAS/nomkl version of scipy on Windows yet, so I'm using pip to install scipy. I have the same experience as you, no issues, but something is probably not getting installed correctly. I prefer not to mix pip and conda, though, so I'd love a conda-forge version of scipy. Work on that is going on here: https://github.com/conda-forge/scipy-feedstock/pull/78
This is slow: 1/3 of my pandas load time is in one call:

`ncalls tottime percall cumtime percall filename:lineno(function)`
`1 2.295 2.295 2.295 2.295 {built-in method mkl._py_mkl_service.get_version}`
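A profile like the one above can be reproduced with the standard library's `cProfile`/`pstats`. In this sketch, `json` stands in for `pandas` so the snippet runs anywhere; swap in `import pandas` to look for the `mkl._py_mkl_service.get_version` cost quoted above:

```python
import cProfile
import io
import pstats

# Profile an import and list the most expensive calls by total time.
# `json` is a stdlib stand-in here; replace it with `pandas` to
# reproduce the measurement quoted above.
profiler = cProfile.Profile()
profiler.enable()
import json  # noqa: E402  (the import under measurement)
profiler.disable()

buf = io.StringIO()
pstats.Stats(profiler, stream=buf).sort_stats("tottime").print_stats(5)
print(buf.getvalue())
```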
commit: f2ca0a2665b2d169c97de87b8e778dbed86aea07
python: 3.7.4.final.0
python-bits: 64
OS: Windows
OS-release: 10
Version: 10.0.18362
machine: AMD64
processor: Intel64 Family 6 Model 142 Stepping 10, GenuineIntel
pandas: 1.1.1
numpy: 1.19.1
`mkl` isn't even listed in `show_versions`.
Ref:

The `mkl` package is co-installed when we install either pandas or numpy using conda. It is a very large package, clocking in at ~200 MB for download, and is ~600 MB when installed in the `pkgs` folder of my MiniConda installation. The `pip` installer does not include this package when installing pandas. It is not there among the conda feedstocks list, and it has no description given on https://pypi.org/project/mkl/ .

And.. I do not know more about this subject, but when I searched for `mkl` I came across more results for `mkl-fft` and `mkl-random`, which are not the same as `mkl` and are under free licenses. `mkl-fft`'s description on PyPI also seems more numpy-involved: https://pypi.org/project/mkl-fft/

My hunch is that `mkl-fft` and `mkl-random` were the ones supposed to be included in the `numpy` installs, and `mkl` got included by accident.

Where this is really causing a problem: when generating self-contained binaries for distribution, the `mkl` package gets roped in for programs that import either `numpy` or `pandas` if conda has installed it in the Python environment. For the Windows binary that the `PyInstaller` program creates, it balloons up the dist by about 600 MB.

Please investigate this, and if it's not essential to numpy, then remove `mkl` from the numpy installation by conda.

Info: Conda version 4.5.1, on Windows 7 64-bit, as part of MiniConda Python3 64-bit.

Sharing lines from the numpy JSON file I found in my MiniConda installation's `conda-meta` folder:

Sharing lines from `[Miniconda3]\pkgs\mkl-2018.0.2-1\info\LICENSE.txt`: