Closed: danielchalef closed this issue 6 years ago
The solution doesn't seem to work. Where exactly should the BLAS-specific environment variables be defined? Thanks!
In the shell environment in which you're invoking your python script:
MKL_NUM_THREADS=16 MKL_DYNAMIC=FALSE python my_script.py
Thanks, Daniel. Is there a way of defining this in PyCharm? Setting it via os.environ (os.environ["MKL_NUM_THREADS"] = "16") doesn't seem to work. Thanks!
Set the environment variables in the Run/Debug Configuration.
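One reason the os.environ approach often appears not to work: the variables must be set before numpy (and therefore the linked BLAS/MKL library) is imported for the first time, since MKL reads them at load time. A minimal sketch:

```python
import os

# MKL reads these variables when the library is first loaded,
# so they must be set before numpy/spacy are imported.
os.environ["MKL_NUM_THREADS"] = "16"
os.environ["MKL_DYNAMIC"] = "FALSE"

# Only import BLAS-linked libraries after this point:
# import numpy
# import spacy
```

If numpy has already been imported anywhere in the process (including by the IDE's startup code), changing os.environ afterwards has no effect on the BLAS thread pool.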
spaCy 2.0.x pipe uses numpy's linked BLAS library for multiprocessing and does not honor n_threads. As a result, passing n_threads to corpus.add_texts is ineffective. This can be worked around by setting an environment variable; however, MKL (distributed with all Anaconda Python installs) requires additional trickery in order to utilize all available cores.

Expected Behavior

Passing the n_threads parameter to corpus.add_texts would result in n_threads threads/processes being spawned to process texts added to the corpus.

Current Behavior

The n_threads parameter is ignored, and the thread/process count is left up to the BLAS library.

Possible Solution
This is a workaround: define the BLAS-specific environment variable setting the number of threads, e.g.
OMP_NUM_THREADS=16
OPENBLAS_NUM_THREADS=16
MKL_NUM_THREADS=16
Note: for MKL, MKL_DYNAMIC=FALSE must also be set; otherwise MKL may dynamically choose to use fewer threads than requested.
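Put together, exporting the variables in the shell before launching Python covers OpenMP, OpenBLAS, and MKL builds alike (the script name below is a placeholder):

```shell
# Export before starting Python so the BLAS library sees them at load time.
export OMP_NUM_THREADS=16
export OPENBLAS_NUM_THREADS=16
export MKL_NUM_THREADS=16
export MKL_DYNAMIC=FALSE

# Then run the parsing script, e.g.:
# python my_script.py
```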
Steps to Reproduce (for bugs)
Only 8 processes were spawned.
Context
Extremely slow parsing of a corpus of several million documents, despite running on a high CPU core machine.
Your Environment