Open blechta opened 8 years ago
Hi Jan, I remember the problem being discussed, but I have not really observed this on my machines. Using the non-vtk version from my channel (the one compiled from branch conda-gcc) I see no effect on my laptop of using OMP_NUM_THREADS=1. I can add export OMP_NUM_THREADS=1 in build.sh and compile a new version. Don't really want to mess with the openblas installation unless I really have to.
FYI, it's reproducible in travis-python container:
docker run --rm -ti quay.io/travisci/travis-python /bin/bash
and in the container
su - travis
export DEBIAN_FRONTEND=noninteractive
source ~/virtualenv/python2.7/bin/activate
wget https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh;
bash miniconda.sh -b -p $HOME/miniconda
export PATH="$HOME/miniconda/bin:$PATH"
unset PYTHONPATH
conda config --set always_yes yes
conda config --add channels mikaem/label/travis
conda config --add channels mikaem
conda install fenics=2016.2.dev pytest matplotlib
cd miniconda/share/dolfin/demo/documented/cahn-hilliard/python/
time python demo_cahn-hilliard.py
time OMP_NUM_THREADS=1 python demo_cahn-hilliard.py
I think this is more or less what is run on Travis CI. Maybe the conda libraries link to BLAS from the machine? I don't know how conda recipes work. Adding export OMP_NUM_THREADS=1 could be a good solution.
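The two time commands above can also be wrapped in a small script, which makes the comparison easy to repeat. This is only a sketch: the helper names timed_run and compare_threading are made up for illustration and are not part of FEniCS or conda.

```python
import os
import subprocess
import time


def timed_run(cmd, extra_env=None):
    """Run cmd and return (wall-clock seconds, return code)."""
    env = dict(os.environ)
    env.update(extra_env or {})  # overrides apply to the child process only
    start = time.perf_counter()
    result = subprocess.run(cmd, env=env)
    return time.perf_counter() - start, result.returncode


def compare_threading(cmd):
    """Time cmd with the inherited environment, then with OMP_NUM_THREADS=1."""
    default_time, _ = timed_run(cmd)
    single_time, _ = timed_run(cmd, {"OMP_NUM_THREADS": "1"})
    return default_time, single_time
```

Calling compare_threading(["python", "demo_cahn-hilliard.py"]) from the demo directory would mirror the two timed runs above. If OMP_NUM_THREADS is already exported in the shell, both numbers will look alike.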
I just remembered that I actually did compile my own openblas, using conda-recipes. Should be easy enough to recompile with make USE_THREAD=0. Thanks for the docker/travis tip:-) I'll install it and check for myself.
BTW, openblas was compiled with
make DYNAMIC_ARCH=1 BINARY=${ARCH} NO_LAPACK=0 NO_AFFINITY=1 NUM_THREADS=1
So is USE_THREADS=0 better than NUM_THREADS=1 ??
Ok, I see it in the docker. I'll recompile openblas and update the fenics packages accordingly.
I just remembered that I already have this in my .bashrc
export OPENBLAS_NUM_THREADS=1
export OMP_NUM_THREADS=1
And that is why I don't see any difference on my machine
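Since a forgotten export in .bashrc can hide the problem, it may help to print the relevant variables before timing anything. A minimal sketch, assuming only the two variables mentioned in this thread matter; thread_env_report is a hypothetical helper name.

```python
import os

# The two variables discussed in this thread
THREAD_VARS = ("OPENBLAS_NUM_THREADS", "OMP_NUM_THREADS")


def thread_env_report(environ=None):
    """Map each thread-limiting variable to its value, or None if unset."""
    if environ is None:
        environ = os.environ
    return {name: environ.get(name) for name in THREAD_VARS}
```

Running print(thread_env_report()) in the shell used for the benchmark shows whether either limit is already in effect.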
So is USE_THREADS=0 better than NUM_THREADS=1 ??
That should not matter, see https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.system#L176, although there might be bugs. Line 187 seems buggy. USE_THREAD=0 should do the job.
But note that the correct variable is USE_THREAD, not USE_THREADS.
Looks to me like USE_THREAD=0 is the same as NUM_THREADS=1? Anyway, I recompiled openblas with USE_THREAD=0 and uploaded to Anaconda Cloud. The same behaviour is still observed on the docker:-(
Observed on the https://github.com/mikaem/fenics-recipes fork, but there is no issue tracker there. It might be useful to other forks as well. Ping @mikaem.
OpenBLAS is compiled with thread support and spawns as many threads as there are hyper-threading cores in the system. This slows down computation and eats memory. To reproduce, compare run times with and without OMP_NUM_THREADS=1.
Watch running processes using
top
and optionally add
list_timings(TimingClear_keep, [TimingType_wall])
to the end of the file.
The fix is to compile OpenBLAS with
make USE_THREAD=0
(preferred) or to run
export OMP_NUM_THREADS=1
when preparing the FEniCS environment. Note that
export OPENBLAS_NUM_THREADS=1
does not seem to work.
The problem has already been discussed at https://fenicsproject.org/pipermail/fenics/2015-March/002619.html.
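As an alternative to watching top, the number of threads of a running process can be read from /proc. This is a rough sketch that assumes a Linux /proc filesystem; parse_thread_count and thread_count are hypothetical helper names.

```python
def parse_thread_count(status_text):
    """Return the 'Threads:' field from the text of /proc/<pid>/status."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1])
    raise ValueError("no 'Threads:' field found")


def thread_count(pid="self"):
    """Read the thread count of a process (Linux only); 'self' means this process."""
    with open("/proc/{}/status".format(pid)) as status_file:
        return parse_thread_count(status_file.read())
```

Calling thread_count() at the end of the demo script, or thread_count(pid) for the PID shown by top, should make the extra OpenBLAS threads visible as a count well above one.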