astrojuanlu / fenics-recipes

This repository contains conda recipes for the FEniCS libraries.
The Unlicense

OpenBLAS spawns threads #60

Open blechta opened 8 years ago

blechta commented 8 years ago

Observed on the https://github.com/mikaem/fenics-recipes fork, which has no issue tracker. It might be useful to other forks as well. Ping @mikaem.

OpenBLAS is compiled with thread support and spawns as many threads as there are hyper-threaded (logical) cores in the system. This slows down the computation and eats memory. To reproduce, compare:

time python demo_cahn-hilliard.py
time OMP_NUM_THREADS=1 python demo_cahn-hilliard.py

Watch the running processes using top and, optionally, add list_timings(TimingClear_keep, [TimingType_wall]) to the end of the file.
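The two timed runs above can also be scripted so that both use exactly the same command and differ only in the environment. This is a minimal sketch, not part of the original report: the helper names time_command and compare_threading are made up for illustration, and it assumes Python 3 and that the demo script sits in the current directory.

```python
import os
import subprocess
import sys
import time


def time_command(cmd, extra_env=None):
    """Run cmd once and return its wall-clock time in seconds."""
    env = dict(os.environ)
    env.update(extra_env or {})
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    return time.perf_counter() - start


def compare_threading(cmd):
    """Time cmd with the inherited environment and with OMP_NUM_THREADS=1."""
    return {
        "default": time_command(cmd),
        "OMP_NUM_THREADS=1": time_command(cmd, {"OMP_NUM_THREADS": "1"}),
    }
```

Calling compare_threading([sys.executable, "demo_cahn-hilliard.py"]) from the demo directory returns the two wall-clock times. A single run of each is noisy, so repeating the call a few times gives a fairer picture.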

The fix is to compile OpenBLAS with make USE_THREAD=0 (preferred) or to export OMP_NUM_THREADS=1 when preparing the FEniCS environment. Note that export OPENBLAS_NUM_THREADS=1 does not seem to work.
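When the shell environment cannot be changed, the same variable can be set from inside the script. This is only a sketch under one assumption: that OpenBLAS reads the variable when the library is loaded, so it has to be set before dolfin (or anything else that pulls in BLAS) is imported. The helper name limit_blas_threads is made up for illustration.

```python
import os


def limit_blas_threads(n=1):
    """Set OMP_NUM_THREADS for this process and return the stored value."""
    os.environ["OMP_NUM_THREADS"] = str(n)
    return os.environ["OMP_NUM_THREADS"]


# Assumption: this must run before the BLAS library is loaded.
limit_blas_threads(1)
# import dolfin  # import only after the variable has been set
```

This changes the environment of the current process and of any child processes it starts, nothing else.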

The problem has already been discussed at https://fenicsproject.org/pipermail/fenics/2015-March/002619.html.

mikaem commented 8 years ago

Hi Jan, I remember the problem being discussed, but I have not really observed this on my machines. Using the non-vtk version from my channel (the one compiled from branch conda-gcc), I see no effect on my laptop of using OMP_NUM_THREADS=1. I can add export OMP_NUM_THREADS=1 to build.sh and compile a new version. I don't really want to mess with the openblas installation unless I really have to.

blechta commented 8 years ago

FYI, it's reproducible in the travis-python container:

docker run --rm -ti quay.io/travisci/travis-python /bin/bash

and then, inside the container:

su - travis
export DEBIAN_FRONTEND=noninteractive
source ~/virtualenv/python2.7/bin/activate
wget https://repo.continuum.io/miniconda/Miniconda-latest-Linux-x86_64.sh -O miniconda.sh;
bash miniconda.sh -b -p $HOME/miniconda
export PATH="$HOME/miniconda/bin:$PATH"
unset PYTHONPATH
conda config --set always_yes yes
conda config --add channels mikaem/label/travis
conda config --add channels mikaem
conda install fenics=2016.2.dev pytest matplotlib
cd miniconda/share/dolfin/demo/documented/cahn-hilliard/python/
time python demo_cahn-hilliard.py
time OMP_NUM_THREADS=1 python demo_cahn-hilliard.py

I think this is more or less what is run on Travis CI. Maybe the conda libraries link to BLAS from the machine? I don't know how conda recipes work. Adding export OMP_NUM_THREADS=1 could be a good solution.
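One way to check which BLAS a running Python process has actually loaded is to look at its memory map. The following is a Linux-only sketch that reads /proc/<pid>/maps; the function names are made up for illustration, and matching on "blas" or "lapack" in the file name is a heuristic, not a guarantee.

```python
import os


def blas_libraries(maps_text):
    """Return sorted unique paths of mapped files whose name mentions blas or lapack."""
    found = set()
    for line in maps_text.splitlines():
        # Fields: address, perms, offset, device, inode, optional pathname.
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        name = os.path.basename(path).lower()
        if "blas" in name or "lapack" in name:
            found.add(path)
    return sorted(found)


def loaded_blas(pid="self"):
    """Read /proc/<pid>/maps (Linux only) and list the BLAS-like libraries in it."""
    with open(f"/proc/{pid}/maps") as fh:
        return blas_libraries(fh.read())
```

Printing loaded_blas() after importing dolfin and solving a small problem shows whether the library comes from the miniconda prefix or from a system directory. A library that is loaded lazily will not appear until it is first used.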

mikaem commented 8 years ago

I just remembered that I actually did compile my own openblas, using conda-recipes. It should be easy enough to recompile with make USE_THREAD=0. Thanks for the docker/travis tip :-) I'll install it and check for myself.

mikaem commented 8 years ago

BTW, openblas was compiled with

make DYNAMIC_ARCH=1 BINARY=${ARCH} NO_LAPACK=0 NO_AFFINITY=1 NUM_THREADS=1

So is USE_THREADS=0 better than NUM_THREADS=1 ??

mikaem commented 8 years ago

Ok, I see it in the docker. I'll recompile openblas and update the fenics packages accordingly.

mikaem commented 8 years ago

I just remembered that I already have this in my .bashrc:

export OPENBLAS_NUM_THREADS=1
export OMP_NUM_THREADS=1

And that is why I don't see any difference on my machine.
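Before comparing timings between machines, it can help to record which of these variables the environment already carries. This is a small sketch and the function name thread_settings is made up for illustration.

```python
import os

# The two variables mentioned in this thread.
THREAD_VARS = ("OMP_NUM_THREADS", "OPENBLAS_NUM_THREADS")


def thread_settings(environ=None):
    """Map each thread-limiting variable to its value, or None when it is unset."""
    environ = os.environ if environ is None else environ
    return {name: environ.get(name) for name in THREAD_VARS}
```

If thread_settings() reports "1" for either variable, a run with an explicit OMP_NUM_THREADS=1 prefix is not expected to differ from a plain run.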

blechta commented 8 years ago

> So is USE_THREADS=0 better than NUM_THREADS=1 ??

That should not matter, see https://github.com/xianyi/OpenBLAS/blob/develop/Makefile.system#L176, although there might be bugs. Line 187 seems buggy. USE_THREAD=0 should do the job.

But note that the correct variable is USE_THREAD (singular), not USE_THREADS.

mikaem commented 8 years ago

Looks to me like USE_THREAD=0 is the same as NUM_THREADS=1? Anyway, I recompiled openblas with USE_THREAD=0 and uploaded it to Anaconda Cloud. The same behaviour is still observed in the docker container :-(
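To check a rebuilt package without watching top, the thread count of a process can be read directly from /proc. This is a Linux-only sketch; the function names are made up for illustration.

```python
def thread_count(status_text):
    """Extract the value of the 'Threads:' field from /proc/<pid>/status text."""
    for line in status_text.splitlines():
        if line.startswith("Threads:"):
            return int(line.split(":", 1)[1])
    raise ValueError("no 'Threads:' line found")


def threads_of(pid="self"):
    """Return the current number of threads of a process (Linux only)."""
    with open(f"/proc/{pid}/status") as fh:
        return thread_count(fh.read())
```

Calling threads_of() at the end of the demo, or threads_of(pid) from another shell while it runs, gives a number that can be compared between the threaded and the USE_THREAD=0 builds.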