broadinstitute / gatk

Official code repository for GATK versions 4 and up
https://software.broadinstitute.org/gatk

DetermineGermlineContigPloidy module using gcnvkernel #8387

Open: hgingras opened this issue 1 year ago

hgingras commented 1 year ago

Hello,

I am trying to set up a Python environment to use the GATK DetermineGermlineContigPloidy module. I cannot use conda. Instead, I have tried to install the dependencies listed in these two files in a Python virtual environment:

gatk/scripts/gatkcondaenv.yml.template
gatk/src/main/python/org/broadinstitute/hellbender/setup_gcnvkernel.py

I have installed gcnvkernel in my virtual environment.


This is the error message I get when I try to import gcnvkernel with python -c "import gcnvkernel":

Traceback (most recent call last):
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx512/Core/python/3.6.10/lib/python3.6/configparser.py", line 1138, in _unify_values
    sectiondict = self._sections[section]
KeyError: 'blas'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/theano/configparser.py", line 168, in fetch_val_for_key
    return theano_cfg.get(section, option)
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx512/Core/python/3.6.10/lib/python3.6/configparser.py", line 781, in get
    d = self._unify_values(section, vars)
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx512/Core/python/3.6.10/lib/python3.6/configparser.py", line 1141, in _unify_values
    raise NoSectionError(section)
configparser.NoSectionError: No section: 'blas'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/theano/configparser.py", line 328, in get
    delete_key=delete_key)
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/theano/configparser.py", line 172, in fetch_val_for_key
    raise KeyError(key)
KeyError: 'blas.ldflags'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/theano/configdefaults.py", line 1256, in check_mkl_openmp
    import mkl
ModuleNotFoundError: No module named 'mkl'

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<string>", line 1, in <module>
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/gcnvkernel/__init__.py", line 1, in <module>
    from pymc3 import __version__ as pymc3_version
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/pymc3/__init__.py", line 5, in <module>
    from .distributions import *
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/pymc3/distributions/__init__.py", line 1, in <module>
    from . import timeseries
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/pymc3/distributions/timeseries.py", line 1, in <module>
    import theano.tensor as tt
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/theano/__init__.py", line 124, in <module>
    from theano.scan_module import (scan, map, reduce, foldl, foldr, clone,
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/theano/scan_module/__init__.py", line 41, in <module>
    from theano.scan_module import scan_opt
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/theano/scan_module/scan_opt.py", line 60, in <module>
    from theano import tensor, scalar
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/theano/tensor/__init__.py", line 17, in <module>
    from theano.tensor import blas
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/theano/tensor/blas.py", line 155, in <module>
    from theano.tensor.blas_headers import blas_header_text
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/theano/tensor/blas_headers.py", line 987, in <module>
    if not config.blas.ldflags:
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/theano/configparser.py", line 332, in get
    val_str = self.default()
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/theano/configdefaults.py", line 1451, in default_blas_ldflags
    check_mkl_openmp()
  File "/lustre04/scratch/helene/Ticket/0196857/ENV_python_3.6.10/lib/python3.6/site-packages/theano/configdefaults.py", line 1273, in check_mkl_openmp
    """)
RuntimeError: Could not import 'mkl'. If you are using conda, update the numpy packages to the latest build otherwise, set MKL_THREADING_LAYER=GNU in your environment for MKL 2018.

If you have MKL 2017 install and are not in a conda environment you can set the Theano flag blas.check_openmp to False. Be warned that if you set this flag and don't set the appropriate environment or make sure you have the right version you will get wrong results.
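The last traceback shows that the failure happens while `theano` itself is being imported (gcnvkernel imports pymc3, which imports theano, whose `blas_headers.py` triggers the MKL check), and the error message suggests setting `MKL_THREADING_LAYER=GNU`. A minimal Python sketch for testing that suggestion without editing a shell profile: it re-runs the same `python -c "import gcnvkernel"` command in a fresh interpreter, once with the current environment and once with the variable added. The helper names `try_import` and `last_error_line` are hypothetical, and whether this variable alone is enough for an MKL 2020/2021 install is an assumption to verify, not something the error message guarantees.

```python
import os
import subprocess
import sys


def last_error_line(stderr):
    """Return the last 'SomethingError: ...' line of a traceback, if any."""
    errors = [line for line in stderr.splitlines() if "Error:" in line]
    return errors[-1] if errors else stderr.strip()


def try_import(module, extra_env=None):
    """Run `python -c "import <module>"` in a fresh interpreter.

    Hypothetical helper. Returns (exit code, last error line). A fresh
    process is used because the MKL check runs while theano is first
    imported, so environment changes must already be in place by then.
    """
    env = dict(os.environ)
    env.update(extra_env or {})
    proc = subprocess.run(
        [sys.executable, "-c", "import " + module],
        env=env,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode that also works on Python 3.6
    )
    return proc.returncode, last_error_line(proc.stderr)


if __name__ == "__main__":
    # Compare the current environment with the variable the error suggests.
    for extra in ({}, {"MKL_THREADING_LAYER": "GNU"}):
        code, error = try_import("gcnvkernel", extra)
        label = extra or "current environment"
        print(label, "->", "import OK" if code == 0 else error)
```

Setting `os.environ` inside a process that has already imported theano would come too late for this import-time check, which is why each attempt gets its own interpreter.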


Here is the pip list from my environment:

cached-property 1.5.2+computecanada
cycler 0.11.0+computecanada
enum34 1.1.10+computecanada
gatkpythonpackages 0.1
gcnvkernel 0.8
h5py 3.1.0+computecanada
intel-openmp 2021.1.1+computecanada
joblib 0.14.1+computecanada
kiwisolver 1.3.1+computecanada
matplotlib 3.3.4+computecanada
mkl 2021.1.1+computecanada
numpy 1.17.3+computecanada
pandas 1.0.3+computecanada
patsy 0.5.3+computecanada
Pillow 8.1.2+computecanada
pip 20.0.2
pymc3 3.1
pyparsing 3.1.0
python-dateutil 2.8.2+computecanada
pytz 2023.3+computecanada
scipy 1.1.0+computecanada
setuptools 46.1.3
six 1.16.0+computecanada
tbb 2021.1.1+computecanada
Theano 1.0.4
tqdm 4.19.5+computecanada
wheel 0.34.2

I used Python 3.6.10, as specified in gatkcondaenv.yml.template, and respected these dependencies from setup_gcnvkernel.py:

"theano == 1.0.4", "pymc3 == 3.1", "numpy >= 1.13.1", "scipy >= 0.19.1", "tqdm >= 4.15.0"

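Since the environment follows the pins from `setup_gcnvkernel.py`, one quick way to rule out a version mismatch is to compare them mechanically with the `pip list` output. A rough Python sketch, assuming plain dotted numeric versions and dropping local labels such as `+computecanada`; the function names are hypothetical, and the `packaging` library would handle edge cases (for example `1.0` versus `1.0.0`) properly:

```python
import operator
import re

# Comparison operators that appear in the setup_gcnvkernel.py pins.
OPS = {"==": operator.eq, ">=": operator.ge, "<=": operator.le,
       ">": operator.gt, "<": operator.lt}
REQ_RE = re.compile(r"\s*([A-Za-z0-9_.-]+)\s*(==|>=|<=|>|<)\s*(\S+)")


def parse_version(text):
    """'1.17.3+computecanada' -> (1, 17, 3); assumes numeric dotted parts."""
    public = text.split("+", 1)[0]
    return tuple(int(part) for part in public.split("."))


def parse_pip_list(text):
    """Map lower-cased package name -> version string from `pip list` lines."""
    installed = {}
    for line in text.strip().splitlines():
        fields = line.split()
        if len(fields) >= 2:
            installed[fields[0].lower()] = fields[1]
    return installed


def unmet(requirements, installed):
    """Return the requirement strings not satisfied by `installed`."""
    problems = []
    for req in requirements:
        name, op, wanted = REQ_RE.match(req).groups()
        have = installed.get(name.lower())
        if have is None or not OPS[op](parse_version(have), parse_version(wanted)):
            problems.append(req)
    return problems


if __name__ == "__main__":
    # Excerpt of the pip list above and the pins from setup_gcnvkernel.py.
    pip_list = """
    numpy 1.17.3+computecanada
    pymc3 3.1
    scipy 1.1.0+computecanada
    Theano 1.0.4
    tqdm 4.19.5+computecanada
    """
    pins = ["theano == 1.0.4", "pymc3 == 3.1", "numpy >= 1.13.1",
            "scipy >= 0.19.1", "tqdm >= 4.15.0"]
    print(unmet(pins, parse_pip_list(pip_list)))  # -> []
```

With the versions reported in the pip list, every pin is satisfied, which suggests the failure comes from the `mkl` import rather than from the pinned packages.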

mkl is installed in my environment. When I run python -c "import numpy; numpy.show_config()", I get this output:

blas_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl', '/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/include', '/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib']
blas_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl', '/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/include', '/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib']
lapack_mkl_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl', '/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/include', '/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib']
lapack_opt_info:
    libraries = ['mkl_rt', 'pthread']
    library_dirs = ['/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib/intel64']
    define_macros = [('SCIPY_MKL_H', None), ('HAVE_CBLAS', None)]
    include_dirs = ['/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl', '/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/include', '/cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/imkl/2020.1.217/mkl/lib']
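The `show_config()` output shows NumPy linked against the MKL runtime library (`mkl_rt`), while the frame at `configdefaults.py`, line 1256, runs `import mkl`, which needs an importable Python module of that name. These are two different things, and the sketch below (hypothetical helper name `mkl_status`) checks them separately. On PyPI, the importable `mkl` module usually comes from the separate `mkl-service` package rather than from the `mkl` wheel, though the Compute Canada wheels may be packaged differently. Note that `ctypes.util.find_library` searches the linker's usual locations, so a `None` for the library is not conclusive when NumPy finds MKL through a path set at build time.

```python
import ctypes.util
import importlib.util


def mkl_status():
    """Report two things that are easy to conflate (hypothetical helper).

    - shared_library: whether the MKL runtime library (mkl_rt, what NumPy
      links against per show_config()) can be located; None if not found.
    - python_module: whether `import mkl` could succeed, which is what the
      check_mkl_openmp frame in the traceback attempts.
    """
    return {
        "shared_library": ctypes.util.find_library("mkl_rt"),
        "python_module": importlib.util.find_spec("mkl") is not None,
    }


if __name__ == "__main__":
    print(mkl_status())
```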


Is there an up-to-date dependency list I could follow to get a working gcnvkernel module?

Thanks for your help,

Hélène

hgingras commented 1 year ago

For anyone who might be interested, I finally found a setup that works for running gcnvkernel outside a conda environment.

Modules loaded: GATK/4.2.4.0 and python/3.8.2

In my python virtualenv:

pip install --no-index --upgrade pip
pip install --no-index --ignore-installed numpy==1.21.0
pip install --no-index scipy==1.2.0
pip install pymc3==3.1
pip install Theano==1.0.4
pip install --no-index tqdm==4.19.5
pip install --no-index PyVCF==0.6.8

git clone https://github.com/broadinstitute/gatk.git
cd gatk/src/main/python/org/broadinstitute/hellbender
python setup_gcnvkernel.py install
python setup.py install
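Before running DetermineGermlineContigPloidy, a fresh virtualenv can be compared against the versions used in the pip commands above. A minimal sketch using `importlib.metadata`, which is available from Python 3.8 (matching the python/3.8.2 module); `WORKING_PINS` only restates those commands, `check_pins` is a hypothetical helper, and the distribution names are assumed to match what pip records for each package:

```python
from importlib import metadata  # standard library from Python 3.8

# Restates the pip install commands above; not an official GATK list.
WORKING_PINS = {
    "numpy": "1.21.0",
    "scipy": "1.2.0",
    "pymc3": "3.1",
    "Theano": "1.0.4",
    "tqdm": "4.19.5",
    "PyVCF": "0.6.8",
}


def public_version(version):
    """Drop a local label such as '+computecanada'."""
    return version.split("+", 1)[0]


def check_pins(pins, get_version=metadata.version):
    """Return {name: installed version or None} for every pin that differs.

    Hypothetical helper; get_version is injectable so the comparison can
    be exercised without the packages installed.
    """
    mismatches = {}
    for name, wanted in pins.items():
        try:
            have = public_version(get_version(name))
        except metadata.PackageNotFoundError:
            have = None  # not installed in this environment
        if have != wanted:
            mismatches[name] = have
    return mismatches


if __name__ == "__main__":
    bad = check_pins(WORKING_PINS)
    print("all pins match" if not bad else bad)
```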