bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

OptiType: numpy array IndexError #1863

Closed ghost closed 7 years ago

ghost commented 7 years ago

Hi Brad,

I am working through the tumor/normal synthetic 3 Teaching example for bcbio-nextgen, but I am receiving an error from the OptiType HLA typing:

subprocess.CalledProcessError: Command 'set -o pipefail; OptiTypePipeline.py -v --dna  -o /home/ubuntu/run/cancer-syn3-chr6/work/align/syn3-tumor/hla/bcbiotx/tmpHD6TJ3 -i /home/ubuntu/run/cancer-syn3-chr6/work/align/syn3-tumor/hla/OptiType-HLA-A_B_C-input.fq -c /home/ubuntu/run/cancer-syn3-chr6/work/align/syn3-tumor/hla/bcbiotx/tmpHD6TJ3/config.ini

0:00:00.70 Mapping OptiType-HLA-A_B_C-input.fq to GEN reference...

0:08:20.69 Generating binary hit matrix.
0:08:20.70 Loading /home/ubuntu/run/cancer-syn3-chr6/work/align/syn3-tumor/hla/bcbiotx/tmpHD6TJ3/2017_03_15_19_34_26/2017_03_15_19_34_26_1.bam started. Number of HLA reads loaded (updated every thousand):
1K...2K...
0:08:24.01 2500 reads loaded. Creating dataframe...
0:08:24.12 Dataframes created. Shape: 2500 x 11179, hits: 2396062 (2642872), sparsity: 1 in 10.57

0:08:27.19 temporary pruning of identical rows and columns

0:08:27.41 Size of mtx with unique rows and columns: (702, 1510)
0:08:27.41 determining minimal set of non-overshadowed alleles

0:08:32.32 Keeping only the minimal number of required alleles (220,)

0:08:32.32 Creating compact model...

0:08:32.90 Initializing OptiType model...
GLPSOL: GLPK LP/MIP Solver, v4.57
Parameter(s) specified in the command line:
 --write /tmp/tmpm0Nvss.glpk.raw --wglp /tmp/tmpMoKBnP.glpk.glp --cpxlp /tmp/tmpS3s2Lv.pyomo.lp
Reading problem data from '/tmp/tmpS3s2Lv.pyomo.lp'...
/tmp/tmpS3s2Lv.pyomo.lp:18356: warning: lower bound of variable 'x1391' redefined
/tmp/tmpS3s2Lv.pyomo.lp:18356: warning: upper bound of variable 'x1391' redefined
1390 rows, 912 columns, 12405 non-zeros
565 integer variables, all of which are binary
18921 lines were read
Writing problem data to '/tmp/tmpMoKBnP.glpk.glp'...
17308 lines were written
GLPK Integer Optimizer, v4.57
1390 rows, 912 columns, 12405 non-zeros
565 integer variables, all of which are binary
Preprocessing...
1 hidden packing inequaliti(es) were detected
332 hidden covering inequaliti(es) were detected
1388 rows, 911 columns, 12402 non-zeros
565 integer variables, all of which are binary
Scaling...
 A: min|aij| =  1.000e+00  max|aij| =  4.000e+00  ratio =  4.000e+00
Problem data seem to be well scaled
Constructing initial basis...
Size of triangular part is 1388
Solving LP relaxation...
GLPK Simplex Optimizer, v4.57
1388 rows, 911 columns, 12402 non-zeros
      0: obj =  -0.000000000e+00 inf =   4.000e+00 (4)
      4: obj =  -3.000000000e-02 inf =   0.000e+00 (0)
*   500: obj =   2.049528000e+03 inf =   4.938e-15 (207) 1
*   788: obj =   2.312821000e+03 inf =   6.328e-15 (0)
OPTIMAL LP SOLUTION FOUND
Integer optimization begins...
+   788: mip =     not found yet <=              +inf        (1; 0)
+   788: >>>>>   2.312821000e+03 <=   2.312821000e+03   0.0% (1; 0)
+   788: mip =   2.312821000e+03 <=     tree is empty   0.0% (0; 1)
INTEGER OPTIMAL SOLUTION FOUND
Time used:   0.1 secs
Memory used: 3.0 Mb (3159417 bytes)
Writing MIP solution to '/tmp/tmpm0Nvss.glpk.raw'...
2304 lines were written
/usr/local/bin/OptiTypePipeline.py:394: FutureWarning: irow(i) is deprecated. Please use .iloc[i]
  hlatype = result.irow(0)[["A1", "A2", "B1", "B2", "C1", "C2"]].drop_duplicates().dropna()

0:08:33.76 Result dataframe has been constructed...
Traceback (most recent call last):
  File "/usr/local/bin/OptiTypePipeline.py", line 398, in <module>
    coverage_mat = ht.calculate_coverage(plot_variables, features, hlatype, features_used)
  File "/usr/local/share/bcbio/anaconda/share/optitype-2015.10.20-1/hlatyper.py", line 617, in calculate_coverage
    coverage[bool(i_mismatches)][i_pairing-1][i_hitcount-1][i_pos-1:i_pos-1+i_read_length] += 1
IndexError: in the future, 0-d boolean arrays will be interpreted as a valid boolean index
' returned non-zero exit status 1

This is a known issue with OptiType: recent versions of numpy no longer accept a Boolean as an array index. It has been addressed in the OptiType v1.2.1 release, specifically by changing the failing line to:

coverage[int(bool(i_mismatches))][i_pairing-1][i_hitcount-1][i_pos-1:i_pos-1+i_read_length] += 1

I manually edited line 617 of /usr/local/share/bcbio/anaconda/share/optitype-2015.10.20-1/hlatyper.py to add int(bool(i_mismatches)) and continued running through the tutorial without issue. What is the best way to direct bcbio_nextgen to replace optitype-2015.10.20-1 in bioconda with the most recent version?
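For anyone hitting the same traceback, here is a minimal sketch of the patched indexing pattern. It assumes the coverage structure is a numpy array indexed as [mismatch flag][pairing][hit count][position]; the array shape and the add_read helper are hypothetical and only illustrate the int(bool(...)) conversion, they are not taken from hlatyper.py.

```python
import numpy as np

# Hypothetical stand-in for the coverage structure; the real shape in
# hlatyper.py may differ. Axes: mismatch flag, pairing, hit count, position.
coverage = np.zeros((2, 2, 2, 10), dtype=int)


def add_read(coverage, i_mismatches, i_pairing, i_hitcount, i_pos, i_read_length):
    """Increment coverage over the positions spanned by one read."""
    # Convert the truth value to a plain integer (0 or 1) before indexing,
    # so numpy treats it as an ordinary integer index, not a boolean index.
    flag = int(bool(i_mismatches))
    start = i_pos - 1
    coverage[flag][i_pairing - 1][i_hitcount - 1][start:start + i_read_length] += 1


# A read with 3 mismatches, first pairing/hit-count bin, covering positions 3-6.
add_read(coverage, i_mismatches=3, i_pairing=1, i_hitcount=1, i_pos=3, i_read_length=4)
print(coverage[1][0][0])
```

Each integer index returns a view, so the in-place += reaches the original array. Only the int(...) wrapper differs from the line that failed.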

Details of EC2 instance:

# launch an Ubuntu Server 14.04 (ami-d05e75b8). Start an m4.4xlarge instance with a 100 GB SSD.

# BCBIO install
sudo apt-get update
sudo apt-get install -y build-essential zlib1g-dev wget curl python-setuptools git \
                        openjdk-7-jdk openjdk-7-jre ruby libncurses5-dev libcurl4-openssl-dev \
                        libbz2-dev unzip pigz bsdmainutils

wget https://raw.githubusercontent.com/chapmanb/bcbio-nextgen/master/scripts/bcbio_nextgen_install.py

sudo python bcbio_nextgen_install.py /usr/local/share/bcbio --tooldir /usr/local --genomes hg38 --aligners bwa --isolate -u development

mkdir -p run
cd run
wget https://raw.githubusercontent.com/chapmanb/bcbio-nextgen/master/config/teaching/cancer-syn3-chr6-prep.sh
bash cancer-syn3-chr6-prep.sh

cd cancer-syn3-chr6/work
bcbio_nextgen.py ../config/cancer-syn3-chr6.yaml -n 16

chapmanb commented 7 years ago

David; Sorry about the issue, and thank you for reporting it. This error was due to an incompatibility between recent versions of numpy and the optitype version we were shipping with bcbio. I updated optitype to a patch release that fixes the problem, so if you do:

bcbio_conda install -c bioconda optitype

it should install 1.2.1. Then restarting your analysis in place should work and finish cleanly. Please let us know if you run into any other problems.

ghost commented 7 years ago

I am receiving the below message:

$ bcbio_conda install -c bioconda optitype
Fetching package metadata ...........
Solving package specifications: .

# All requested packages already installed.
# packages in environment at /usr/local/share/bcbio/anaconda:
#
optitype                  2015.10.20               py27_1    bioconda

This is in a fresh ec2 instance:

# launch an Ubuntu Server 14.04 (ami-d05e75b8).
# at the least start an m4.4xlarge instance with a 100 GB SSD.

# install all development environment tools
sudo apt-get update
sudo apt-get install -y build-essential zlib1g-dev wget curl python-setuptools git \
                        openjdk-7-jdk openjdk-7-jre ruby libncurses5-dev libcurl4-openssl-dev \
                        libbz2-dev unzip pigz bsdmainutils python-minimal python-pip python-dev 
sudo pip install synapseclient 
sudo pip install awscli --upgrade --user

# install bcbio_nextgen
wget https://raw.githubusercontent.com/chapmanb/bcbio-nextgen/master/scripts/bcbio_nextgen_install.py
sudo python bcbio_nextgen_install.py /usr/local/share/bcbio --tooldir /usr/local --genomes hg38 --aligners bwa --isolate -u stable

# update Optitype to version 1.2.1
bcbio_conda install -c bioconda optitype

chapmanb commented 7 years ago

David; Apologies about the issues with the old 2015.10.20 version blocking the updates. I've expired that older version so if you re-run the update it should hopefully grab 1.2.1 and work now. Thank you for the patience and fingers crossed this will get things working for you.
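
A likely reason the old package blocked the update: under ordinary component-wise ordering, a date-style version such as 2015.10.20 compares as newer than 1.2.1, so an installer looking for the latest version has no reason to replace it. The snippet below is a simplified sketch of that comparison, assuming purely numeric, dot-separated versions. The version_key helper is hypothetical, and conda's real ordering rules are more elaborate.

```python
def version_key(version):
    """Split a dot-separated numeric version into a tuple of ints for comparison."""
    return tuple(int(part) for part in version.split("."))


available = ["1.2.1", "2015.10.20"]

# Tuples compare element by element, so 2015 > 1 decides the ordering.
newest = max(available, key=version_key)
print(newest)
```

Under this ordering the date-style release is picked as newest, which matches the "All requested packages already installed" message shown earlier in the thread.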