bcbio / bcbio-nextgen

Validated, scalable, community developed variant calling, RNA-seq and small RNA analysis
https://bcbio-nextgen.readthedocs.io
MIT License

Peddy (NotImplementedError: iLocation based boolean indexing on an integer type is not available) #3194

Closed chatchawit closed 4 years ago

chatchawit commented 4 years ago

Version info

To Reproduce

Observed behavior

2020-04-18 05:58:55 bcbio-virtual-machine peddy.cli[17466] INFO ran in 0.3 seconds
Traceback (most recent call last):
  File "/home/bcbio/install/stable/anaconda/envs/python2/bin/peddy", line 10, in <module>
    sys.exit(cli())
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/peddy/cli.py", line 226, in peddy
    da = df.iloc[df['sample_duplication_error'], 'sample_a']
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/pandas/core/indexing.py", line 1494, in __getitem__
    return self._getitem_tuple(key)
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/pandas/core/indexing.py", line 2143, in _getitem_tuple
    self._has_valid_tuple(tup)
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/pandas/core/indexing.py", line 223, in _has_valid_tuple
    self._validate_key(k, i)
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/pandas/core/indexing.py", line 2060, in _validate_key
    raise NotImplementedError("iLocation based boolean "
NotImplementedError: iLocation based boolean indexing on an integer type is not available
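The failing line in the traceback, `da = df.iloc[df['sample_duplication_error'], 'sample_a']`, passes a boolean Series and a column label to `.iloc`, which is a purely positional indexer. The following is a minimal sketch of that failure mode on a toy table; the column names come from the traceback, but the values are invented and the `.loc` form is shown only as the usual pandas idiom for a boolean mask plus a label, not as a confirmed description of how peddy itself was changed.

```python
import pandas as pd

# Toy stand-in for peddy's pairwise table. Column names are taken from the
# traceback; the rows are made up for illustration.
df = pd.DataFrame({
    "sample_a": ["s1", "s2", "s3"],
    "sample_b": ["s2", "s3", "s1"],
    "sample_duplication_error": [False, True, False],
})
mask = df["sample_duplication_error"]

# .iloc is positional, so a boolean Series with an integer index (and a
# column label) is rejected. The exact exception type may vary between
# pandas versions, so both likely candidates are caught here.
try:
    df.iloc[mask, "sample_a"]
    iloc_error = None
except (NotImplementedError, ValueError) as exc:
    iloc_error = exc

# .loc is label-based and accepts a boolean mask together with a column name.
flagged = df.loc[mask, "sample_a"].tolist()

print("iloc raised:", type(iloc_error).__name__ if iloc_error else None)
print("flagged samples via loc:", flagged)
```

If the mask must be used with `.iloc`, converting it to a plain NumPy array (`mask.to_numpy()`) and replacing the label with a column position is the positional equivalent.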

Expected behavior

Peddy should run to completion, but it terminated with the error above.

Log files

I've attached bcbio-nextgen.log, bcbio-nextgen-commands.log, and bcbio-nextgen-debug.log (log.zip).

Additional context

I reran Peddy manually. It produced the same error, shown below.

bcbio@bcbio-virtual-machine:~/nute/work1/log$ peddy -p 8 --sites hg38 --plot /home/bcbio/nute/work1/qc/pair001/peddy/pair001-effects-annotated-filter-germline.vcf.gz /home/bcbio/nute/work1/qc/pair001/peddy/pair001-effects-annotated-filter-germline.ped
2020-04-18 10:06:05 bcbio-virtual-machine peddy.cli[19171] INFO Running Peddy version 0.4.4
/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/peddy/cli.py:178: FutureWarning: read_table is deprecated, use read_csv instead.
  sep="\t")
2020-04-18 10:06:05 bcbio-virtual-machine peddy.cli[19171] INFO ped_check
2020-04-18 10:06:06 bcbio-virtual-machine numexpr.utils[19171] INFO Note: NumExpr detected 48 cores but "NUMEXPR_MAX_THREADS" not set, so enforcing safe limit of 8.
2020-04-18 10:06:06 bcbio-virtual-machine numexpr.utils[19171] INFO NumExpr defaulting to 8 threads.
2020-04-18 10:06:07 bcbio-virtual-machine peddy.peddy[19171] INFO plotting
2020-04-18 10:06:08 bcbio-virtual-machine peddy.cli[19171] INFO ran in 2.6 seconds
2020-04-18 10:06:08 bcbio-virtual-machine peddy.cli[19171] INFO het_check
2020-04-18 10:06:10 bcbio-virtual-machine peddy.pca[19171] INFO loaded and subsetted thousand-genomes genotypes (shape: (2504, 4776)) in 1.2 seconds
/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/sklearn/svm/base.py:196: FutureWarning: The default value of gamma will change from 'auto' to 'scale' in version 0.22 to account better for unscaled features. Set gamma explicitly to 'auto' or 'scale' to avoid this warning.
  "avoid this warning.", FutureWarning)
2020-04-18 10:06:11 bcbio-virtual-machine peddy.pca[19171] INFO ran randomized PCA on thousand-genomes samples at 4776 sites in 0.8 seconds
2020-04-18 10:06:11 bcbio-virtual-machine peddy.pca[19171] INFO Projected thousand-genomes genotypes and sample genotypes and predicted ancestry via SVM in 0.1 seconds
/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/peddy/peddy.py:916: FutureWarning: Sorting because non-concatenation axis is not aligned. A future version of pandas will change to not sort by default.

To accept the future behavior, pass 'sort=False'.

To retain the current behavior and silence the warning, pass 'sort=True'.

  df = pd.concat((df, pca_df), axis=1)
2020-04-18 10:06:12 bcbio-virtual-machine peddy.cli[19171] INFO ran in 4.2 seconds
2020-04-18 10:06:12 bcbio-virtual-machine peddy.cli[19171] INFO sex_check
no intervals found for /home/bcbio/nute/work1/qc/pair001/peddy/pair001-effects-annotated-filter-germline.vcf.gz at X:2781480
2020-04-18 10:06:12 bcbio-virtual-machine peddy.peddy[19171] INFO sex-check: 39 skipped / 1411 kept
2020-04-18 10:06:12 bcbio-virtual-machine peddy.cli[19171] INFO ran in 0.3 seconds
Traceback (most recent call last):
  File "/home/bcbio/install/stable/anaconda/envs/python2/bin/peddy", line 10, in <module>
    sys.exit(cli())
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/click/core.py", line 764, in __call__
    return self.main(*args, **kwargs)
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/click/core.py", line 717, in main
    rv = self.invoke(ctx)
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/click/core.py", line 956, in invoke
    return ctx.invoke(self.callback, **ctx.params)
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/click/core.py", line 555, in invoke
    return callback(*args, **kwargs)
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/peddy/cli.py", line 226, in peddy
    da = df.iloc[df['sample_duplication_error'], 'sample_a']
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/pandas/core/indexing.py", line 1494, in __getitem__
    return self._getitem_tuple(key)
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/pandas/core/indexing.py", line 2143, in _getitem_tuple
    self._has_valid_tuple(tup)
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/pandas/core/indexing.py", line 223, in _has_valid_tuple
    self._validate_key(k, i)
  File "/home/bcbio/install/stable/anaconda/envs/python2/lib/python2.7/site-packages/pandas/core/indexing.py", line 2060, in _validate_key
    raise NotImplementedError("iLocation based boolean "
NotImplementedError: iLocation based boolean indexing on an integer type is not available

chatchawit commented 4 years ago

I fixed this issue by upgrading peddy from 0.4.4 to 0.4.6. I then ran into another issue with cnvkit; please take a look at https://github.com/etal/cnvkit/issues/511
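For anyone checking whether their own install already has the newer peddy, here is a rough sketch of a version check. It assumes the environment that holds peddy runs Python 3.8 or later (so `importlib.metadata` is available) and that the version string is plain dotted numbers such as `0.4.6`; the helper names `installed_version` and `is_at_least` are hypothetical and not part of bcbio or peddy.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of a package, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def is_at_least(found, minimum):
    """Compare plain dotted numeric versions, e.g. '0.4.4' against '0.4.6'."""
    def as_tuple(v):
        return tuple(int(part) for part in v.split("."))
    return as_tuple(found) >= as_tuple(minimum)


# 0.4.6 is the version reported in this thread as working.
found = installed_version("peddy")
if found is None:
    print("peddy is not installed in this environment")
elif is_at_least(found, "0.4.6"):
    print("peddy", found, "is at or above 0.4.6")
else:
    print("peddy", found, "is older than 0.4.6; consider upgrading")
```

The check has to be run with the interpreter of the environment that actually contains peddy; a different environment on the same machine may report a different version or none at all.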

naumenko-sa commented 4 years ago

Thanks @chatchawit for figuring this out!

peddy is already at 0.4.6 in bioconda and has switched to python3: https://github.com/bioconda/bioconda-recipes/blob/master/recipes/peddy/meta.yaml

We are not pinning it. On our system it is 0.4.6, but it was still installed in the python2 env. I fixed that: https://github.com/chapmanb/cloudbiolinux/blob/master/contrib/flavor/ngs_pipeline_minimal/packages-conda.yaml

re: cnvkit, try updating it to the latest beta: https://github.com/bcbio/bcbio-nextgen/issues/3061

Sergey