XWangLabTHU / cfDNApipe

cfDNApipe: A comprehensive quality control and analysis pipeline for cell-free DNA high-throughput sequencing data
https://xwanglabthu.github.io/cfDNApipe/
Other
61 stars 31 forks source link

IndexError in GCCorrect #4

Closed sroener closed 2 years ago

sroener commented 3 years ago

Hi,

I'm trying to use your implementation of GC correction. Unfortunately, the pipeline throws an error. The example from documentation was used as reference.

Code

from cfDNApipe import *
import glob

verbose = True

pipeConfigure(
    threads=2,
    genome="hg19",
    refdir="data/hg19/",
    outdir=r"./output/pcs_armCNV",
    data="WGS",
    type="paired",
    JavaMem="8G",
    build=True,
)

case_bam = glob.glob("../../BAMs/Case.bam")

switchConfigure("cancer")
case_bamCounter = bamCounter(
    bamInput=case_bam, upstream=True, verbose=verbose, stepNum="case01"
)
case_gcCounter = runCounter(
    filetype=0, upstream=True, verbose=verbose, stepNum="case02"
)
case_GCCorrect = GCCorrect(
    readupstream=case_bamCounter,
    gcupstream=case_gcCounter,
    verbose=verbose,
    stepNum="case03",
)

Error

Reference file ./data/hg19/./data/hg19/hg19.fa.fai do not exist!
Background reference check finished!
************************************************************
               bamCounter has been completed!               
************************************************************
************************************************************
               runCounter has been completed!               
************************************************************
Now, processing Case_read ...
---------------------------------------------------------------------------
IndexError                                Traceback (most recent call last)
<ipython-input-37-3643ac7bea59> in <module>
     16     gcupstream=case_gcCounter,
     17     verbose=verbose,
---> 18     stepNum="case03",
     19 )
     20 ## ctrl

~/work/miniconda/envs/cfDNApipe/lib/python3.6/site-packages/cfDNApipe/Fun_GCcorrect.py in __init__(self, readInput, gcwigInput, readtype, corrkey, outputdir, threads, stepNum, readupstream, gcupstream, verbose, **kwargs)
    159                         self.getOutput("plotOutput")[i],
    160                         self.getParam("corrkey"),
--> 161                         self.getParam("readtype"),
    162                     )
    163             else:

~/work/miniconda/envs/cfDNApipe/lib/python3.6/site-packages/cfDNApipe/cfDNA_utils.py in correctReadCount(readfileInput, gcfileInput, txtOutput, plotOutput, corrkey, readtype, sampleMaxSize)
    886 
    887     # run the GC_correct function
--> 888     correct_reads, correct_reads2, valid = GC_correct(reads, gc, plotOutput, corrkey, sampleMaxSize)
    889 
    890     readOutput = pd.DataFrame(

~/work/miniconda/envs/cfDNApipe/lib/python3.6/site-packages/cfDNApipe/cfDNA_utils.py in GC_correct(readInput, gcInput, plotOutput, corrkey, sampleMaxSize)
    912     ideal = [True for i in range(readl)]
    913     routlier, doutlier = 0.01, 0.001
--> 914     lrange = np.percentile(np.array([readInput[i] for i in range(readl) if valid[i]]), 0)
    915     rrange = np.percentile(np.array([readInput[i] for i in range(readl) if valid[i]]), (1 - routlier) * 100)
    916     ldomain = np.percentile(np.array([gcInput[i] for i in range(readl) if valid[i]]), doutlier * 100)

<__array_function__ internals> in percentile(*args, **kwargs)

~/work/miniconda/envs/cfDNApipe/lib/python3.6/site-packages/numpy/lib/function_base.py in percentile(a, q, axis, out, overwrite_input, interpolation, keepdims)
   3731         raise ValueError("Percentiles must be in the range [0, 100]")
   3732     return _quantile_unchecked(
-> 3733         a, q, axis, out, overwrite_input, interpolation, keepdims)
   3734 
   3735 

~/work/miniconda/envs/cfDNApipe/lib/python3.6/site-packages/numpy/lib/function_base.py in _quantile_unchecked(a, q, axis, out, overwrite_input, interpolation, keepdims)
   3851     r, k = _ureduce(a, func=_quantile_ureduce_func, q=q, axis=axis, out=out,
   3852                     overwrite_input=overwrite_input,
-> 3853                     interpolation=interpolation)
   3854     if keepdims:
   3855         return r.reshape(q.shape + k)

~/work/miniconda/envs/cfDNApipe/lib/python3.6/site-packages/numpy/lib/function_base.py in _ureduce(a, func, **kwargs)
   3427         keepdim = (1,) * a.ndim
   3428 
-> 3429     r = func(a, **kwargs)
   3430     return r, keepdim
   3431 

~/work/miniconda/envs/cfDNApipe/lib/python3.6/site-packages/numpy/lib/function_base.py in _quantile_ureduce_func(a, q, axis, out, overwrite_input, interpolation, keepdims)
   3965             n = np.isnan(ap[-1:, ...])
   3966 
-> 3967         x1 = take(ap, indices_below, axis=axis) * weights_below
   3968         x2 = take(ap, indices_above, axis=axis) * weights_above
   3969 

<__array_function__ internals> in take(*args, **kwargs)

~/work/miniconda/envs/cfDNApipe/lib/python3.6/site-packages/numpy/core/fromnumeric.py in take(a, indices, axis, out, mode)
    189            [5, 7]])
    190     """
--> 191     return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
    192 
    193 

~/work/miniconda/envs/cfDNApipe/lib/python3.6/site-packages/numpy/core/fromnumeric.py in _wrapfunc(obj, method, *args, **kwds)
     56 
     57     try:
---> 58         return bound(*args, **kwds)
     59     except TypeError:
     60         # A TypeError occurs if the object does have such a method in its

IndexError: cannot do a non-empty take from an empty axes.

additional information

Honchkrow commented 3 years ago

Hi, @sroener The error (IndexError: cannot do a non-empty take from an empty axes) may caused by the empty line in some file. Can you provide me with the output files of bamCounter and runCounter, therefore I can reproduce your problem.

Thanks for using cfDNApipe!

Wei

Honchkrow commented 3 years ago

My e-mail: w-zhang16@mails.tsinghua.edu.cn

gatsby2016 commented 2 years ago

Hi, @sroener @Honchkrow
This problem may cause by the gcCounterand readCounter bin file in hmmcopy_utils package.

Recently, I conducted the experiment and the first time i ran cfDNApipe it caused an error at gcCounter in runCounter() function.

Now, running command: gcCounter -w 5000000 /home/cyyan/Projects/cfDNA/cfDNApipe/experiments/refs/hg19.fa > /home/cyyan/Projects/cfDNA/cfDNApipe/experiments/outs/intermediate_result/step_FP02_runCounter/hg19.gc.wig
/bin/sh: 1: gcCounter: not found
Traceback (most recent call last):
  File "/home/cyyan/Projects/cfDNA/cfDNApipe/experiments/exp1_WGS_PE.py", line 32, in <module>
    report=True,
  File "/home/cyyan/Projects/cfDNA/cfDNApipe/cfDNApipe/Pipeline.py", line 221, in cfDNAWGS
    fp_gcCounter = runCounter(filetype=0, binlen=5000000, upstream=True, verbose=verbose, stepNum="FP02")
  File "/home/cyyan/Projects/cfDNA/cfDNApipe/cfDNApipe/Fun_counter.py", line 158, in __init__
    self.run(all_cmd)
  File "/home/cyyan/Projects/cfDNA/cfDNApipe/cfDNApipe/StepBase.py", line 474, in run
    raise commonError("**********CMD running error**********")
cfDNApipe.cfDNA_utils.commonError: **********CMD running error**********

But the error disappeared when i restarted the program. It showed as :

************************************************************
               fpCounter has been completed!                
************************************************************
Now, processing case1_short ...
Traceback (most recent call last):
  File "/home/cyyan/Projects/cfDNA/cfDNApipe/experiments/exp1_WGS_PE.py", line 32, in <module>
    report=True,
  File "/home/cyyan/Projects/cfDNA/cfDNApipe/cfDNApipe/Pipeline.py", line 228, in cfDNAWGS
    stepNum="FP03",
  File "/home/cyyan/Projects/cfDNA/cfDNApipe/cfDNApipe/Fun_GCcorrect.py", line 161, in __init__
    self.getParam("readtype"),
  File "/home/cyyan/Projects/cfDNA/cfDNApipe/cfDNApipe/cfDNA_utils.py", line 888, in correctReadCount
    correct_reads, correct_reads2, valid = GC_correct(reads, gc, plotOutput, corrkey, sampleMaxSize)
  File "/home/cyyan/Projects/cfDNA/cfDNApipe/cfDNApipe/cfDNA_utils.py", line 914, in GC_correct
    lrange = np.percentile(np.array([readInput[i] for i in range(readl) if valid[i]]), 0)
  File "<__array_function__ internals>", line 6, in percentile
  File "/home/cyyan/anaconda3/envs/cfdnapipe/lib/python3.6/site-packages/numpy/lib/function_base.py", line 3733, in percentile
    a, q, axis, out, overwrite_input, interpolation, keepdims)
  File "/home/cyyan/anaconda3/envs/cfdnapipe/lib/python3.6/site-packages/numpy/lib/function_base.py", line 3853, in _quantile_unchecked
    interpolation=interpolation)
  File "/home/cyyan/anaconda3/envs/cfdnapipe/lib/python3.6/site-packages/numpy/lib/function_base.py", line 3429, in _ureduce
    r = func(a, **kwargs)
  File "/home/cyyan/anaconda3/envs/cfdnapipe/lib/python3.6/site-packages/numpy/lib/function_base.py", line 3967, in _quantile_ureduce_func
    x1 = take(ap, indices_below, axis=axis) * weights_below
  File "<__array_function__ internals>", line 6, in take
  File "/home/cyyan/anaconda3/envs/cfdnapipe/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 191, in take
    return _wrapfunc(a, 'take', indices, axis=axis, out=out, mode=mode)
  File "/home/cyyan/anaconda3/envs/cfdnapipe/lib/python3.6/site-packages/numpy/core/fromnumeric.py", line 58, in _wrapfunc
    return bound(*args, **kwds)
IndexError: cannot do a non-empty take from an empty axes.

And then another error appeared, showed as yours:

IndexError: cannot do a non-empty take from an empty axes.

But I knew, the error at gcCounter in runCounter() function remained unsolved.

So I checked the generated intermediate files in outs/intermediate_result/step_FP02_runCounter, actually, hg19.gc.wig is empty (only had the value 1). I remembered that there is a breakpoint detection mechanism and it may caused false passing.

And i deleted that folder step_FP02_runCounter, rm -r step_FP0*. The error at gcCounter appeared again.

So, the problem is on the runCounter() function where there was no gcCounterand readCounter executable file for cmd running.

I solved it by building the hmmcopy_utils from https://github.com/shahcompbio/hmmcopy_utils and adding the built bin/ folder location to ~/.bashrc on OS ubuntu20.04.

Good luck!

Honchkrow commented 2 years ago

Hi, @gatsby2016

thanks for your help.

I reproduced the problem. It semms that hmmcopy_utils installed by conda is broken sometimes. I reinstalled hmmcopy_utils and the problem disappear. I have tried another version of hmmcopy_utils and will release a new environment file recently.

thank you again!