etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org
Other
541 stars 165 forks source link

Error creating a flat reference file: Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD #508

Open moldach opened 4 years ago

moldach commented 4 years ago

I've got WGS samples without a reference from normal samples and would like to create a "flat"reference.

Following the documentation I use the following code to generate the reference:

cnvkit.py batch N2_trim_bwaMEM_sort_dedupped.bam -n -m wgs -f ~/projects/def-mtarailo/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa --output-reference my_flat_reference.cnn -d example2

The script is generating files but error's out with the following:

Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.

Here's the full output:

(.venv) [moldach@cdr767 cnvkit]$ cnvkit.py batch N2_trim_bwaMEM_sort_dedupped.bam -n -m wgs -f ~/projects/def-mtarailo/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa --output-reference my_flat_reference.cnn -d example2/
CNVkit 0.9.7.b1
WGS protocol: recommend '--annotate' option (e.g. refFlat.txt) to help locate genes in output files.
I: Scanning for accessible regions
        Accessible region I:0-15072434 (size 15072434)
II: Scanning for accessible regions
        Accessible region II:0-15279421 (size 15279421)
III: Scanning for accessible regions
        Accessible region III:0-13783801 (size 13783801)
IV: Scanning for accessible regions
        Accessible region IV:0-17493829 (size 17493829)
V: Scanning for accessible regions
        Accessible region V:0-20924180 (size 20924180)
X: Scanning for accessible regions
        Accessible region X:0-17718942 (size 17718942)
MtDNA: Scanning for accessible regions
        Accessible region MtDNA:0-13794 (size 13794)
I: Joining over small gaps
II: Joining over small gaps
III: Joining over small gaps
IV: Joining over small gaps
V: Joining over small gaps
X: Joining over small gaps
MtDNA: Joining over small gaps
Wrote c_elegans.PRJNA13758.WS265.genomic.bed with 7 regions
Detected file format: bed
Splitting large targets
Wrote example2/c_elegans.PRJNA13758.WS265.genomic.target.bed with 20058 regions
Wrote example2/c_elegans.PRJNA13758.WS265.genomic.antitarget.bed with 0 regions
Building a flat reference...
Detected file format: bed
Calculating GC and RepeatMasker content in /home/moldach/projects/def-mtarailo/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa ...
Extracting sequences from chromosome X
Extracting sequences from chromosome I
Extracting sequences from chromosome V
Extracting sequences from chromosome II
Extracting sequences from chromosome III
Extracting sequences from chromosome IV
Extracting sequences from chromosome MtDNA
Moved existing file my_flat_reference.cnn -> my_flat_reference.cnn.1
Wrote my_flat_reference.cnn with 20058 regions
Running 1 samples in serial
Running the CNVkit pipeline on N2_trim_bwaMEM_sort_dedupped.bam ...
Processing reads in N2_trim_bwaMEM_sort_dedupped.bam
Time: 95.864 seconds (166711 reads/sec, 209 bins/sec)
Summary: #bins=20058, #reads=15981558, mean=796.7673, min=221.6326530612245, max=74371.33673469388
Percent reads in regions: 132.547 (of 12057314 mapped)
Wrote example2/N2_trim_bwaMEM_sort_dedupped.targetcoverage.cnn with 20058 regions
Skip processing N2_trim_bwaMEM_sort_dedupped.bam with empty regions file example2/c_elegans.PRJNA13758.WS265.genomic.antitarget.bed
Wrote example2/N2_trim_bwaMEM_sort_dedupped.antitargetcoverage.cnn with 0 regions
Processing target: N2_trim_bwaMEM_sort_dedupped
Keeping 19320 of 20058 bins
Correcting for GC bias...
Processing antitarget: N2_trim_bwaMEM_sort_dedupped
Wrote example2/N2_trim_bwaMEM_sort_dedupped.cnr with 19320 regions
Segmenting example2/N2_trim_bwaMEM_sort_dedupped.cnr ...
Segmenting with method 'cbs', significance threshold 1e-06, in 1 processes

Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
Traceback (most recent call last):
  File "/scratch/moldach/bin/cnvkit/.venv/bin/cnvkit.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/scratch/moldach/bin/cnvkit/cnvkit.py", line 9, in <module>
    args.func(args)
  File "/scratch/moldach/bin/cnvkit/cnvlib/commands.py", line 138, in _cmd_batch
    pool.submit(batch.batch_run_sample,
  File "/scratch/moldach/bin/cnvkit/cnvlib/parallel.py", line 19, in submit
    return SerialFuture(func(*args))
  File "/scratch/moldach/bin/cnvkit/cnvlib/batch.py", line 186, in batch_run_sample
    segments = segmentation.do_segmentation(cnarr, segment_method,
  File "/scratch/moldach/bin/cnvkit/cnvlib/segmentation/__init__.py", line 61, in do_segmentation
    rets = list(pool.map(_ds, ((ca, method, threshold, variants,
  File "/scratch/moldach/bin/cnvkit/cnvlib/segmentation/__init__.py", line 89, in _ds
    return _do_segmentation(*args)
  File "/scratch/moldach/bin/cnvkit/cnvlib/segmentation/__init__.py", line 106, in _do_segmentation
    filtered_cn = drop_outliers(filtered_cn, 50, skip_outliers)
  File "/scratch/moldach/bin/cnvkit/cnvlib/segmentation/__init__.py", line 205, in drop_outliers
    outlier_mask = np.concatenate([
  File "/scratch/moldach/bin/cnvkit/cnvlib/segmentation/__init__.py", line 206, in <listcomp>
    smoothing.rolling_outlier_quantile(subarr['log2'], width, .95, factor)
  File "/scratch/moldach/bin/cnvkit/cnvlib/smoothing.py", line 328, in rolling_outlier_quantile
    dists = np.abs(x - savgol(x, width))
  File "/scratch/moldach/bin/cnvkit/cnvlib/smoothing.py", line 189, in savgol
    y = savgol_filter(y, window_width, order, mode='interp')
  File "/scratch/moldach/bin/cnvkit/.venv/lib/python3.8/site-packages/scipy/signal/_savitzky_golay.py", line 348, in savgol_filter
    _fit_edges_polyfit(x, window_length, polyorder, deriv, delta, axis, y)
  File "/scratch/moldach/bin/cnvkit/.venv/lib/python3.8/site-packages/scipy/signal/_savitzky_golay.py", line 220, in _fit_edges_polyfit
    _fit_edge(x, 0, window_length, 0, halflen, axis,
  File "/scratch/moldach/bin/cnvkit/.venv/lib/python3.8/site-packages/scipy/signal/_savitzky_golay.py", line 190, in _fit_edge
    poly_coeffs = np.polyfit(np.arange(0, window_stop - window_start),
  File "<__array_function__ internals>", line 5, in polyfit
  File "/scratch/moldach/bin/cnvkit/.venv/lib/python3.8/site-packages/numpy/lib/polynomial.py", line 631, in polyfit
    c, resids, rank, s = lstsq(lhs, rhs, rcond)
  File "<__array_function__ internals>", line 5, in lstsq
  File "/scratch/moldach/bin/cnvkit/.venv/lib/python3.8/site-packages/numpy/linalg/linalg.py", line 2259, in lstsq
    x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
  File "/scratch/moldach/bin/cnvkit/.venv/lib/python3.8/site-packages/numpy/linalg/linalg.py", line 109, in _raise_linalgerror_lstsq
    raise LinAlgError("SVD did not converge in Linear Least Squares")
numpy.linalg.LinAlgError: SVD did not converge in Linear Least Squares
etal commented 4 years ago

That's odd, it looks like the scipy package installed in your environment may have been compiled with the wrong flags, or something along those lines related to packaging and API/ABI compability.

HX07 commented 4 years ago

Hi, I met with a same problem as you. I'm using CNVkit-v0.9.6 and installed CNVkit by using pip install.

I fixed this problem by uninstalling my current numpy-v1.18.4 and installing numpy-v1.17.2. You could try this. Here is the issue I referred to: https://github.com/numpy/numpy/issues/12941.

My Python version is 3.7.6, installed by Anaconda3-2020-02 and my OS is CentOS 6.4.

So, it seems that there are some conflict between latest numpy and CNVkit script. Would you please check this in current CNVkit? @etal

moldach commented 4 years ago

Hi @HX07 thanks for confirming this.

@etal I'm using an academic HPC, ComputeCanada, and the admins forbid us from using conda. Therefore I followed the instructions to install the most recent version from source:

git clone https://github.com/etal/cnvkit
cd cnvkit/

# Load modules on ComputeCanada and setup virtualenv
module load python/3.8
virtualenv venv
source venv/bin/activate/

pip install -e .

However, uninstalling numpy-v1.18.4 and installing numpy-v1.17.2 did not fix the issue for me.

Not sure how you installed the older version @HX07. I got an error my first try:

(cnvkit_venv) [moldach@cdr767 cnvkit]$ pip install 'numpy==1.17.2'
Ignoring pip: markers 'python_version < "3"' don't match your environment
Looking in links: /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/avx2, /cvmfs/soft.computecanada.ca/custom/python/wheelhouse/generic
ERROR: Could not find a version that satisfies the requirement numpy==1.17.2 (from versions: 1.17.4, 1.18.1)
ERROR: No matching distribution found for numpy==1.17.2
(cnvkit_venv) [moldach@cdr767 cnvkit]$ easy_install

Not familiar with installing older versions of PyPi so I tried another method from google search:

(cnvkit_venv) [moldach@cdr767 cnvkit]$ easy_install numpy==1.17.2

...

Adding numpy 1.17.2 to easy-install.pth file
Installing f2py script to /scratch/moldach/bin/cnvkit/cnvkit_venv/bin
Installing f2py3 script to /scratch/moldach/bin/cnvkit/cnvkit_venv/bin
Installing f2py3.8 script to /scratch/moldach/bin/cnvkit/cnvkit_venv/bin

Installed /scratch/moldach/bin/cnvkit/cnvkit_venv/lib/python3.8/site-packages/numpy-1.17.2-py3.8-linux-x86_64.egg
Processing dependencies for numpy==1.17.2
Finished processing dependencies for numpy==1.17.2

Okay now that I have this older version I will try again:

(cnvkit_venv) [moldach@cdr767 cnvkit]$ cnvkit.py batch N2_trim_bwaMEM_sort_dedupped.bam -n -m wgs -f ~/projects/def-mtarailo/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa
/scratch/moldach/bin/cnvkit/skgenome/intersect.py:11: FutureWarning: pandas.core.index is deprecated and will be removed in a future version.  The public classes are available in the top-level namespace.
  from pandas.core.index import Int64Index
CNVkit 0.9.7.b1
WGS protocol: recommend '--annotate' option (e.g. refFlat.txt) to help locate genes in output files.
I: Scanning for accessible regions
    Accessible region I:0-15072434 (size 15072434)
II: Scanning for accessible regions
    Accessible region II:0-15279421 (size 15279421)
III: Scanning for accessible regions
    Accessible region III:0-13783801 (size 13783801)
IV: Scanning for accessible regions
    Accessible region IV:0-17493829 (size 17493829)
V: Scanning for accessible regions
    Accessible region V:0-20924180 (size 20924180)
X: Scanning for accessible regions
    Accessible region X:0-17718942 (size 17718942)
MtDNA: Scanning for accessible regions
    Accessible region MtDNA:0-13794 (size 13794)
I: Joining over small gaps
II: Joining over small gaps
III: Joining over small gaps
IV: Joining over small gaps
V: Joining over small gaps
X: Joining over small gaps
MtDNA: Joining over small gaps
Wrote c_elegans.PRJNA13758.WS265.genomic.bed with 7 regions
Detected file format: bed
Splitting large targets
Wrote ./c_elegans.PRJNA13758.WS265.genomic.target.bed with 20058 regions
Wrote ./c_elegans.PRJNA13758.WS265.genomic.antitarget.bed with 0 regions
Building a flat reference...
Detected file format: bed
Calculating GC and RepeatMasker content in /home/moldach/projects/def-mtarailo/common/indexes/WS265_wormbase/c_elegans.PRJNA13758.WS265.genomic.fa ...
Extracting sequences from chromosome X
Extracting sequences from chromosome I
Extracting sequences from chromosome V
Extracting sequences from chromosome II
Extracting sequences from chromosome III
Extracting sequences from chromosome IV
Extracting sequences from chromosome MtDNA
Moved existing file ./reference.cnn -> ./reference.cnn.1
Wrote ./reference.cnn with 20058 regions
Running 1 samples in serial
Running the CNVkit pipeline on N2_trim_bwaMEM_sort_dedupped.bam ...
Processing reads in N2_trim_bwaMEM_sort_dedupped.bam
Time: 101.191 seconds (157934 reads/sec, 198 bins/sec)
Summary: #bins=20058, #reads=15981558, mean=796.7673, min=221.6326530612245, max=74371.33673469388 
Percent reads in regions: 132.547 (of 12057314 mapped)
Wrote ./N2_trim_bwaMEM_sort_dedupped.targetcoverage.cnn with 20058 regions
Skip processing N2_trim_bwaMEM_sort_dedupped.bam with empty regions file ./c_elegans.PRJNA13758.WS265.genomic.antitarget.bed
Wrote ./N2_trim_bwaMEM_sort_dedupped.antitargetcoverage.cnn with 0 regions
Processing target: N2_trim_bwaMEM_sort_dedupped
Keeping 19320 of 20058 bins
Correcting for GC bias...
Processing antitarget: N2_trim_bwaMEM_sort_dedupped
Wrote ./N2_trim_bwaMEM_sort_dedupped.cnr with 19320 regions
Segmenting ./N2_trim_bwaMEM_sort_dedupped.cnr ...
Segmenting with method 'cbs', significance threshold 1e-06, in 1 processes

Intel MKL ERROR: Parameter 6 was incorrect on entry to DGELSD.
Traceback (most recent call last):
  File "/scratch/moldach/bin/cnvkit/cnvkit_venv/bin/cnvkit.py", line 7, in <module>
    exec(compile(f.read(), __file__, 'exec'))
  File "/scratch/moldach/bin/cnvkit/cnvkit.py", line 9, in <module>
    args.func(args)
  File "/scratch/moldach/bin/cnvkit/cnvlib/commands.py", line 138, in _cmd_batch
    pool.submit(batch.batch_run_sample,
  File "/scratch/moldach/bin/cnvkit/cnvlib/parallel.py", line 19, in submit
    return SerialFuture(func(*args))
  File "/scratch/moldach/bin/cnvkit/cnvlib/batch.py", line 186, in batch_run_sample
    segments = segmentation.do_segmentation(cnarr, segment_method,
  File "/scratch/moldach/bin/cnvkit/cnvlib/segmentation/__init__.py", line 61, in do_segmentation
    rets = list(pool.map(_ds, ((ca, method, threshold, variants,
  File "/scratch/moldach/bin/cnvkit/cnvlib/segmentation/__init__.py", line 89, in _ds
    return _do_segmentation(*args)
  File "/scratch/moldach/bin/cnvkit/cnvlib/segmentation/__init__.py", line 106, in _do_segmentation
    filtered_cn = drop_outliers(filtered_cn, 50, skip_outliers)
  File "/scratch/moldach/bin/cnvkit/cnvlib/segmentation/__init__.py", line 205, in drop_outliers
    outlier_mask = np.concatenate([
  File "/scratch/moldach/bin/cnvkit/cnvlib/segmentation/__init__.py", line 206, in <listcomp>
    smoothing.rolling_outlier_quantile(subarr['log2'], width, .95, factor)
  File "/scratch/moldach/bin/cnvkit/cnvlib/smoothing.py", line 328, in rolling_outlier_quantile
    dists = np.abs(x - savgol(x, width))
  File "/scratch/moldach/bin/cnvkit/cnvlib/smoothing.py", line 189, in savgol
    y = savgol_filter(y, window_width, order, mode='interp')
  File "/scratch/moldach/bin/cnvkit/cnvkit_venv/lib/python3.8/site-packages/scipy/signal/_savitzky_golay.py", line 348, in savgol_filter
    _fit_edges_polyfit(x, window_length, polyorder, deriv, delta, axis, y)
  File "/scratch/moldach/bin/cnvkit/cnvkit_venv/lib/python3.8/site-packages/scipy/signal/_savitzky_golay.py", line 220, in _fit_edges_polyfit
    _fit_edge(x, 0, window_length, 0, halflen, axis,
  File "/scratch/moldach/bin/cnvkit/cnvkit_venv/lib/python3.8/site-packages/scipy/signal/_savitzky_golay.py", line 190, in _fit_edge
    poly_coeffs = np.polyfit(np.arange(0, window_stop - window_start),
  File "<__array_function__ internals>", line 5, in polyfit
  File "/scratch/moldach/bin/cnvkit/cnvkit_venv/lib/python3.8/site-packages/numpy-1.17.2-py3.8-linux-x86_64.egg/numpy/lib/polynomial.py", line 631, in polyfit
    c, resids, rank, s = lstsq(lhs, rhs, rcond)
  File "<__array_function__ internals>", line 5, in lstsq
  File "/scratch/moldach/bin/cnvkit/cnvkit_venv/lib/python3.8/site-packages/numpy-1.17.2-py3.8-linux-x86_64.egg/numpy/linalg/linalg.py", line 2268, in lstsq
    x, resids, rank, s = gufunc(a, b, rcond, signature=signature, extobj=extobj)
  File "/scratch/moldach/bin/cnvkit/cnvkit_venv/lib/python3.8/site-packages/numpy-1.17.2-py3.8-linux-x86_64.egg/numpy/linalg/linalg.py", line 109, in _raise_linalgerror_lstsq
    raise LinAlgError("SVD did not converge in Linear Least Squares")
numpy.linalg.LinAlgError: SVD did not converge in Linear Least Squares
Honchkrow commented 4 years ago

Same error with python 3.6.10 and cnvkit 0.9.7, also installed from conda.

Honchkrow commented 4 years ago

Dear all, I found a solution for this. Intel MKL ERROR was caused by numpy and MKL packages. So if using conda to install cnvkit, just disable MKL will avoid this error. when installing numpy, using the following command conda install nomkl numpy scipy MKL Optimizations (speed boost) will disabled, but cnvkit can run without this error.