etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

AssertionError for RNA-sequencing data with CNVkit-RNA #499

Open akui113 opened 4 years ago

akui113 commented 4 years ago

CNVkit is an excellent Python package for inferring copy number information from high-throughput sequencing data. We were very excited to find that the package now includes a function for inferring copy number from RNA-sequencing data. However, we have run into a problem when using CNVkit-RNA. How can I troubleshoot it?

The script is as follows. The *.genes.results files are output from RSEM; the reference was prepared from gencode.v29.annotation.gtf (from GENCODE) and GRCh38.p13.genome.fa (from UCSC):

    rsem-prepare-reference -gtf gencode.v29.annotation.gtf --bowtie2 GRCh38.p13.genome.fa ./RSEMhg38

    cnvkit.py import-rna *.genes.results \
        --gene-resource data/ensembl-gene-info.hg38.tsv \
        --correlations data/tcga-skcm.cnv-expr-corr.tsv \
        --output out-summary.tsv --output-dir out/

The error output is as follows:

    Dropping 58722 / 58722 rarely expressed genes from input samples
    Loading gene metadata and TCGA gene expression/CNV profiles
    Loaded data/ensembl-gene-info.hg381.tsv with shape: (221323, 9)
    Loaded data/tcga-skcm.cnv-expr-corr.tsv with shape: (19177, 4)
    Resetting 2846 ambiguous genes' correlation coefficients to default 0.100000
    Trimmed gene info table to shape: (63966, 13)
    Aligning gene info to sample gene counts
    Weighting genes with below-average read counts
    Calculating normalized gene read depths
    Traceback (most recent call last):
      File "/usr/bin/cnvkit.py", line 13, in <module>
        args.func(args)
      File "/usr/lib64/python2.7/site-packages/cnvlib/commands.py", line 1462, in _cmd_import_rna
        args.normal, args.do_gc, args.do_txlen, args.max_log2)
      File "/usr/lib64/python2.7/site-packages/cnvlib/import_rna.py", line 39, in do_import_rna
        gene_info, sample_counts, tx_lengths, normal_ids)
      File "/usr/lib64/python2.7/site-packages/cnvlib/rna.py", line 274, in align_gene_info_to_samples
        normal_ids)
      File "/usr/lib64/python2.7/site-packages/cnvlib/rna.py", line 310, in normalize_read_depths
        assert sample_depths.values.sum() > 0
    AssertionError

etal commented 4 years ago

Thanks for reporting, and sorry for the delay. It looks like the processing pipeline didn't extract the expression ratios correctly. In this case, since you used RSEM output files, try adding the option --format rsem here.

If that works, I'll see about either detecting the input format automatically or at least documenting the option better and providing clearer diagnostics in the log and error messages.
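
For illustration only, here is a rough sketch of what header-based format detection could look like. This is not CNVkit code: `guess_rna_format` and `RSEM_GENE_COLUMNS` are hypothetical names, the sample rows are made up, and the sketch assumes that the default "counts" input is a headerless two-column table (gene ID, read count), while RSEM `*.genes.results` files start with a header row naming columns such as `gene_id`, `expected_count`, `TPM` and `FPKM`.

```python
import csv
import io

# Columns that RSEM writes in the header row of *.genes.results files.
RSEM_GENE_COLUMNS = {"gene_id", "expected_count", "TPM", "FPKM"}


def guess_rna_format(handle):
    """Guess 'rsem' or 'counts' from the first row of a tab-separated file.

    Hypothetical helper for illustration; not part of CNVkit.
    """
    first_row = next(csv.reader(handle, delimiter="\t"), [])
    if RSEM_GENE_COLUMNS.issubset(first_row):
        return "rsem"
    if len(first_row) == 2:
        # Assumption: plain counts tables have exactly two columns.
        return "counts"
    raise ValueError("Unrecognized gene expression table: %r" % (first_row,))


# Made-up example inputs, standing in for real files.
rsem_text = (
    "gene_id\ttranscript_id(s)\tlength\teffective_length"
    "\texpected_count\tTPM\tFPKM\n"
    "ENSG00000000003.14\tENST00000373020.8\t2206.00\t2006.00"
    "\t150.00\t3.10\t4.20\n"
)
counts_text = "ENSG00000000003\t150\nENSG00000000005\t0\n"

print(guess_rna_format(io.StringIO(rsem_text)))
print(guess_rna_format(io.StringIO(counts_text)))
```

A check like this could also back a clearer error message: if a file looks like RSEM output but `--format rsem` was not given, the command could say so up front instead of failing later on the assertion.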