etal / cnvkit

Copy number variant detection from targeted DNA sequencing
http://cnvkit.readthedocs.org

Could not retrieve index file for CRAM files #779

Open 227BaronChen opened 1 year ago


Hi all,

I am calling CNVs from WES data in CRAM format, using GRCh38/hg38 as the reference FASTA, and the following error occurs:

```
CNVkit 0.9.9
Detected file format: bed
Splitting large targets
Applying annotations as target names
Detected file format: refflat
Wrote ./ICC_WES_cnvkit_results-3pairs_test/my_baits.target.bed with 295412 regions
Detected file format: bed
Wrote ./ICC_WES_cnvkit_results-3pairs_test/my_baits.antitarget.bed with 39801 regions
Building a copy number reference from normal samples...
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chrM, 16571 vs 16569
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr1, 249250621 vs 248956422
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr2, 243199373 vs 242193529
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr3, 198022430 vs 198295559
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr4, 191154276 vs 190214555
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr5, 180915260 vs 181538259
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr6, 171115067 vs 170805979
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr7, 159138663 vs 159345973
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr8, 146364022 vs 145138636
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr9, 141213431 vs 138394717
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr10, 135534747 vs 133797422
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr11, 135006516 vs 135086622
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr12, 133851895 vs 133275309
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr13, 115169878 vs 114364328
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr14, 107349540 vs 107043718
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr15, 102531392 vs 101991189
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr16, 90354753 vs 90338345
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr17, 81195210 vs 83257441
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr18, 78077248 vs 80373285
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr19, 59128983 vs 58617616
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr20, 63025520 vs 64444167
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr21, 48129895 vs 46709983
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chr22, 51304566 vs 50818468
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chrX, 155270560 vs 156040895
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chrY, 59373566 vs 57227415
[E::cram_index_load] Could not retrieve index file for './wes/ICC-068N.WES.bqsr.cram'
[E::cram_decode_slice] MD5 checksum reference mismatch at #0:1-103
[E::cram_decode_slice] CRAM: 88754db9a43fddbdf42cc016bc9d5c0c
[E::cram_decode_slice] Ref : d3aa9465c5407f039d4d1228506f9abf
[E::cram_next_slice] Failure to decode slice
```
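The `[W::sanitise_SQ_lines]` warnings compare each contig length in the CRAM header with the length in the supplied FASTA; in the log above the first number matches hg19 (chr1 = 249250621) and the second matches hg38 (chr1 = 248956422). Such mismatches can be listed before running CNVkit. This is a minimal sketch, assuming the CRAM header was saved as text with `samtools view -H` and the FASTA was indexed with `samtools faidx` (which writes `<fasta>.fai`); the function names and the script name `check_sq.py` are hypothetical.

```python
def sq_lengths_from_header(header_text):
    """Map each @SQ SN (contig name) to its LN (length) from SAM header text."""
    lengths = {}
    for line in header_text.splitlines():
        if not line.startswith("@SQ"):
            continue
        # Fields look like "SN:chr1"; split on the first colon only so that
        # values such as "UR:file:///ref/hg19.fa" stay intact.
        tags = dict(f.split(":", 1) for f in line.split("\t")[1:] if ":" in f)
        lengths[tags["SN"]] = int(tags["LN"])
    return lengths


def fai_lengths(fai_text):
    """Map contig name to length from a .fai index (columns 1 and 2)."""
    lengths = {}
    for line in fai_text.splitlines():
        if line.strip():
            name, length = line.split("\t")[:2]
            lengths[name] = int(length)
    return lengths


def length_mismatches(header_lengths, fasta_lengths):
    """List (name, header_length, fasta_length) for every header contig whose
    length differs in the FASTA; fasta_length is None if the contig is absent."""
    return [
        (name, hlen, fasta_lengths.get(name))
        for name, hlen in header_lengths.items()
        if fasta_lengths.get(name) != hlen
    ]


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python check_sq.py header.sam reference.fa.fai
    if len(sys.argv) == 3:
        with open(sys.argv[1]) as hdr, open(sys.argv[2]) as fai:
            found = length_mismatches(sq_lengths_from_header(hdr.read()),
                                      fai_lengths(fai.read()))
        for name, hlen, flen in found:
            print(f"{name}\theader={hlen}\tfasta={flen}")
```

An empty result only means the contig lengths agree; it does not compare the bases themselves, which is what the MD5 error checks.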

I realized the FASTA file might not match, so I replaced it with the one matching the CRAM data, i.e. GRCh37/hg19.

However, there is still an error:

```
CNVkit 0.9.9
Detected file format: bed
Splitting large targets
Applying annotations as target names
Detected file format: refflat
Wrote ./ICC_WES_cnvkit-results-3pairs_test/my_baits.hg19.target.bed with 297434 regions
Wrote ./ICC_WES_cnvkit-results-3pairs_test/my_baits.hg19.antitarget.bed with 40775 regions
Building a copy number reference from normal samples...
[W::sanitise_SQ_lines] Header @SQ length mismatch for ref chrM, 16571 vs 16569
[E::cram_index_load] Could not retrieve index file for './wes/ICC-068N.WES.bqsr.cram'
[E::cram_decode_slice] MD5 checksum reference mismatch at #0:1-103
[E::cram_decode_slice] CRAM: 88754db9a43fddbdf42cc016bc9d5c0c
[E::cram_decode_slice] Ref : d3aa9465c5407f039d4d1228506f9abf
[E::cram_next_slice] Failure to decode slice
```
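The `[E::cram_index_load]` error appears in both runs, so it seems independent of which FASTA is used. Below is a minimal sketch for checking whether each CRAM has an index next to it, assuming the conventional naming `<name>.cram.crai` (what `samtools index` writes) with `<name>.crai` as a fallback; the helper names and the script name `find_crai.py` are hypothetical, and htslib's own lookup rules may differ in detail.

```python
import sys
from pathlib import Path


def candidate_cram_indexes(cram_path):
    """Return index paths to look for next to a CRAM file (assumed naming)."""
    cram = Path(cram_path)
    return [
        cram.with_name(cram.name + ".crai"),  # sample.cram -> sample.cram.crai
        cram.with_suffix(".crai"),            # sample.cram -> sample.crai
    ]


def find_cram_index(cram_path):
    """Return the first candidate index that exists on disk, or None."""
    for idx in candidate_cram_indexes(cram_path):
        if idx.is_file():
            return idx
    return None


if __name__ == "__main__":
    # Hypothetical usage: python find_crai.py ./wes/*.cram
    for path in sys.argv[1:]:
        idx = find_cram_index(path)
        print(f"{path}\t{idx if idx else 'no index found'}")
```

Any CRAM reported without an index can be indexed with `samtools index`, which writes `<name>.cram.crai` alongside it.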

I'm a little confused. I have checked that the length of chrM in hg19 is 16571. Besides the CRAM files themselves, I pass no extra input (just as when I call CNVs from BAM files). I do have the corresponding MD5 files: do I need to provide them, and if so, how, to make the CRAM files usable? Could you advise?
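The `[E::cram_decode_slice] MD5 checksum reference mismatch` error means the reference bases supplied for that slice do not hash to the checksum stored in the CRAM, i.e. the FASTA differs from the one the CRAM was encoded against. The checksums printed in the error cover only the region it names (`#0:1-103`), but a CRAM header may also carry a whole-sequence `M5` tag on each `@SQ` line (visible with `samtools view -H`). Here is a minimal sketch that computes those digests from an uncompressed FASTA for comparison, assuming the SAM specification's M5 rule (drop characters outside ASCII 33 to 126, uppercase the rest, then MD5); the function names and the script name `fasta_m5.py` are hypothetical.

```python
import hashlib


def _normalise(text):
    """Keep only ASCII 33-126 and uppercase it, per the SAM spec's M5 rule."""
    return "".join(c for c in text if 33 <= ord(c) <= 126).upper()


def sam_m5(sequence):
    """MD5 hex digest of one sequence as it would appear in an @SQ M5 tag."""
    return hashlib.md5(_normalise(sequence).encode("ascii")).hexdigest()


def fasta_m5s(fasta_path):
    """Yield (name, M5) per record of an uncompressed FASTA, hashing line by
    line so whole chromosomes are never held in memory."""
    name, digest = None, None
    with open(fasta_path) as handle:
        for line in handle:
            if line.startswith(">"):
                if name is not None:
                    yield name, digest.hexdigest()
                # The contig name is the first word after ">".
                name, digest = line[1:].split()[0], hashlib.md5()
            elif digest is not None:
                digest.update(_normalise(line).encode("ascii"))
    if name is not None:
        yield name, digest.hexdigest()


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python fasta_m5.py hg19.fa
    for path in sys.argv[1:]:
        for contig, m5 in fasta_m5s(path):
            print(f"{contig}\t{m5}")
```

A contig whose digest differs from the header's `M5` value points to a FASTA that is not the one the CRAM expects. htslib can also locate references by these digests through the `REF_PATH` and `REF_CACHE` environment variables.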

Apologies if I am missing something.

Best regards, Baron