dieterich-lab / DCC

DCC uses output from the STAR read mapper to systematically detect back-splice junctions in next-generation sequencing data. DCC applies a series of filters and integrates data across replicate sets to arrive at a precise list of circRNA candidates.
https://dieterichlab.org/software/
GNU General Public License v3.0

ValueError: need more than 2 values to unpack #76

Closed corner0426 closed 4 years ago

corner0426 commented 4 years ago

Describe the bug: Hello, I am using DCC to identify circRNAs from three samples, but the run fails with an error. Log file: DCC-2020-02-25_1123.log

DCC 0.4.8 started
32 CPU cores available, using 20
Please make sure that the read pairs have been mapped both, combined and on a per mate basis
Collecting chimera information from mates-separate mapping
started circRNA detection from file _tmp_DCC/019A_circRNA_Align1_Chimeric.out.junction.M4POU7
started circRNA detection from file _tmp_DCC/026A_circRNA_Align1_Chimeric.out.junction.6VE5LK
started circRNA detection from file _tmp_DCC/039A_circRNA_Align1_Chimeric.out.junction.MRKJ7I
=> separating duplicates [_tmp_DCC/026A_circRNA_Align1_Chimeric.out.junction.6VE5LK]
=> separating duplicates [_tmp_DCC/039A_circRNA_Align1_Chimeric.out.junction.MRKJ7I]
=> separating duplicates [_tmp_DCC/019A_circRNA_Align1_Chimeric.out.junction.M4POU7]
Read 39333127.+.39328999.A00783:136:HNNVYDSXX:1:1272:12509:26725 has more than 2 count.
Read 39333127.+.39328999.A00783:136:HNNVYDSXX:1:1272:12509:26725 has more than 2 count.
Read 105120554.-.105121205.A00783:136:HNNVYDSXX:1:1454:18385:34053 has more than 2 count.
Read 105120554.-.105121205.A00783:136:HNNVYDSXX:1:1454:18385:34053 has more than 2 count.
=> locating small circRNAs [_tmp_DCC/026A_circRNA_Align1_Chimeric.out.junction.6VE5LK]
=> locating circRNAs (stranded mode) [_tmp_DCC/026A_circRNA_Align1_Chimeric.out.junction.6VE5LK]
=> merging circRNAs [_tmp_DCC/026A_circRNA_Align1_Chimeric.out.junction.6VE5LK]
=> sorting circRNAs (stranded mode) [_tmp_DCC/026A_circRNA_Align1_Chimeric.out.junction.6VE5LK]
finished circRNA detection from file _tmp_DCC/026A_circRNA_Align1_Chimeric.out.junction.6VE5LK
Read 50108069.-.50108305.A00783:136:HNNVYDSXX:1:2647:1759:25426 has more than 2 count.
Read 50108069.-.50108305.A00783:136:HNNVYDSXX:1:2647:1759:25426 has more than 2 count.
Read 39333127.+.39328999.A00783:136:HNNVYDSXX:1:1272:12509:26725 has more than 2 count.
=> locating small circRNAs [_tmp_DCC/039A_circRNA_Align1_Chimeric.out.junction.MRKJ7I]
=> locating circRNAs (stranded mode) [_tmp_DCC/039A_circRNA_Align1_Chimeric.out.junction.MRKJ7I]
=> merging circRNAs [_tmp_DCC/039A_circRNA_Align1_Chimeric.out.junction.MRKJ7I]
=> sorting circRNAs (stranded mode) [_tmp_DCC/039A_circRNA_Align1_Chimeric.out.junction.MRKJ7I]
finished circRNA detection from file _tmp_DCC/039A_circRNA_Align1_Chimeric.out.junction.MRKJ7I
Read 105120554.-.105121205.A00783:136:HNNVYDSXX:1:1454:18385:34053 has more than 2 count.
Read 50108069.-.50108305.A00783:136:HNNVYDSXX:1:2647:1759:25426 has more than 2 count.
=> locating small circRNAs [_tmp_DCC/019A_circRNA_Align1_Chimeric.out.junction.M4POU7]
=> locating circRNAs (stranded mode) [_tmp_DCC/019A_circRNA_Align1_Chimeric.out.junction.M4POU7]
=> merging circRNAs [_tmp_DCC/019A_circRNA_Align1_Chimeric.out.junction.M4POU7]
=> sorting circRNAs (stranded mode) [_tmp_DCC/019A_circRNA_Align1_Chimeric.out.junction.M4POU7]
finished circRNA detection from file _tmp_DCC/019A_circRNA_Align1_Chimeric.out.junction.M4POU7
Combining individual circRNA read counts
Using files _tmp_DCC/tmp_circCount and _tmp_DCC/tmp_coordinates for filtering
Filtering by read counts
Traceback (most recent call last):
  File "/home/yaoyh/.local/bin/DCC", line 11, in <module>
    load_entry_point('DCC==0.4.8', 'console_scripts', 'DCC')()
  File "build/bdist.linux-x86_64/egg/DCC/main.py", line 375, in main
  File "build/bdist.linux-x86_64/egg/DCC/circFilter.py", line 92, in filter_nonrep
  File "build/bdist.linux-x86_64/egg/DCC/circFilter.py", line 85, in read_rep_region
  File "/home/yaoyh/.local/lib/python2.7/site-packages/HTSeq-0.11.2-py2.7-linux-x86_64.egg/HTSeq/__init__.py", line 207, in __iter__
    strand, frame, attributeStr) = line.split("\t", 8)
ValueError: need more than 2 values to unpack

To Reproduce: I used the following command line:

DCC @samplesheet -mt1 @mate1 -mt2 @mate2 -T 20 -D -R /data1/yaoyh/GRCh38_Repeats_simpleRepeats_RepeatMasker.gtf -an /data1/yaoyh/gencode.v27.annotation.gtf -Pi -F -M -Nr 1 1 -fg -G -A /data1/biocloud/genome/human_ensemble_chr/hg38.fa -B @bam_files

Please help me figure out this problem. Thank you very much!

corner0426 commented 4 years ago

I am using Python 2 when running DCC.

tjakobi commented 4 years ago

Hi @corner0426,

thank you for reporting the issue.

Judging by the line where execution stops, there might be a problem with one of your annotation files. Could you paste the first 10 lines of your repeat GTF file and your annotation GTF file?

Cheers, Tobias

corner0426 commented 4 years ago

Thanks for your reply!

Here are the first 10 lines of my GTF files.

annotation_10line_gtf.txt repeat_10line_gtf.txt

tjakobi commented 4 years ago

Thank you for the files. Would it be possible to do the same with the end of the files? It seems one of the files ends prematurely.

Thank you!

corner0426 commented 4 years ago

Thank you very much. I have noticed that my repeat GTF file ended prematurely.

Best regards!
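
For anyone hitting the same error: the traceback shows HTSeq unpacking the result of line.split("\t", 8), which needs 9 tab-separated fields per GTF record, so a record cut off after two fields would plausibly produce "need more than 2 values to unpack". The following is a minimal sketch, not part of DCC or HTSeq, for finding such lines before starting a run. The function names find_malformed_gtf_lines and check_gtf_file are hypothetical helpers introduced here for illustration.

```python
import sys

# A GTF record has 9 tab-separated fields:
# seqname, source, feature, start, end, score, strand, frame, attributes
GTF_FIELD_COUNT = 9


def find_malformed_gtf_lines(lines):
    """Return (line_number, field_count) for every record without 9 fields.

    Hypothetical helper: blank lines and '#' comment lines are skipped.
    """
    problems = []
    for line_number, line in enumerate(lines, start=1):
        stripped = line.rstrip("\r\n")
        if not stripped or stripped.startswith("#"):
            continue
        field_count = len(stripped.split("\t"))
        if field_count != GTF_FIELD_COUNT:
            problems.append((line_number, field_count))
    return problems


def check_gtf_file(path):
    """Open a GTF file and report its malformed lines (hypothetical helper)."""
    with open(path) as handle:
        return find_malformed_gtf_lines(handle)


if __name__ == "__main__":
    # Usage: python check_gtf.py repeats.gtf annotation.gtf
    for path in sys.argv[1:]:
        for line_number, field_count in check_gtf_file(path):
            print("%s: line %d has %d field(s), expected %d"
                  % (path, line_number, field_count, GTF_FIELD_COUNT))
```

Running this on both the file passed to -R and the file passed to -an should point at the truncated record, typically the last line of a file whose download or copy was interrupted. It only counts fields; it does not validate coordinates or attribute syntax.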