Parsing error - Githubissues

smahaffey commented 5 years ago

Hi, I'm having trouble getting this to run with our data, which is stranded, paired-end 150 bp Illumina data from ribosome-depleted total RNA from several rat tissues. I've performed the alignment both manually and with:

    CIRCexplorer2 align -p 8 -G Rat.ens.96.gtf -i /path/to/bowtie1/index -j /path/to/bowtie2/index -o SampleName -f Sample.fq.gz

This is simplified, but it is basically the command line. For the manual alignment I used the paired-end data, but with the above command I used only a single end. It aligns with TopHat2 and then TopHat-Fusion, and then I get the following:

    Start parsing fusion junctions from TopHat-Fusion...
    Traceback (most recent call last):
      File "/usr/bin/CIRCexplorer2", line 10, in <module>
        sys.exit(main())
      File "/usr/lib/python2.7/site-packages/circ2/command_parse.py", line 43, in main
        command=command_log, name='align')
      File "/usr/lib/python2.7/site-packages/circ2/helper.py", line 38, in wrapper
        fn(*args)
      File "/usr/lib/python2.7/site-packages/circ2/align.py", line 85, in align
        tophat_fusion_parse(fusion_bam_f, out)
      File "/usr/lib/python2.7/site-packages/circ2/parse.py", line 63, in tophat_fusion_parse
        for i, read in enumerate(parse_fusion_bam(fusion, pair_flag)):
      File "/usr/lib/python2.7/site-packages/circ2/parser.py", line 41, in parse_fusion_bam
        chr1, chr2 = read.get_tag('XF').split()[1].split('-')
    ValueError: too many values to unpack

I get this error when running CIRCexplorer2 parse on the manually aligned data, and I get the same error when the above command reaches its parsing step. Running CIRCexplorer2 parse on the CIRCexplorer2-aligned data also gives the same error. Do you have any ideas what might be causing this?

kepbod commented 5 years ago

Are there any hyphens ("-") in the chromosome names of the rat reference genome you used?
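
For context, the last frame of the traceback splits one field of the XF tag on "-" and unpacks the result into exactly two names, so a sequence name that itself contains a hyphen yields more than two pieces. Below is a minimal sketch of that failure and of a check for hyphenated names in a FASTA file. Only the split expression comes from the traceback; the helper names `split_fusion_pair` and `hyphenated_names`, the tag values, and the sequence name `spike-in-1` are made up for illustration.

```python
def split_fusion_pair(xf_tag):
    # Same expression as in the traceback's last frame, applied to a plain
    # string instead of read.get_tag('XF').
    chr1, chr2 = xf_tag.split()[1].split('-')
    return chr1, chr2


def hyphenated_names(fasta_lines):
    """Return sequence names from FASTA header lines that contain '-'."""
    names = []
    for line in fasta_lines:
        if line.startswith('>'):
            fields = line[1:].split()
            # The sequence name is the first whitespace-separated token.
            if fields and '-' in fields[0]:
                names.append(fields[0])
    return names


# Tag values are invented for illustration, not real TopHat-Fusion output.
print(split_fusion_pair("1 chr1-chr1 100 200"))
try:
    split_fusion_pair("1 spike-in-1-spike-in-1 100 200")
except ValueError as err:
    print("ValueError:", err)
```

`hyphenated_names` accepts any iterable of lines, so an open FASTA file handle can be passed to it directly to list the names that would trip the parser.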

smahaffey commented 5 years ago

Thank you! Actually, yes, there are some spike-ins that we just add on to the end of the fasta file that aren't at all relevant for this. I should just remove those and try again, right?

kepbod commented 5 years ago

CIRCexplorer2 assumes that there is no hyphen in any chromosome name. If the spike-ins are not related to the circRNA analysis, you could remove them from the fasta file and rebuild the index. If you want to retain them, you could try changing every - to _ in the chromosome names and rebuilding the index.
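
The renaming option can be sketched as follows. This is a minimal example, not part of CIRCexplorer2: the function names `rename_header` and `rename_fasta` and the file paths are hypothetical. It replaces "-" with "_" only in the sequence name (the first token after ">") and leaves descriptions and sequence lines untouched. The indexes would still need to be rebuilt from the renamed file, and any annotation that refers to the renamed sequences would presumably need the same change.

```python
import re

# Matches the sequence name: the run of non-space characters right after '>'.
_HEADER_NAME = re.compile(r'^>(\S+)')


def rename_header(line):
    """Replace '-' with '_' in the sequence name of a FASTA header line.

    Lines that are not headers are returned unchanged.
    """
    return _HEADER_NAME.sub(
        lambda m: '>' + m.group(1).replace('-', '_'), line)


def rename_fasta(src_path, dst_path):
    """Copy a FASTA file, rewriting hyphenated sequence names on the way."""
    with open(src_path) as src, open(dst_path, 'w') as dst:
        for line in src:
            dst.write(rename_header(line))
```

For example, `rename_fasta("genome.fa", "genome.renamed.fa")` (both paths hypothetical) would write a copy whose sequence names contain no hyphens.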

smahaffey commented 5 years ago

Thank you so much. I just removed those spike-ins and that fixed the problem.

YangLab / CIRCexplorer2

Parsing error #30