dieterich-lab / DCC

DCC uses output from the STAR read mapper to systematically detect back-splice junctions in next-generation sequencing data. DCC applies a series of filters and integrates data across replicate sets to arrive at a precise list of circRNA candidates.
https://dieterichlab.org/software/
GNU General Public License v3.0

gene name annotations not included in final results #46

Closed fklirono closed 6 years ago

fklirono commented 6 years ago

DCC version 0.4.4; GENCODE v27 annotation for GRCh38; STAR 2.5.3a, indexed GRCh38 with GENCODE v27 annotation

I ran DCC after mapping the stranded, paired-end, ribodepleted RNA-seq data in the three passes described in the manual (both mates, mate1, mate2):

DCC both_Chimeric.out.junction -mt1 mate1_Chimeric.out.junction -mt2 mate2_Chimeric.out.junction -O ./dcc -t ./dcc/_tmp -D -R /data/annotations/GRCh38/rpmk+simple_repeats.gtf -an /data/annotations/GRCh38/GRCh38.gencode.v27.gtf -k -T 16 -Pi -F -M -fg -Nr 1 1 -G -A /data/genomes/GRCh38/GRCh38.fa -B ./both_Aligned.sortedByCoord.out.bam

However, in the final result file CircCoordinates the detected circRNAs are not annotated (the Gene column contains only a dot, "."), whereas the temporary file tmp_coordinates_annotated has all detected, unfiltered circRNAs correctly annotated. It seems that this annotation is somehow not transferred to the filtered, final results?
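A quick way to quantify the symptom is to count how many rows carry a dot in the Gene column of each file. The following is only a minimal sketch: it assumes the file is tab-separated with a header row that contains a column named "Gene", and the helper name `count_unannotated` is made up for illustration, not part of DCC.

```python
import csv


def count_unannotated(path, gene_column="Gene"):
    """Return (unannotated_rows, total_rows) for a tab-separated table.

    Assumption: the first line is a header and one column is named
    `gene_column`. A row counts as unannotated when that field is
    empty or a single dot.
    """
    unannotated = 0
    total = 0
    with open(path, newline="") as handle:
        reader = csv.DictReader(handle, delimiter="\t")
        for row in reader:
            total += 1
            gene = (row.get(gene_column) or "").strip()
            if gene in ("", "."):
                unannotated += 1
    return unannotated, total
```

Calling `count_unannotated("./dcc/CircCoordinates")` and comparing the result with the same call on the temporary annotated file would show whether the annotation is lost between the two; if the temporary file has no header row, the column would have to be selected by position instead.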

fklirono commented 6 years ago

I think I have found the bug! Checking out v0.4.4, I see that in main.py lines 274-275 are commented out. These lines rename tmp_coordinates_annotated to tmp_coordinates so that it can be picked up by the Filtering module.

When I run DCC without Filtering activated, the circRNAs are correctly annotated.
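To illustrate what the missing step does, here is a minimal sketch of such a rename. It is not the code from main.py: the function name `promote_annotated_coordinates` is hypothetical, and it assumes that both files live in the same temporary directory.

```python
import os


def promote_annotated_coordinates(tmp_dir,
                                  annotated_name="tmp_coordinates_annotated",
                                  target_name="tmp_coordinates"):
    """Replace the plain coordinates file with the annotated one.

    Assumption: both files sit in `tmp_dir`. Returns True if the rename
    happened, False if no annotated file was found.
    """
    annotated = os.path.join(tmp_dir, annotated_name)
    target = os.path.join(tmp_dir, target_name)
    if not os.path.exists(annotated):
        # Nothing to promote; the downstream step would see unannotated data.
        return False
    # os.replace overwrites an existing target file.
    os.replace(annotated, target)
    return True
```

Without a step like this, a filtering stage that reads tmp_coordinates would only ever see the unannotated file, which would match the dots in the final output.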

Reading over your code, I discovered another bug. Your "hidden" -ss option defaults to False, whereas your -N option defaults to True, although you intend DCC to run in stranded mode by default. So effectively it runs in unstranded mode by default. EDIT: this is not correct; the -ss flag defines fr-firststrand synthesis for stranded data, and the -N flag sets options.strand to False.
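The confusion comes from how a "store false" flag reads in the source: its default is True, and passing the flag turns the value off. The sketch below reproduces that pattern with argparse. It is a hypothetical reconstruction for illustration, not DCC's actual option definitions; only the flag names -N and -ss and the attribute name `strand` come from this thread, while `secondstrand` and `build_parser` are made-up names.

```python
import argparse


def build_parser():
    """Build a toy parser that mimics the two flags discussed here."""
    parser = argparse.ArgumentParser(prog="dcc-flags-sketch")
    # -N: strand is True (stranded) unless -N is given, which sets it to False.
    parser.add_argument("-N", dest="strand", action="store_false", default=True)
    # -ss: hidden flag, False unless given; the dest name is an assumption.
    parser.add_argument("-ss", dest="secondstrand", action="store_true",
                        default=False, help=argparse.SUPPRESS)
    return parser


if __name__ == "__main__":
    for argv in ([], ["-N"], ["-ss"]):
        options = build_parser().parse_args(argv)
        print(argv, options.strand, options.secondstrand)
```

With this pattern, running without any flags leaves `strand` set to True, so the default is stranded mode, and -ss only selects the library type within that mode.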

tjakobi commented 6 years ago

Hi @fklirono,

thank you for your report. It probably makes sense to release a new version soon due to the number of fixes not included in the current stable release.

However, from your reply I assume that using the most current master branch does fix the error for you?

Cheers, Tobias

fklirono commented 6 years ago

Hi,

yes, DCC seems to work properly now. I will freeze it in my pipeline for now.

Thanks for the good work!