dieterich-lab / DCC

DCC uses output from the STAR read mapper to systematically detect back-splice junctions in next-generation sequencing data. DCC applies a series of filters and integrates data across replicate sets to arrive at a precise list of circRNA candidates.
https://dieterichlab.org/software/
GNU General Public License v3.0

Fail at 'Filtering by read counts' #63

LGray95 closed this issue 5 years ago

LGray95 commented 5 years ago

Hi Tobias,

I am attempting to run DCC on my data but the program does not pass the 'Filtering by read counts' step. I have the most recent version from GitHub installed.

Here is the log file from my most recent attempt.

2019-05-24 17:12:22,545 DCC 0.4.7 started
2019-05-24 17:12:22,545 DCC command line: /srv/scratch/janitz/tools/DCC/DCC/main.py @/srv/scratch/z5199519/pbs_scripts/DCC/samplesheet -mt1 @/srv/scratch/z5199519/pbs_scripts/DCC/mate1 -mt2 @/srv/scratch/z5199519/pbs_scripts/DCC/mate2 -D -R /srv/scratch/janitz/tools/DCC//hg19_repeats.gtf -an /srv/scratch/janitz/genome_files/hg19/Annotation/Genes/genes.gtf -Pi -F -M -Nr 5 6 -fg -G -A /srv/scratch/janitz/genome_files/hg19/Sequence/WholeGenomeFasta/genome.fa
2019-05-24 17:12:22,574 Starting to detect circRNAs
2019-05-24 17:12:22,574 Stranded data mode
2019-05-24 17:12:22,574 Please make sure that the read pairs have been mapped both, combined and on a per mate basis
2019-05-24 17:12:22,574 Collecting chimera information from mates-separate mapping
2019-05-24 17:43:26,836 started circRNA detection from file _tmp_DCC/controlChimeric.out.junction.VKD59O
2019-05-24 17:43:26,836 started circRNA detection from file _tmp_DCC/enrichedChimeric.out.junction.4TAWQ5
2019-05-24 17:44:44,919 Read 38314873.-.38315053.SRR445016.145270886 has more than 2 count.
2019-05-24 17:44:44,939 Read 38314873.-.38315053.SRR445016.145270886 has more than 2 count.
2019-05-24 17:50:55,383 finished circRNA detection from file _tmp_DCC/controlChimeric.out.junction.VKD59O
2019-05-24 20:23:24,732 Read 38314873.-.38315053.SRR445016.145270886 has more than 2 count.
2019-05-24 20:26:19,772 finished circRNA detection from file _tmp_DCC/enrichedChimeric.out.junction.4TAWQ5
2019-05-24 20:26:19,773 Combining individual circRNA read counts
2019-05-24 20:26:34,983 Write in annotation
2019-05-24 20:26:34,983 Select gene features in Annotation file
2019-05-24 20:28:04,548 Filtering started
2019-05-24 20:28:04,549 Using files _tmp_DCC/tmp_circCount and _tmp_DCC/tmp_coordinates for filtering
2019-05-24 20:28:08,399 Filtering by read counts

Just to make sure, to produce the repeats.gtf I simply downloaded the two .gtf files from UCSC and combined them into a new file with cat.

I am also running this on a Linux HPC cluster.

Thanks for your help!

Lachlan

tjakobi commented 5 years ago

Dear @LGray95,

Thank you for reporting this issue.

Just to be sure, did you try running with lower filter criteria, e.g. -Nr 2 2, to make sure this problem is not caused by counts that are too low?
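To illustrate why the thresholds matter, here is a minimal sketch of a count/replicate filter. It assumes that `-Nr <count> <replicates>` keeps a candidate only when at least `<replicates>` samples each have at least `<count>` reads. The function name and the read counts are hypothetical and are not taken from DCC's code or from the data in this issue.

```python
def passes_count_filter(counts, min_count, min_replicates):
    """Return True if at least `min_replicates` samples have >= `min_count` reads."""
    samples_passing = sum(1 for c in counts if c >= min_count)
    return samples_passing >= min_replicates


# Hypothetical read counts for one circRNA candidate in two samples
# (for example, one control and one enriched library).
candidate_counts = [12, 30]

# Thresholds 5 and 6: six samples would each need >= 5 reads.
# With only two samples, no candidate can satisfy this.
print(passes_count_filter(candidate_counts, 5, 6))

# Thresholds 2 and 2: both samples meet the minimum count.
print(passes_count_filter(candidate_counts, 2, 2))
```

Under this assumption, a replicate threshold larger than the number of samples in the run would remove every candidate, whatever the read counts are.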

You could also disable filtering by the repeat file (the -R option) to make sure nothing is lost in that step.

Cheers, Tobias

LGray95 commented 5 years ago

Hi Tobias,

Thank you for your reply. I should have properly understood the default settings and changed them with respect to my experiment.

After adjusting for at least two counts in two samples I now have the required output files.

Cheers,

Lachlan Gray