dieterich-lab / DCC

DCC uses output from the STAR read mapper to systematically detect back-splice junctions in next-generation sequencing data. DCC applies a series of filters and integrates data across replicate sets to arrive at a precise list of circRNA candidates.
https://dieterichlab.org/software/
GNU General Public License v3.0

DCC very high amount of FP #99

Closed: BarryDigby closed this issue 2 years ago

BarryDigby commented 2 years ago

Not a bug, but I feel compelled to check my methods with you before publishing my results.

Results of a benchmark study are below:

[Image: Screenshot from 2021-10-11 10-08-12]

The proportion of common circRNAs called by each tool, represented as a heatmap:

[Image: comm_prop_no_filt]
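For readers who want to reproduce this kind of comparison, here is a minimal sketch of one way to compute a pairwise "proportion of common circRNAs" matrix. The definition used here (shared calls divided by the number of calls made by the row tool) is an assumption, not necessarily the one behind the heatmap above, and the tool names and coordinates are made up for illustration.

```python
from itertools import product


def shared_proportion(calls):
    """Return {(a, b): |A & B| / |A|} for every ordered pair of tools.

    `calls` maps a tool name to a set of circRNA keys, here assumed to be
    (chrom, start, end, strand) tuples. This normalisation is an assumption.
    """
    matrix = {}
    for a, b in product(calls, repeat=2):
        denom = len(calls[a])
        matrix[(a, b)] = len(calls[a] & calls[b]) / denom if denom else 0.0
    return matrix


# Hypothetical call sets for two tools
calls = {
    "tool_a": {("chr1", 100, 500, "+"), ("chr1", 900, 1500, "-"), ("chr2", 50, 400, "+")},
    "tool_b": {("chr1", 100, 500, "+"), ("chr2", 50, 400, "+")},
}
matrix = shared_proportion(calls)
```

Note that the matrix is not symmetric under this definition: a tool that calls few circRNAs can share all of them with a tool that calls many, but not the other way round.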


These results are without BSJ read filtering, which any sensible user should apply to their results. My paper uses these figures and tables to stress the importance of BSJ filtering. DCC performs far better when a BSJ > 2 filter is applied, but I must say its false-positive (FP) rate is startling compared to other tools.
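To make the filter concrete, here is a minimal sketch of applying a BSJ read threshold to a count table after the fact. The column layout (`Chr`, `Start`, `End`, `Strand`, then one count column per sample) and the sample names are assumptions for illustration, and `filter_bsj` is a hypothetical helper rather than part of DCC. The comparison is strict (`>`) to match the "BSJ > 2" wording above.

```python
import csv
import io


def filter_bsj(rows, sample_columns, min_reads=2, min_samples=1):
    """Keep rows with more than `min_reads` BSJ reads in at least `min_samples` samples."""
    kept = []
    for row in rows:
        supporting = sum(1 for s in sample_columns if int(row[s]) > min_reads)
        if supporting >= min_samples:
            kept.append(row)
    return kept


# Made-up count table; the layout is an assumption
example = """Chr\tStart\tEnd\tStrand\trep1\trep2
chr1\t100\t500\t+\t1\t0
chr1\t900\t1500\t-\t7\t4
chr2\t50\t400\t+\t3\t1
"""
rows = list(csv.DictReader(io.StringIO(example), delimiter="\t"))
kept = filter_bsj(rows, ["rep1", "rep2"])
kept_strict = filter_bsj(rows, ["rep1", "rep2"], min_samples=2)
```

Requiring support in more than one sample (`min_samples`) is a stricter variant of the same idea. Whether it is appropriate depends on the replicate design.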

I wanted to reach out and double-check my methods using DCC. Steps are listed as process name (input file):

  1. STAR 1st Pass (PE data)
  2. SJDB File Generation (SJout.tab)
  3. STAR 2nd Pass (PE reads, SJDB files)

code available in these process blocks: https://github.com/nf-core/circrna/blob/2a5987b0e57a6bbe51bfd2bdbd2413bbe6a0431e/main.nf#L853-L992

  4. STAR 2nd Pass (Mate 1, SJDB File)
  5. STAR 2nd Pass (Mate 2, SJDB File)
  6. DCC (Outputs from 4. & 5.)

code available in these proc blocks: https://github.com/nf-core/circrna/blob/2a5987b0e57a6bbe51bfd2bdbd2413bbe6a0431e/main.nf#L1062-L1230
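The steps listed above can be sketched as command lines. This is a rough illustration only, not the commands used in the linked nf-core/circrna process blocks: the helper names, file paths, and the `--chimSegmentMin` value are assumptions, and the flag selection should be checked against the STAR and DCC documentation before use. The helpers only build argument lists and do not run anything.

```python
def star_chimeric_cmd(genome_dir, reads, prefix, sjdb_files=()):
    """Build a STAR command that also reports chimeric junctions (sketch)."""
    cmd = [
        "STAR",
        "--genomeDir", genome_dir,
        "--readFilesIn", *reads,
        "--outFileNamePrefix", prefix,
        "--chimSegmentMin", "15",      # placeholder value, an assumption
        "--chimOutType", "Junctions",  # write Chimeric.out.junction
    ]
    if sjdb_files:
        # Second pass: feed the SJ.out.tab files from the first pass back in
        cmd += ["--sjdbFileChrStartEnd", *sjdb_files]
    return cmd


def dcc_cmd(samplesheet, mate1_sheet, mate2_sheet, annotation_gtf,
            min_reads=2, min_reps=1):
    """Build a DCC command for paired-end data with a BSJ count filter (sketch)."""
    return [
        "DCC", "@" + samplesheet,
        "-mt1", "@" + mate1_sheet,
        "-mt2", "@" + mate2_sheet,
        "-D", "-Pi", "-F",
        "-Nr", str(min_reads), str(min_reps),
        "-an", annotation_gtf,
    ]


# Hypothetical paths, for illustration only
first_pass = star_chimeric_cmd("star_index", ["s1_R1.fq.gz", "s1_R2.fq.gz"], "s1_pass1.")
mate1_pass = star_chimeric_cmd("star_index", ["s1_R1.fq.gz"], "s1_mate1.",
                               sjdb_files=["s1_pass1.SJ.out.tab"])
dcc = dcc_cmd("samplesheet", "mate1", "mate2", "annotation.gtf")
```

Keeping the BSJ thresholds as explicit arguments makes it easy to rerun a benchmark with and without filtering while holding everything else fixed.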

STAR parameters, IIRC, are default parameters from the documentation: https://github.com/nf-core/circrna/blob/2a5987b0e57a6bbe51bfd2bdbd2413bbe6a0431e/nextflow.config#L51-L73


DCC version 0.5.0. I do not have the logs handy, I am afraid.

Sim data generation: https://github.com/BarryDigby/circRNA_simu

tjakobi commented 2 years ago

Hi,

The documentation (https://docs.circ.tools/en/latest/Detect.html#running-circtools-circrna-detection) recommends 5 reads, while DCC's internal default requires a BSJ count of at least 2. We explicitly do not recommend running DCC without a BSJ filter, because we are aware of the importance of BSJ filtering.

If there is no specific scientific question that requires this low threshold, users should stay with the default parameters for BSJ count.

Thus, for the comparison, I'd suggest running DCC with the default BSJ parameters rather than with BSJ filtering disabled.

Cheers, Tobias