dieterich-lab / DCC

DCC uses output from the STAR read mapper to systematically detect back-splice junctions in next-generation sequencing data. DCC applies a series of filters and integrates data across replicate sets to arrive at a precise list of circRNA candidates.
https://dieterichlab.org/software/
GNU General Public License v3.0

extract the circRNA sequences using the script getcircfasta #40

Closed tigerxu closed 7 years ago

tigerxu commented 7 years ago

Hi,

I ran into a problem when using the getcircfasta script with the DCC output file CircCoordinates to extract circRNA sequences, which I need for detecting miRNA binding sites. The number of circRNAs listed in CircCoordinates does not match the number of circRNA sequences that getcircfasta generates. The commands I used are listed below.

$ python ~/install/DCC-0.4.4/scripts/getcircfasta -f /data5/haozhang/install/refGenome/mouse/GRCm38.primary_assembly.genome.fa -c /data5/haozhang/data/JEVmouseBrain-multiOmics/circRNA/DCC/detect_circRNA/CircCoordinates -e /data5/haozhang/install/refGenome/mouse/gencode/gencode.vM14.annotation.bed -o circRNA-sequence.fa &

$ wc -l CircCoordinates
3181 CircCoordinates

$ grep -c '>' circRNA-sequence.fa
899

Any suggestions would be much appreciated. Thanks!

Zhuofei

tjakobi commented 7 years ago

Thank you for your bug report @tigerxu. I'll look into it.

tjakobi commented 7 years ago

Just to rule out the simplest explanation first: what happens if you increase the -m option to, let's say, 50000? The default is only 5000.

tigerxu commented 7 years ago

Many thanks for your advice! It works when the -m option is set to 50000. More circRNA sequences are produced.
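
For anyone re-checking the counts after changing -m, the wc/grep comparison from the original report can be sketched in Python. This is only a rough sketch, not part of DCC: the function names are made up for illustration, and the has_header switch is an assumption for the case where CircCoordinates starts with a column header line (which wc -l would count as an entry).

```python
def count_coordinate_rows(path, has_header=False):
    """Count non-empty lines in a coordinates file such as CircCoordinates."""
    with open(path) as handle:
        rows = sum(1 for line in handle if line.strip())
    # Assumption: if has_header is True, the first non-empty line is a
    # column header rather than a circRNA entry, so it is not counted.
    return max(rows - 1, 0) if has_header else rows


def count_fasta_records(path):
    """Count FASTA records, i.e. lines that start with '>'."""
    with open(path) as handle:
        return sum(1 for line in handle if line.startswith(">"))


def missing_sequences(coords_path, fasta_path, has_header=False):
    """Return how many coordinate entries have no sequence in the FASTA file."""
    return count_coordinate_rows(coords_path, has_header) - count_fasta_records(fasta_path)


# Example call (hypothetical paths, matching the file names used above):
# print(missing_sequences("CircCoordinates", "circRNA-sequence.fa"))
```

With the numbers reported above (3181 lines versus 899 FASTA headers) this would return 2282 before raising -m. Note that it only counts lines beginning with '>', whereas grep -c '>' counts any line containing that character; for a regular FASTA file the two should agree.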