bioinfo-biols / CIRIquant

circular RNA quantification tools
MIT License

Why are so many circRNAs missing from the output? #18

gnilihzeux closed this 3 years ago

gnilihzeux commented 3 years ago

Dear author, I provided a BED file containing 135 circRNAs detected by DCC, but only 8 of them appeared in the final GTF, all on chr1. In addition, {sample_nm}_index.fa contained 127 circRNAs, but only 47 were present in {sample_nm}_denovo.sorted.bam.

So why are so many circRNAs lost along the way?
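One quick way to see where circRNAs drop out is to tally records per chromosome in the input BED and in the final GTF. This is a minimal sketch, assuming tab-separated files with the chromosome in column 1 and one circRNA per non-comment line; `count_per_chrom` is a hypothetical helper, not part of CIRIquant.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tally records per chromosome in a BED or GTF file.
# Assumes tab-separated input with the chromosome in column 1. Comment lines
# ('#', e.g. GTF headers), BED 'track'/'browser' lines and empty lines are skipped.
count_per_chrom() {
  awk -F'\t' '$0 !~ /^(#|track|browser)/ && NF > 0 { n[$1]++ }
              END { for (c in n) print c "\t" n[c] }' "$1" | sort -k1,1
}

# Example usage (paths follow the commands posted in this thread):
# count_per_chrom "${CIRIQ_DIR}/circ.bed"
# count_per_chrom "${CIRIQ_DIR}/${sm_nm}.gtf"
```

If the BED shows many chromosomes and the GTF only chr1, the loss happens inside the quantification step rather than in the input.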

Thanks a lot.

Kevinzjy commented 3 years ago

That's weird. Could you try running CIRIquant without the --bed option? It will then use the embedded version of CIRI2 for circRNA prediction.

gnilihzeux commented 3 years ago

OK, I'll try it and report back.

gnilihzeux commented 3 years ago

Hi, I got the same results when using '--tool' and '--circ'.

My commands are listed below.

By the way, the '--circ' input has to be in BED-4 format, which is not mentioned in the manual; you may want to update it.
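Since this comment reports that '--circ' expects BED-4 input (chrom, start, end, name), a file can be sanity-checked before it is passed in. This is a minimal sketch, assuming "BED-4" means at least four tab-separated columns with integer start < end; it is not CIRIquant's own validation, and `check_bed4` is a hypothetical name.

```bash
# Hypothetical helper: print lines of a BED file that do not look like BED-4
# (fewer than 4 tab-separated columns, non-integer coordinates, or start >= end).
# Returns non-zero if any offending line is found.
check_bed4() {
  awk -F'\t' '
    /^(#|track|browser)/ || NF == 0 { next }
    NF < 4 || $2 !~ /^[0-9]+$/ || $3 !~ /^[0-9]+$/ || $2 + 0 >= $3 + 0 {
      printf "line %d: %s\n", NR, $0
      bad++
    }
    END { exit (bad > 0) }
  ' "$1"
}

# Example usage:
# check_bed4 "${CIRIQ_DIR}/circ.bed" && echo "looks like BED-4"
```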

My YAML config:

name: hg19
tools:
  bwa: /root/miniconda2/bin/bwa
  hisat2: /root/miniconda2/bin/hisat2
  stringtie: /root/miniconda2/bin/stringtie
  samtools: /root/miniconda2/bin/samtools

reference:
  fasta: /root/Database/ref_genome/hg19/gencode_grch37p13.fa
  gtf: /root/Database/ref_genome/hg19/gencode_grch37p13.gtf
  bwa_index: /root/Database/ref_genome/hg19/bwa_index/gencode_grch37p13.fa
  hisat_index: /root/Database/ref_genome/hg19/hisat2_index/gencode_grch37p13

1st run

docker run -v /root:/root --name ciriq_${sm_nm} ciriquant:v1.1 \
  /root/miniconda2/bin/CIRIquant \
            -t 8 \
            -1 ${TRIM_DIR}/${sm_nm}_trim_R1.fq.gz \
            -2 ${TRIM_DIR}/${sm_nm}_trim_R2.fq.gz \
            --config ${PRJ_DIR}/ciriquant.hg19.yml \
            -o ${CIRIQ_DIR} \
            -p ${sm_nm} \
            -l 2 \
            --bed ${CIRIQ_DIR}/circ.bed \
            --log ${CIRIQ_DIR}/${sm_nm}.log

2nd run

docker run -v /root:/root --name ciriq_${sm_nm} ciriquant:v1.1 \
  /root/miniconda2/bin/CIRIquant \
            -t 16 \
            -1 ${TRIM_DIR}/${sm_nm}_trim_R1.fq.gz \
            -2 ${TRIM_DIR}/${sm_nm}_trim_R2.fq.gz \
            --config ${PRJ_DIR}/ciriquant.hg19.yml \
            -o ${CIRIQ_DIR} \
            -p ${sm_nm} \
            -l 2 \
            --circ ${CIRIQ_DIR}/circ.bed \
            --tool DCC \
            --bam ${CIRIQ_DIR}/align/${sm_nm}.sorted.bam \
            --log ${CIRIQ_DIR}/${sm_nm}.log
Kevinzjy commented 3 years ago

Hi, could you run CIRIquant using the embedded CIRI2 rather than DCC for circRNA identification? For example:

docker run -v /root:/root --name ciriq_${sm_nm} ciriquant:v1.1 \
  /root/miniconda2/bin/CIRIquant \
            -t 16 \
            -1 ${TRIM_DIR}/${sm_nm}_trim_R1.fq.gz \
            -2 ${TRIM_DIR}/${sm_nm}_trim_R2.fq.gz \
            --config ${PRJ_DIR}/ciriquant.hg19.yml \
            -o ${CIRIQ_DIR} \
            -p ${sm_nm} \
            -l 2 \
            --log ${CIRIQ_DIR}/${sm_nm}.log

By the way, I noticed that you are using a stranded library in which read 1 matches the antisense strand of circRNAs. I've only tested CIRIquant on ScriptSeq data, which uses a different stranded protocol. You might want to run CIRIquant with -l 0 to check whether the strand determination of circRNAs is causing the problem.
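The -l values discussed here follow CIRIquant's library-type convention (0: unstranded, 1: read 1 matches the sense strand, 2: read 1 matches the antisense strand). The sketch below only illustrates what that convention implies for assigning a circRNA strand from the strand read 1 aligns to; `infer_circ_strand` is a hypothetical helper and not CIRIquant's implementation.

```bash
# Hypothetical helper: given the strand read 1 aligned to (+ or -) and a library
# type (0/1/2, as in CIRIquant's -l option), print the implied circRNA strand.
infer_circ_strand() {
  local read1_strand=$1 lib_type=$2
  case "$read1_strand" in
    +|-) ;;
    *) echo "read 1 strand must be + or -" >&2; return 1 ;;
  esac
  case "$lib_type" in
    0) echo "." ;;                # unstranded: strand cannot be inferred from the read
    1) echo "$read1_strand" ;;    # read 1 is sense: same strand as the circRNA
    2) if [ "$read1_strand" = "+" ]; then echo "-"; else echo "+"; fi ;;  # read 1 is antisense
    *) echo "unknown library type: $lib_type" >&2; return 1 ;;
  esac
}
```

A wrong library type flips every strand call, so if -l 0 and -l 2 give very different circRNA counts, strand assignment is a likely suspect.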

gnilihzeux commented 3 years ago

Ehh, this time only 3 circRNAs were left in the CIRIquant output, again all on chr1. CIRI2 itself, however, found 47 circRNAs across multiple chromosomes.

3rd run

docker run -v /root:/root --name ciriq_${sm_nm} ciriquant:v1.1 \
  /root/miniconda2/bin/CIRIquant \
            -t 16 \
            -1 ${TRIM_DIR}/${sm_nm}_trim_R1.fq.gz \
            -2 ${TRIM_DIR}/${sm_nm}_trim_R2.fq.gz \
            --config ${PRJ_DIR}/ciriquant.hg19.yml \
            -o ${CIRIQ_DIR} \
            -p ${sm_nm} \
            -l 2 \
            --log ${CIRIQ_DIR}/${sm_nm}.log

My file tree:

├── align
│   ├── Lung-1.sorted.bam
│   └── Lung-1.sorted.bam.bai
├── circ
│   ├── Lung-1.ciri
│   ├── Lung-1_denovo.sorted.bam
│   ├── Lung-1_denovo.sorted.bam.bai
│   ├── Lung-1_index.1.ht2
│   ├── Lung-1_index.2.ht2
│   ├── Lung-1_index.3.ht2
│   ├── Lung-1_index.4.ht2
│   ├── Lung-1_index.5.ht2
│   ├── Lung-1_index.6.ht2
│   ├── Lung-1_index.7.ht2
│   ├── Lung-1_index.8.ht2
│   └── Lung-1_index.fa
├── gene
│   ├── Lung-1_cov.gtf
│   ├── Lung-1_genes.list
│   └── Lung-1_out.gtf
├── Lung-1.bed
├── Lung-1.gtf
└── Lung-1.log
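The gap noted at the start of the thread (127 sequences in the index FASTA versus 47 in the de novo BAM) can be checked directly from these files. A minimal sketch, assuming each FASTA header in Lung-1_index.fa corresponds to one circRNA and that "present in the BAM" means at least one mapped read in `samtools idxstats` output (columns: reference name, length, mapped reads, unmapped reads); both helper names are hypothetical.

```bash
# Count sequences in the circRNA index FASTA (one '>' header per sequence).
count_fasta_records() {
  grep -c '^>' "$1"
}

# Hypothetical helper: read `samtools idxstats` output on stdin and count the
# references with at least one mapped read (column 3). The final '*' line,
# which holds unplaced reads, is skipped.
count_covered_refs() {
  awk -F'\t' '$1 != "*" && $3 > 0 { n++ } END { print n + 0 }'
}

# Example usage with the layout above:
# count_fasta_records circ/Lung-1_index.fa
# samtools idxstats circ/Lung-1_denovo.sorted.bam | count_covered_refs
```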
Kevinzjy commented 3 years ago

Well, then I have to presume that your data is not suitable for circRNA analysis. Are you using stranded libraries constructed with oligo-dT primers? If so, most circRNAs, which lack poly(A) tails, will have been lost during library preparation, which would explain the results.

P.S. You might also want to check the expression levels of circRNAs reported by DCC. Some lowly expressed circRNAs might be reverse transcription artifacts.
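A minimal sketch for that check, assuming a DCC-style count table (for example CircRNACount) with a header line and a single sample's junction-read count in column 5; adjust the column if your table differs. The default cutoff of 2 reads and the name `filter_by_count` are illustrative assumptions, not thresholds used by DCC or CIRIquant.

```bash
# Hypothetical helper: print circRNAs whose junction-read count in column $3
# (default 5) is at least $2 (default 2). The header line is kept.
filter_by_count() {
  local table=$1 min_reads=${2:-2} col=${3:-5}
  awk -F'\t' -v min="$min_reads" -v col="$col" \
    'NR == 1 || $col + 0 >= min' "$table"
}

# Example usage:
# filter_by_count CircRNACount 2 > CircRNACount.filtered
```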

gnilihzeux commented 3 years ago

Well, I'll try other samples.

gnilihzeux commented 3 years ago

I've tested two samples, one treated with RNase R and one from regular RNA-seq. In both, DCC detects circRNAs on many different chromosomes, some of them supported by a considerable number of reads.

But CIRIquant returns only a few circRNAs on chr1 for both samples.

I suspect there are some thresholds in your program, such as

By the way, we could discuss this further on WeChat if necessary.