bioinfo-biols / CIRIquant

circular RNA quantification tools
https://sourceforge.net/projects/ciri/files/CIRIquant
MIT License

About CircAtlas sequence length #33

Open andre-gabriel-42 opened 2 years ago

andre-gabriel-42 commented 2 years ago

When using CIRIquant, we are advised to use CircAtlas to obtain files such as GTF files, reference genomes and so on. Are the circRNA sequences in CircAtlas complete or partial? How is it possible that some sequences are no longer than 40 nucleotides? Are these sequences reliable? Do they really come from real circRNAs? And how is it possible that there are no circRNA sequences longer than 2k (2,000) nucleotides? Have a nice week.

Kevinzjy commented 2 years ago

Hi @andre-gabriel-42 ,

  1. The circRNA sequences in circAtlas are predicted by CIRI-AS and CIRI-full, and the full-length attribute will tell you whether the sequence is full-length or partial.


  2. CircAtlas only includes Illumina RNA-seq data, so it's impossible to get the full-length sequence of circRNAs longer than 2k nucleotides. For circRNAs shorter than 40 bases, however, it's hard to tell whether they are real circRNAs or false-positive predictions, so you probably need to do some experimental validation for these circRNAs.

andre-gabriel-42 commented 2 years ago

Okay... After using CIRIquant, which circRNA FASTA file would you recommend for studying my differentially expressed circRNAs? Would you recommend the sequences found in CircAtlas, or another database?

Kevinzjy commented 2 years ago

Well, there is no perfect way to easily get the full-length sequences of all circRNAs. So I believe you could use circAtlas for your downstream analysis; just keep in mind that some circRNAs may be only partially reconstructed.
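
For anyone who wants to act on this, here is a minimal sketch of how very short sequences could be set aside before downstream analysis. It only looks at sequence length in a plain FASTA file; it does not read the full-length attribute, since the exact layout of the circAtlas download is not shown in this thread. The function names, the 40-base default cutoff and the demo records are all assumptions for illustration, not part of CIRIquant or circAtlas.

```python
import io
from typing import Dict, Iterable, Iterator, List, TextIO, Tuple


def read_fasta(handle: TextIO) -> Iterator[Tuple[str, str]]:
    """Yield (record_id, sequence) pairs from a plain FASTA file handle."""
    name = None
    chunks: List[str] = []
    for line in handle:
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if name is not None:
                yield name, "".join(chunks)
            # Use the first whitespace-delimited token of the header as the ID.
            fields = line[1:].split()
            name = fields[0] if fields else ""
            chunks = []
        elif name is not None:
            chunks.append(line)
    if name is not None:
        yield name, "".join(chunks)


def flag_short_sequences(
    records: Iterable[Tuple[str, str]], min_len: int = 40
) -> Tuple[List[str], Dict[str, str]]:
    """Split records into IDs shorter than min_len and the remaining sequences.

    min_len=40 is an assumed cutoff taken from the discussion above,
    not a threshold defined by CIRIquant or circAtlas.
    """
    short: List[str] = []
    kept: Dict[str, str] = {}
    for name, seq in records:
        if len(seq) < min_len:
            short.append(name)
        else:
            kept[name] = seq
    return short, kept


if __name__ == "__main__":
    # Made-up records purely for demonstration; replace with open("your_file.fa").
    demo = io.StringIO(">circ_demo_1\nACGTACGT\n>circ_demo_2\n" + "ACGT" * 15 + "\n")
    short_ids, kept_seqs = flag_short_sequences(read_fasta(demo))
    print("candidates for experimental validation:", short_ids)
    print("kept for downstream analysis:", list(kept_seqs))
```

The flagged IDs are not necessarily false positives; the list is just a way to keep track of which circRNAs may need experimental validation before their sequences are trusted.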

andre-gabriel-42 commented 2 years ago

Thank you for your help.