bioinfo-biols / CIRI-long

Circular RNA Identification for Nanopore Sequencing
https://ciri-cookbook.readthedocs.io
MIT License

Question about pipeline #15

Closed: mmaitenat closed this issue 2 years ago

mmaitenat commented 2 years ago

Hi there,

We are using CIRI-long on some data we've generated. We've read through the article and saw that the algorithm accounts for the possibility that naturally occurring tandem repeats are mistaken for genuine circRNAs. However, we were wondering whether you recommend any further steps to mitigate this risk, such as running RepeatMasker or filtering the final circRNA list against a tandem repeat annotation file, or whether you consider that unnecessary.

Thank you very much for your help, and for this tool!

Kevinzjy commented 2 years ago

Hi @mmaitenat,

CIRI-long requires a read to be fully repetitive: the start and end positions of the repetitive region must lie within the first and last 100 bp of the raw read, respectively. This filters out most short tandem repeats that sit inside a read. However, if a read comes entirely from repeat sequence, it can still pass this check and might cause some trouble.
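To make the condition above concrete, here is a minimal sketch of the check in Python. It is not CIRI-long's actual code: the function name `is_fully_repetitive`, its parameters, and the 0-based half-open coordinate convention are assumptions made for illustration. Only the 100 bp margin comes from the description above.

```python
def is_fully_repetitive(read_len: int, rep_start: int, rep_end: int,
                        margin: int = 100) -> bool:
    """Return True if the repetitive region spans (almost) the whole read.

    Hypothetical helper, not part of CIRI-long. Coordinates are assumed to be
    0-based and half-open: the region covers read[rep_start:rep_end].
    """
    # The region must begin within the first `margin` bases of the read ...
    starts_near_head = rep_start <= margin
    # ... and end within the last `margin` bases.
    ends_near_tail = rep_end >= read_len - margin
    return starts_near_head and ends_near_tail


# A 1000 bp read whose repetitive region covers positions 50..950 passes.
print(is_fully_repetitive(1000, 50, 950))
# A short tandem repeat in the middle of the read (300..700) does not.
print(is_fully_repetitive(1000, 300, 700))
```

A read made up entirely of tandem repeat sequence would satisfy this check by construction, which is why the additional filters matter.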

So I would suggest keeping only circRNAs with strong "GT/AG" splice-signal evidence, which should reduce the number of circRNAs falsely identified from tandem repeat (TR) regions. A further filter using a repeat annotation also makes sense.
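The two filters suggested above could be sketched roughly as follows. This is only an illustration under stated assumptions: the `Candidate` record, its field names, the `"GT-AG"` signal string, and the 0.8 coverage threshold are all hypothetical and do not reflect CIRI-long's output format. The repeat intervals are assumed to have been loaded beforehand from an annotation such as RepeatMasker output, and the splice signal is assumed to be already normalised for strand.

```python
from typing import Iterable, List, NamedTuple, Tuple


class Candidate(NamedTuple):
    """Hypothetical circRNA candidate record (not CIRI-long's format)."""
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    signal: str  # splice signal, assumed strand-normalised, e.g. "GT-AG"


Repeat = Tuple[str, int, int]  # (chrom, start, end), same coordinate system


def repeat_fraction(cand: Candidate, repeat: Repeat) -> float:
    """Fraction of the candidate covered by a single repeat interval."""
    chrom, r_start, r_end = repeat
    if chrom != cand.chrom:
        return 0.0
    overlap = min(cand.end, r_end) - max(cand.start, r_start)
    return max(overlap, 0) / (cand.end - cand.start)


def filter_candidates(cands: Iterable[Candidate], repeats: List[Repeat],
                      max_repeat_frac: float = 0.8) -> List[Candidate]:
    """Keep GT-AG candidates that are not mostly covered by one repeat.

    The 0.8 threshold is an arbitrary value chosen for illustration.
    """
    kept = []
    for cand in cands:
        if cand.signal != "GT-AG":
            continue  # no canonical splice signal
        if any(repeat_fraction(cand, r) >= max_repeat_frac for r in repeats):
            continue  # candidate lies (almost) entirely inside a repeat
        kept.append(cand)
    return kept


repeats = [("chr1", 120, 150), ("chr1", 990, 1210)]
cands = [
    Candidate("chr1", 100, 500, "GT-AG"),    # small repeat inside: kept
    Candidate("chr1", 1000, 1200, "GT-AG"),  # fully inside a repeat: dropped
    Candidate("chr2", 10, 300, "AT-AC"),     # non-canonical signal: dropped
]
print(filter_candidates(cands, repeats))
```

Filtering on coverage fraction rather than on any overlap is a design choice here: circRNAs often span introns that contain repeats, so dropping every candidate that touches a repeat would likely be too strict.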

mmaitenat commented 2 years ago

Thanks a lot!

Maitena.