RNAcentral / rnacentral-webcode

RNAcentral website source code
https://rnacentral.org
Apache License 2.0

Which RNA types should I consider long non-coding RNAs? #562

Open beginner984 opened 2 years ago

beginner984 commented 2 years ago

This is not really an issue with RNAcentral; rather, I need help with some intuition, please.

We have exosome sequencing data (from plasma). In the raw read count file, I see 72650 gene names.

This is how my read count file looks:

Screenshot 2021-09-29 at 22 09 00

I have created a percentage bar chart of the categories of RNAs annotated in this exosome-seq data:

Picture 2

Which category (RNA type) should I consider long non-coding RNA (lncRNA)?

Can I treat the observed 24% long intergenic non-coding RNA (lincRNA) (sense + antisense) as long non-coding RNA (lncRNA)?

However, I have read that, generally speaking, we don't expect much lncRNA/mRNA in plasma, and much of it will be heavily fragmented, which makes it very difficult to sequence. So why do I see 24% lincRNAs?

If this were your data, which types of RNA here would you consider long non-coding RNA (lncRNA)?

In RNAcentral, I see this

Screenshot 2021-09-29 at 22 22 03 Screenshot 2021-09-29 at 22 23 17

In the Rfam section, I could not find any lncRNAs.

Am I searching correctly?

Thanks for any intuition

AntonPetrov commented 2 years ago

@beginner984 Thank you for your question!

Searching for lncRNAs in RNAcentral is indeed not straightforward. My colleague @blakesweeney might be able to provide more specific advice, but in general I would treat lncRNA and lincRNA as the same class to be on the safe side, as some lncRNAs could be incorrectly classified as lincRNAs and vice versa. I would also suggest not using Rfam sequences if you are interested in lncRNAs, as Rfam does not focus on lncRNAs.
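
As a rough illustration of treating lncRNA and lincRNA as one class, here is a minimal Python sketch. It assumes each gene in the count file already carries an RNA type label; the label spellings in `LNCRNA_LABELS`, the function names, and the example data are all hypothetical and would need to be adapted to the actual annotation.

```python
from collections import Counter

# Assumption: these spellings are illustrative only. Extend the set with
# whatever lncRNA-like labels your own annotation actually uses.
LNCRNA_LABELS = {"lncrna", "lincrna"}


def unify_biotype(label):
    """Collapse lncRNA-like labels into a single 'lncRNA' class."""
    cleaned = label.strip()
    return "lncRNA" if cleaned.lower() in LNCRNA_LABELS else cleaned


def biotype_percentages(biotypes):
    """Return the percentage of genes in each unified RNA class."""
    counts = Counter(unify_biotype(b) for b in biotypes)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {name: 100.0 * n / total for name, n in counts.items()}


# Hypothetical per-gene labels, not taken from the data in this thread.
example = ["lincRNA", "lncRNA", "miRNA", "protein_coding"]
print(biotype_percentages(example))
```

Note that this computes the percentage of annotated genes per class, not the percentage of reads; weighting by read count would need the counts column as well.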

With respect to your question about why you observe such a high % of lncRNAs in your sample, that's difficult to answer without more information, and the RNAcentral team cannot provide input on specific research projects. I would suggest spot-checking some of these lncRNA entries and seeing whether you notice any pattern. It could be a misannotation, in which case those sequences are not actually lncRNAs, or it could be that your short sequences happen to overlap these lncRNAs by chance.
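
One possible way to pick entries for spot-checking is to list the lncRNA-labelled genes with the highest read counts and inspect those by hand. The sketch below assumes rows of `(gene_name, rna_type, read_count)`; the row layout, the function name `top_lncrna_entries`, and the example values are hypothetical.

```python
# Assumption: illustrative label spellings; adapt to your annotation.
LNCRNA_LABELS = {"lncrna", "lincrna"}


def top_lncrna_entries(rows, n=5):
    """Return the n lncRNA-labelled rows with the highest read counts.

    rows: iterable of (gene_name, rna_type, read_count) tuples.
    """
    lnc_rows = [r for r in rows if r[1].strip().lower() in LNCRNA_LABELS]
    # Sort by read count, largest first, and keep the first n rows.
    return sorted(lnc_rows, key=lambda r: r[2], reverse=True)[:n]


# Hypothetical rows, not taken from the data in this thread.
rows = [
    ("gene_A", "lincRNA", 120),
    ("gene_B", "miRNA", 900),
    ("gene_C", "lncRNA", 340),
    ("gene_D", "lincRNA", 15),
]
for gene, rna_type, count in top_lncrna_entries(rows, n=2):
    print(gene, rna_type, count)
```

Looking up the top few names returned this way may show whether a small number of entries accounts for most of the lncRNA signal.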

I hope this helps!