EddyRivasLab / infernal

RNA secondary structure/sequence profiles for homology search and alignment

Infernal on transcriptomes #31

Closed sanyalab closed 2 years ago

sanyalab commented 2 years ago

Hi Eric,

I wanted to know whether I can use Infernal with Rfam to predict ncRNAs in transcriptome assemblies, or is it only for genomes?

Thanks Abhijit

nawrockie commented 2 years ago

@sanyalab : Infernal and Rfam can be used with any type of sequence data, not just genomes. They will not work well on short reads, however. This may be a useful reference for you; see Alternate Protocol 1 on page 8: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC6754622/
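For readers who want a concrete starting point, here is a minimal Python sketch of how such a run could be set up on an assembled transcriptome. It is not taken from this thread. It follows my understanding of the Rfam documentation's annotation recipe, which passes `cmscan` a search-space size (`-Z`, in megabases) computed from the total residue count, doubled because both strands are searched. The function names, the default file names (`Rfam.cm`, `Rfam.clanin`, `transcripts.tblout`) and the choice to only build the command rather than execute it are assumptions made for illustration. Check the flags against `cmscan -h` for your Infernal version.

```python
def total_residues(fasta_path):
    """Count sequence residues in a FASTA file, skipping header lines."""
    total = 0
    with open(fasta_path) as handle:
        for line in handle:
            line = line.strip()
            if line and not line.startswith(">"):
                total += len(line)
    return total


def search_space_mb(n_residues, both_strands=True):
    """Search-space size in megabases, the unit cmscan's -Z option expects."""
    strands = 2 if both_strands else 1
    return n_residues * strands / 1_000_000


def build_cmscan_command(fasta_path,
                         cm_path="Rfam.cm",            # assumed name; must be indexed with cmpress first
                         clanin_path="Rfam.clanin",    # assumed name of the Rfam clan file
                         tblout_path="transcripts.tblout"):  # hypothetical output name
    """Return the cmscan argument list; nothing is executed here."""
    z = search_space_mb(total_residues(fasta_path))
    return [
        "cmscan",
        "-Z", f"{z:.6f}",        # E-values computed as if searching z Mb
        "--cut_ga",              # use each family's gathering threshold
        "--rfam",                # faster filter settings intended for large searches
        "--nohmmonly",           # always use the CM, even for structure-free models
        "--fmt", "2",            # tabular format that includes overlap annotation
        "--clanin", clanin_path,
        "--tblout", tblout_path,
        cm_path,
        fasta_path,
    ]
```

Calling `build_cmscan_command("assembly.fa")` (with `assembly.fa` a placeholder for your own file) returns a list that can be printed for inspection or handed to `subprocess.run` once Infernal is installed and the Rfam files are in place.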

sanyalab commented 2 years ago

Awesome!!! Thank you, Eric. Yes, I am using either long reads or a hybrid transcriptome assembly (LR + SR).

On a different topic, I read somewhere (sorry, I cannot find the reference) that Rfam is not great for predicting lncRNAs, piRNAs and miRNAs. Has that changed since? I mean, does the recent Rfam v14.7 have a decent representation of the three classes? I understand that exhaustive DL techniques have been developed to predict these classes, but from an Infernal and Rfam standpoint, how good is the performance for predicting them?

Thanks Abhijit

ppgardne commented 2 years ago

Sorry to butt in @nawrockie.

I did try to build some piRNA models for Rfam a long time ago. The problem is that piRNAs are derived from repetitive sequences that have no structure, so the resulting models are horribly non-specific, hitting every related repeat in the genome. We have Dfam for this.

There are many pre-miRNA models in Rfam, and they are generally not too bad. If you want to annotate mature miRNAs, though, you may want to use a more specific method.

There are a handful of lncRNA models in Rfam too, but only for the really well-characterised lncRNAs. These should generally be thought of as lncRNA domains rather than full-length lncRNAs. Rfam doesn't annotate splicing. Also, too many of the lncRNAs that have been proposed are more likely to be transcriptional noise, so Rfam cannot be more inclusive of them.

nawrockie commented 2 years ago

Thanks, @ppgardne ! @sanyalab : please see Paul's response. I defer to him, as he ran Rfam for years.

sanyalab commented 2 years ago

Thank you Paul and Eric. That really helps me to design the ncRNA pipeline for genome annotation appropriately.

Best Wishes Abhijit