ViennaRNA / RNAz

RNAz - predicting structural noncoding RNAs
http://www.tbi.univie.ac.at/software/RNAz/

rnazCluster.pl clusters windows with non-similar structures #12

Open halabikeren opened 2 years ago

halabikeren commented 2 years ago

Hi dear team,

I ran RNAz on an alignment of viral genomes from a single viral species. I first divided the alignment into windows using rnazWindow.pl, then ran RNAz on the windows, and then clustered the significant structures of overlapping windows using rnazCluster.pl. I expected only similar structures of overlapping windows to be clustered, but when I plot them with RNAplot they look quite different, at least visually. Please find an example attached: rnazCluster_example.zip

Additional info: I am using RNAz version 2.1 and RNAplot 2.5.0.

Am I doing something wrong or expecting rnazCluster.pl to do something it is not supposed to?

Many thanks!

svenderheld commented 2 years ago

Dear halabikeren,

here is part of the description you get when you call 'rnazCluster.pl --man':

"rnazCluster.pl" reads RNAz output files and combines hits in overlapping windows to "loci". It prints a summary of the windows and/or loci as a tabulator delimited text to the standard output. An explanation of the fields can be found below. See the user manual for a more detailed meaning of these values.

Hence, it does not take any structural features into account and simply combines overlapping windows by coordinates into larger loci. If you want the overall structure of a locus, you could, for instance, run RNAalifold on an alignment of the complete locus. Such an alignment is not part of the RNAz output and might need additional work. Please be aware that structure prediction becomes less accurate the longer the input is.
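To illustrate what "combining overlapping windows by coordinates" means, here is a minimal Python sketch of interval merging. It is not the actual rnazCluster.pl code: the function name, the `(sequence ID, start, end)` tuple layout, and the choice to treat windows that share an end coordinate as overlapping are all assumptions made for the example. The point is that no structure enters the decision, only coordinates.

```python
from typing import List, Tuple

# Hypothetical record layout: (sequence ID, start, end), closed coordinates.
Window = Tuple[str, int, int]


def merge_windows_into_loci(windows: List[Window]) -> List[Window]:
    """Merge windows on the same sequence whose coordinates overlap."""
    loci: List[Window] = []
    # Sorting by (sequence ID, start, end) puts overlapping windows next to each other.
    for seq_id, start, end in sorted(windows):
        if loci and loci[-1][0] == seq_id and start <= loci[-1][2]:
            # Overlaps the current locus: extend it. Structures are never compared.
            prev_id, prev_start, prev_end = loci[-1]
            loci[-1] = (prev_id, prev_start, max(prev_end, end))
        else:
            # No overlap (or a different sequence): start a new locus.
            loci.append((seq_id, start, end))
    return loci


if __name__ == "__main__":
    hits = [("genome", 0, 120), ("genome", 40, 160), ("genome", 400, 520)]
    print(merge_windows_into_loci(hits))
```

Two windows with entirely different predicted structures end up in the same locus as long as their coordinates overlap, which matches the behaviour described in the question.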

Hope that helps?!

Best, Sven

halabikeren commented 2 years ago

Thank you Sven!

In this case, if I want to obtain the set of secondary structures within a genome, would you recommend that I work directly with RNAalifold, or is there a more accurate alternative (e.g., using non-overlapping windows of varying sizes with RNAz and filtering the results with a strict cutoff on the RNAz functional-structure class probability)?

Thanks again! Keren

svenderheld commented 1 year ago

Dear Keren,

if you (think you) know the boundaries of your transcript, I recommend using BLAST or BLAT to find homologs and building the full alignment based on that. Of course, you could also use the boundaries of the RNAz loci. During processing in the RNAz framework, sequences originally in the MAF alignment might be filtered out, so they are missing from the finally scored window. Building the alignment yourself might therefore also give you a more complete picture. Depending on the sequences at hand, one alignment approach or another might be superior. If you assume you are looking at a set of conserved structured RNAs, you could use a sequence-structure alignment approach, e.g. one from the group in Freiburg [1], or, for instance, rcoffee [2].

I actually do not recommend playing around too much with the window sizes, because RNAz was trained on alignments of size 120 (if I'm not mistaken) and, to the best of my knowledge, the effect of changing the window size has not been tested so far.
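For readers unfamiliar with the windowing step, here is a rough Python sketch of how an alignment can be cut into fixed-size overlapping windows. It is only an illustration, not what rnazWindow.pl actually does (that script also filters sequences, among other things). The window size of 120 follows the number mentioned above; the step of 40 and the function name are assumptions made for this example.

```python
from typing import List, Tuple


def sliding_windows(alignment_length: int, window: int = 120, step: int = 40) -> List[Tuple[int, int]]:
    """Return half-open (start, end) column ranges covering the alignment."""
    # Alignments shorter than one window are kept as a single slice.
    if alignment_length <= window:
        return [(0, alignment_length)]
    starts = list(range(0, alignment_length - window + 1, step))
    # Add a final window flush with the end so trailing columns are not lost.
    if starts[-1] + window < alignment_length:
        starts.append(alignment_length - window)
    return [(s, s + window) for s in starts]


if __name__ == "__main__":
    for start, end in sliding_windows(210):
        print(start, end)
```

Keeping `window` fixed and changing only `step` alters how much neighbouring windows overlap without moving away from the alignment length the classifier was trained on.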

Sorry for the delay but I hope it still helps.

Best, Sven

[1] https://rna.informatik.uni-freiburg.de/
[2] https://tcoffee.org/Projects/rcoffee/index.html