RNAcentral / rnacentral-sequence-search

RNAcentral sequence search cloud infrastructure
https://rnacentral.org/sequence-search
Apache License 2.0

RNAcentral results #127

Open hamzaaitabbou96 opened 1 month ago

hamzaaitabbou96 commented 1 month ago

Hi, I have a question about RNAcentral results. When searching for similar sequences I got a list of 959 results. How can I select the most similar ones? And what is the meaning of these variables: "identity": 87.5, "query_coverage": 92.3076923076923, "target_coverage": 0.4903964037597058, "alignment_sequence": "GCAGAGAUGUACUACAAGAAGCGU", "species_priority": "d"

carlosribas commented 1 month ago

Hi @hamzaaitabbou96,

Thank you for your interest in RNAcentral.

You can sort results in different ways when using the sequence search on our site, such as by e-value, identity, query coverage, and target coverage. By default, the results are sorted by e-value. We also prioritise displaying results from certain species first, for example Homo sapiens (9606) and Mus musculus (10090).
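If you are working with the results programmatically rather than on the site, the same sorting idea can be sketched in Python. This is only an illustration under assumptions: the `hits` list, the `label` key, the `e_value` key name, and the threshold values are hypothetical and not taken from the RNAcentral API. The `identity`, `query_coverage`, and `target_coverage` keys are the ones quoted in the question above.

```python
# Hypothetical hits shaped like the fields quoted in this thread.
# "label" and "e_value" are assumed names used only for this example.
hits = [
    {"label": "a", "e_value": 1e-5, "identity": 87.5, "query_coverage": 92.3, "target_coverage": 0.49},
    {"label": "b", "e_value": 1e-9, "identity": 100.0, "query_coverage": 100.0, "target_coverage": 12.0},
    {"label": "c", "e_value": 1e-9, "identity": 95.0, "query_coverage": 100.0, "target_coverage": 3.5},
    {"label": "d", "e_value": 1e-12, "identity": 80.0, "query_coverage": 60.0, "target_coverage": 1.0},
]


def most_similar(hits, min_identity=85.0, min_query_coverage=90.0, top_n=10):
    """Keep hits above the (assumed) thresholds, then rank them.

    Ranking: lowest e-value first, ties broken by higher identity,
    then by higher query coverage.
    """
    kept = [
        h for h in hits
        if h["identity"] >= min_identity and h["query_coverage"] >= min_query_coverage
    ]
    kept.sort(key=lambda h: (h["e_value"], -h["identity"], -h["query_coverage"]))
    return kept[:top_n]


best = most_similar(hits)
print([h["label"] for h in best])
```

The thresholds are arbitrary starting points; tighten or relax them depending on how strict a match you need.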

In brief, here’s what these terms mean:

If you require any further information, feel free to contact me.

hamzaaitabbou96 commented 1 month ago

Can you explain what the e-value is, how it is calculated, and how to select the most similar result using it?

carlosribas commented 1 month ago

The e-value is a parameter that describes the number of hits one can expect to see by chance when searching a database of a particular size.

Our sequence search is performed by nhmmer, and it is this software that calculates the e-value. This page explains how the search is performed.
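To make the definition above concrete, here is a rough Python sketch of the general relationship between a per-comparison p-value and an e-value: the expected number of chance hits grows with the size of the search space. It does not reproduce nhmmer's own calculation. The function names, the `search_space_size` parameter, and the 0.01 cutoff are assumptions chosen for illustration.

```python
def expected_chance_hits(p_value, search_space_size):
    """General idea behind an e-value: the chance of one spurious match
    (p_value) multiplied by the number of comparisons in the search.

    How the search space is measured is tool-specific; this is a
    simplified illustration, not nhmmer's implementation.
    """
    return p_value * search_space_size


def looks_significant(e_value, cutoff=0.01):
    """A lower e-value means fewer expected chance hits.

    The cutoff is an assumed example value, not an RNAcentral default.
    """
    return e_value <= cutoff


# The same p-value gives a larger e-value in a larger database.
small_db = expected_chance_hits(0.001, 1_000)
large_db = expected_chance_hits(0.001, 1_000_000)
print(small_db < large_db)
```

In practice this means that the hits with the smallest e-values are the least likely to be chance matches, so sorting by e-value in ascending order puts the most reliable hits first.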