biothings / mydisease.info


Question about the _score field #47

Closed by RemyLau 2 years ago

RemyLau commented 2 years ago

Hi, thanks for providing such a great and helpful tool!

I want to learn more about the _score field that is returned with each query result. The only description I could find in the documentation is that it represents how well the query matches the returned results. Could someone give me a little more information about it, perhaps by pointing me to a few relevant source code files?

Thank you for any help in advance!

ravila4 commented 2 years ago

Hi Remy, the _score field is a default response field in Elasticsearch that indicates how relevant each result is to the query. In our APIs, we sometimes customize the function_score query, which determines how search results are ranked. For example, in MyGene.info, we assign higher weights to results from humans, mice, and rats, in that order. Here is the code where we configure this:

https://github.com/biothings/mygene.info/blob/master/src/web/pipeline/build.py#L42-L50
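As a rough illustration of the idea (not the code at the link above), a function_score query that boosts species in a fixed order could be sketched as follows. The helper name `build_species_boost_query`, the weight values, and the use of a `taxid` term filter are assumptions made for this example; only the Elasticsearch `function_score` structure itself (`functions`, `filter`, `weight`, `score_mode`, `boost_mode`) is standard.

```python
# Hypothetical sketch: build an Elasticsearch function_score query body that
# multiplies the relevance score by a species-specific weight.
# The weights below are made-up illustration values, not MyGene.info's.

HUMAN, MOUSE, RAT = 9606, 10090, 10116  # NCBI taxonomy IDs


def build_species_boost_query(user_query, weights=None):
    """Return a query body (dict) that boosts hits by species."""
    if weights is None:
        weights = {HUMAN: 1.5, MOUSE: 1.3, RAT: 1.1}  # assumed values

    # One scoring function per species: it applies only to documents
    # matching the filter and contributes a constant weight.
    functions = [
        {"filter": {"term": {"taxid": taxid}}, "weight": weight}
        for taxid, weight in weights.items()
    ]

    return {
        "query": {
            "function_score": {
                "query": {"query_string": {"query": user_query}},
                "functions": functions,
                "score_mode": "first",     # use the first matching function
                "boost_mode": "multiply",  # _score = query score * weight
            }
        }
    }


body = build_species_boost_query("cdk2")
```

With a body like this, a human hit and a rat hit that match the text equally well would end up with different _score values, because each query score is multiplied by its species weight.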

However, MyDisease.info does not currently have any customizations to function_score. Results are scored using the default similarity function in Elasticsearch, which has been BM25 since Elasticsearch 5.0.

You can read more about scoring functions in Elasticsearch here: https://www.elastic.co/blog/found-function-scoring

And about the BM25 algorithm here: https://www.elastic.co/blog/practical-bm25-part-2-the-bm25-algorithm-and-its-variables
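To give a feel for what goes into a BM25 score, here is a minimal sketch of the per-term formula in its commonly published form, using Elasticsearch's default parameters k1 = 1.2 and b = 0.75. The function names are invented for this example, and the actual Lucene implementation may differ in constant factors and in how it encodes document length, so the numbers will not necessarily match a real _score.

```python
import math


def bm25_idf(doc_count, docs_with_term):
    """Inverse document frequency: rarer terms get a larger value."""
    return math.log(1 + (doc_count - docs_with_term + 0.5) / (docs_with_term + 0.5))


def bm25_term_score(freq, doc_len, avg_doc_len, doc_count, docs_with_term,
                    k1=1.2, b=0.75):
    """Score contribution of one query term for one document field.

    freq            how often the term occurs in the field
    doc_len         length of the field in this document (in terms)
    avg_doc_len     average field length across the index
    doc_count       number of documents that have the field
    docs_with_term  number of documents containing the term
    """
    # Length normalization: longer-than-average fields are penalized.
    length_norm = 1 - b + b * doc_len / avg_doc_len
    # Term-frequency saturation: extra occurrences help less and less.
    tf = freq * (k1 + 1) / (freq + k1 * length_norm)
    return bm25_idf(doc_count, docs_with_term) * tf


# A multi-term query sums the per-term contributions.
rare = bm25_term_score(freq=2, doc_len=8, avg_doc_len=10, doc_count=1000, docs_with_term=5)
common = bm25_term_score(freq=2, doc_len=8, avg_doc_len=10, doc_count=1000, docs_with_term=500)
```

In this sketch `rare` is larger than `common`: a match on a term that appears in few documents counts for more than a match on a term that appears in half of them.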

RemyLau commented 2 years ago

Hi @ravila4, thank you for your timely reply and the information!! This is very helpful. I'll take a closer look at those resources.