Closed by jonrkarr 4 years ago
(Note for me) Here's the frontend URL that is affected by this issue: http://localhost:3000/search/trna%20ala/
Yes. I need to fix the tokenization https://www.elastic.co/guide/en/elasticsearch/reference/current/analysis-tokenizers.html. This is related to https://github.com/KarrLab/datanator_rest_api/issues/127#issuecomment-706275800
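One way to inspect how a tokenizer splits a query is Elasticsearch's `_analyze` API. The following is only a minimal sketch of the request body, assuming the built-in `standard` tokenizer and `lowercase` token filter. The helper name `build_analyze_request` and the choice of sample strings are illustrative and not taken from the Datanator code.

```python
import json


def build_analyze_request(text, tokenizer="standard", filters=("lowercase",)):
    """Return a JSON-serializable body for POST /_analyze.

    Hypothetical helper: the tokenizer and filter defaults are assumptions,
    not the settings actually used by the Datanator index.
    """
    return {"tokenizer": tokenizer, "filter": list(filters), "text": text}


if __name__ == "__main__":
    # Sample identifiers mentioned in this thread.
    for sample in ("tRNA-Ala", "LSU5.8S", "RNA28SN"):
        print(json.dumps(build_analyze_request(sample)))
```

Posting each body to the `_analyze` endpoint of a test cluster should list the tokens produced, which makes it easier to compare tokenizers before changing the index settings.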
Note to self: tRNAs are now aggregated using `orthodb_id.keyword` but not deployed yet (https://www.elastic.co/guide/en/elasticsearch/reference/current/multi-fields.html).
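For reference, a sketch of what grouping on a `keyword` sub-field could look like. It assumes `orthodb_id` is mapped as `text` with a `keyword` multi-field, and it uses a `terms` aggregation with a `top_hits` sub-aggregation. The mapping, the aggregation names, and the use of `simple_query_string` are assumptions for illustration. The deployed query may differ.

```python
# Assumed mapping: `orthodb_id` as full text plus an exact-match sub-field.
ORTHODB_MAPPING = {
    "properties": {
        "orthodb_id": {
            "type": "text",
            "fields": {"keyword": {"type": "keyword"}},
        }
    }
}


def build_grouped_search(query, fields, size=10):
    """Build a search body that returns one bucket per distinct orthodb_id.

    Hypothetical helper; the aggregation names are made up for this sketch.
    """
    return {
        "size": 0,  # skip raw hits; only the buckets are of interest
        "query": {"simple_query_string": {"query": query, "fields": list(fields)}},
        "aggs": {
            "by_orthodb_id": {
                "terms": {"field": "orthodb_id.keyword", "size": size},
                # Keep one representative document per bucket.
                "aggs": {"top_hit": {"top_hits": {"size": 1}}},
            }
        },
    }
```

Note that `terms` buckets are ordered by document count by default, so ranking groups by relevance would need additional ordering logic.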
Need to see why some rRNAs are not searchable. This is more than likely caused by the tokenization options, because `RNA28SN`, which is also an rRNA entry, can be found using https://testapi.datanator.info/ftx/text_search/gene_ranked_by_ko/?query_message=RNA28SN&from_=0&size=10&fields=orthodb_id.
I thought the issue stemmed from Elasticsearch's standard tokenizer handling `.` in strings differently from the tokenizer for field type `text`. After a few hours of tinkering with analyzers and tokenizers, I realized that the record with `orthodb_id` somehow just wasn't transferred to Elasticsearch, because https://testapi.datanator.info/ftx/text_search/gene_ranked_by_ko/?query_message=LSU5.8S&from_=0&size=10&fields=orthodb_id&fields=definition returns the proper result.
https://testapi.datanator.info/ftx/text_search/gene_ranked_by_ko/?query_message=LSU5.8S&from_=0&size=10&fields=orthodb_id&fields=_id now works.
Looks good. Thanks for persisting!
In the example below, `tRNA-Ala` appears as two separate hits in the search results. These should be aggregated so that users don't see the same result repeated. https://testapi.datanator.info/ftx/text_search/gene_ranked_by_ko/?query_message=trna%20ala&from_=0&size=10&fields=definition

Here's the full endpoint that the frontend is currently using: https://testapi.datanator.info/ftx/text_search/gene_ranked_by_ko/?query_message=trna%20ala&from_=0&size=10&fields=orthodb_id&fields=orthodb_name&fields=gene_name&fields=gene_name_alt&fields=gene_name_orf&fields=gene_name_oln&fields=entrez_id&fields=protein_name&fields=entry_name&fields=uniprot_id&fields=definition&fields=ec_number
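To reproduce these requests while debugging, the URLs can be assembled programmatically. This is a small sketch using only the Python standard library. The helper name `build_search_url` is hypothetical, and the parameter names are copied from the URLs above.

```python
from urllib.parse import quote, urlencode

BASE_URL = "https://testapi.datanator.info/ftx/text_search/gene_ranked_by_ko/"


def build_search_url(query_message, fields, from_=0, size=10):
    """Build a text-search URL with one `fields=` parameter per field."""
    params = {
        "query_message": query_message,
        "from_": from_,
        "size": size,
        "fields": list(fields),
    }
    # doseq=True repeats `fields`; quote_via=quote encodes spaces as %20.
    return BASE_URL + "?" + urlencode(params, doseq=True, quote_via=quote)


if __name__ == "__main__":
    print(build_search_url("trna ala", ["definition"]))
    print(build_search_url("LSU5.8S", ["orthodb_id", "definition"]))
```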