jonrkarr closed this issue 4 years ago.
I assigned each record a dummy orthodb_id. Since each record currently has a unique KEGG orthology id, we can assume a unique dummy orthodb_id for each record. The orthodb_id in the modifications collection is used only for aggregation, which is currently done by kegg_orthology_id. None of our protein records shares a KEGG orthology id with the records in modifications because they are proteins or protein-encoding genes, which shouldn't be grouped with tRNA or rRNA.
Example: https://testapi.datanator.info/rna/modification/get_modifications_by_ko/?ko_number=dummy_id_1&_from=0&size=10&target_organism=Escherichia%20coli
Creating ids for different classes of rRNA (by type: 16S, 23S, ...) and tRNA (by amino acid: Ala, Arg, Cys, ...) is a good solution.
We need to tweak a couple of things:
[x] We need to change these ids (e.g., http://localhost:3000/gene/dummy_id_1/) to something more professional looking, such as trna_1, rrna_1, or one of the other suggestions below:
  - ncrna_1, ncrna_2, ... (lump all rRNA and tRNA together as nc (non-coding) RNA)
  - ncrna-1, ncrna-2, ...
  - trna_1, rrna_1, ...
  - tRNA-1, rRNA-1, ...
  - tRNA-Ala, tRNA-Arg, rRNA-16S, ...
[ ] It needs to be possible to get search results for rRNA and tRNA (which use these ids). The search results used to include rRNA and tRNA, but this no longer appears to be the case. For example, searching for "tRNA trp" yields no results:
https://testapi.datanator.info/ftx/text_search/gene_ranked_by_ko/?query_message=%22trna%20trp%22&from_=0&size=10&fields=orthodb_id&fields=orthodb_name&fields=gene_name&fields=gene_name_alt&fields=gene_name_orf&fields=gene_name_oln&fields=entrez_id&fields=protein_name&fields=entry_name&fields=uniprot_id&fields=ec_number
The suggestion in the first point is great! I'll implement it.
To address the second point, https://testapi.datanator.info/ftx/text_search/gene_ranked_by_ko/?query_message=trna%20trp&from_=0&size=10&fields=protein_name&fields=synonyms&fields=enzymes&fields=orthodb_name&fields=gene_name&fields=name&fields=enzymes.enzyme.enzyme_name&fields=enzymes.subunit.canonical_sequence&fields=species&fields=definition seems to return the intended results. Perhaps the problem was the quotation marks (") in the query message, or the absence of definition from the fields.
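The two search URLs above differ in two ways: the first wraps the query in quotation marks (encoded as %22) and omits definition from the repeated fields parameters. A minimal sketch for building these URLs programmatically, so the fields list and encoding are explicit, might look like this. The helper name build_search_url is hypothetical; the base URL and the query_message, from_, size and fields parameter names are taken from the example URLs in this thread.

```python
from urllib.parse import quote, urlencode

# Base URL copied from the example search links in this thread.
BASE_URL = "https://testapi.datanator.info/ftx/text_search/gene_ranked_by_ko/"


def build_search_url(query_message, fields, from_=0, size=10):
    """Build a gene_ranked_by_ko search URL (hypothetical helper).

    Each entry in `fields` becomes its own "fields=" pair, matching the
    example URLs; spaces are percent-encoded as %20.
    """
    params = {
        "query_message": query_message,
        "from_": from_,
        "size": size,
        "fields": list(fields),
    }
    # doseq=True expands the list into repeated "fields=" pairs;
    # quote_via=quote encodes spaces as %20 rather than '+'.
    return BASE_URL + "?" + urlencode(params, doseq=True, quote_via=quote)


url = build_search_url("trna trp", ["protein_name", "orthodb_name", "gene_name", "definition"])
print(url)
```

Passing '"trna trp"' (with the quotation marks) instead reproduces the %22trna%20trp%22 form from the first link, which makes it easy to compare the two variants side by side.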
The suggestion in the first point has been implemented. I chose something very similar to the last bullet point in the list of possibilities.
The last bullet (tRNA-Ala, ...) was my favorite. Thanks!
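As an illustration of the id scheme from the last bullet (tRNA-<three-letter amino acid>, rRNA-<type>), a minimal sketch follows. It assumes the ids follow that bullet exactly; the thread only says the implemented pattern is "very similar", and the rRNA names searched for later (e.g. LSU4.5S) suggest the real rRNA ids may differ. The functions trna_group_id and rrna_group_id are hypothetical.

```python
# Standard one-letter -> three-letter amino acid codes.
AMINO_ACIDS = {
    "A": "Ala", "R": "Arg", "N": "Asn", "D": "Asp", "C": "Cys",
    "Q": "Gln", "E": "Glu", "G": "Gly", "H": "His", "I": "Ile",
    "L": "Leu", "K": "Lys", "M": "Met", "F": "Phe", "P": "Pro",
    "S": "Ser", "T": "Thr", "W": "Trp", "Y": "Tyr", "V": "Val",
}


def trna_group_id(amino_acid):
    """Return a tRNA group id such as 'tRNA-Ala' (hypothetical format).

    Accepts a one-letter code ('A') or a three-letter code in any case ('ala').
    """
    if len(amino_acid) == 1:
        three = AMINO_ACIDS.get(amino_acid.upper())
    else:
        three = amino_acid.capitalize()
    if three not in AMINO_ACIDS.values():
        raise ValueError(f"unknown amino acid: {amino_acid!r}")
    return f"tRNA-{three}"


def rrna_group_id(rrna_type):
    """Return an rRNA group id such as 'rRNA-16S' (hypothetical format)."""
    return f"rRNA-{rrna_type.upper()}"


print(trna_group_id("W"), trna_group_id("ala"), rrna_group_id("16s"))
```

Using three-letter codes keeps the ids self-describing (tRNA-Trp rather than trna_17), at the cost of needing a fixed lookup table for the amino acids.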
Regarding search, I can add definition to the fields, but doing so doesn't produce a search hit for tRNA. I also tried putting quotes around the search phrase ("trna ala" or "trna alanine") to see whether that would yield a hit, but it didn't.
What id pattern did you choose for non-coding RNA?
Can you give me an example of a search that includes one or more hits for rRNA or tRNA?
The pattern is tRNA-Ala. The reason it didn't show up in the top results has to do with how Elasticsearch scores results. You can see the score Elasticsearch gave the entry tRNA-Ala in rna_modification here: https://testapi.datanator.info/ftx/text_search/gene_ranked_by_ko/?query_message=trna-ala&from_=0&size=10&fields=definition
The result appears with the link above because only the rna_modification index contains the field definition.
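The thread doesn't show how the text-search endpoint builds its Elasticsearch query. As a hypothetical sketch, assuming it issues a multi_match query over the requested fields, the request body could look like the dict below. Restricting fields to ["definition"] means only documents that have a definition field (here, those in rna_modification) can match, so they no longer compete with higher-scoring protein hits. The function multi_match_body and the mapping from URL parameters to body keys are assumptions.

```python
import json


def multi_match_body(query_message, fields, from_=0, size=10):
    """Build an Elasticsearch search body using a multi_match query.

    Hypothetical: assumes the API maps its query_message, fields, from_
    and size URL parameters onto a body shaped like this one.
    """
    return {
        "from": from_,
        "size": size,
        "query": {
            "multi_match": {
                "query": query_message,
                # Only the listed fields are searched; a document that has
                # none of them (e.g. one without "definition") cannot match.
                "fields": list(fields),
            }
        },
    }


body = multi_match_body("trna-ala", ["definition"])
print(json.dumps(body, indent=2))
```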
I see the tRNA results now that I've added definition to the search fields.
I now see tRNA results in the search results, but I'm having trouble finding rRNA results. I tried the searches below, but couldn't find any results for your rRNA ortholog groups. Is there another field that I should add to the search?
- "LSU4.5S"
- LSU4.5S
- LSU
- large subunit
- 4.5S
When searching for 16S and 23S, I do see results for individual genes, but not for your ortholog groups.
I see 5.8S in the search results now.
What is an example of rna/modification/get_modifications_by_ko/?ko_number= that returns data?