KarrLab / datanator_rest_api

An OAS3-compliant REST API for the Datanator integrated database
MIT License

Handle rRNA and tRNA alongside OrthoDB groups #127

Closed jonrkarr closed 4 years ago

jonrkarr commented 4 years ago

What is an example of rna/modification/get_modifications_by_ko/?ko_number= that returns data?

lzy7071 commented 4 years ago

I assigned each record a dummy orthodb_id. Since each record currently has a unique KEGG orthology ID, we can assume a unique dummy orthodb_id per record. The orthodb_id in the modifications collection is used only for aggregation, which is currently done by kegg_orthology_id. None of our protein records shares a KEGG orthology ID with the records in modifications, because they are proteins or protein-encoding genes, which shouldn't be in the same group as tRNA or rRNA. Example: https://testapi.datanator.info/rna/modification/get_modifications_by_ko/?ko_number=dummy_id_1&_from=0&size=10&target_organism=Escherichia%20coli
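
For anyone who wants to script this query, here is a minimal sketch that rebuilds the example URL from its parameters. The helper name `modifications_by_ko_url` and its defaults are hypothetical (not part of the API); the endpoint path and parameter names are taken from the example URL in this comment.

```python
from urllib.parse import quote, urlencode

# Test server used in the example URL in this thread.
BASE_URL = "https://testapi.datanator.info"


def modifications_by_ko_url(ko_number, target_organism, start=0, size=10):
    """Build a get_modifications_by_ko query URL (hypothetical helper)."""
    params = {
        "ko_number": ko_number,
        "_from": start,
        "size": size,
        "target_organism": target_organism,
    }
    # quote_via=quote encodes spaces as %20 rather than "+",
    # matching the form of the example URL.
    query = urlencode(params, quote_via=quote)
    return f"{BASE_URL}/rna/modification/get_modifications_by_ko/?{query}"


url = modifications_by_ko_url("dummy_id_1", "Escherichia coli")
print(url)
```

The sketch only constructs the URL; fetching it (for example with `urllib.request.urlopen`) is left out because the test server may no longer be available.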

jonrkarr commented 4 years ago

Creating ids for different classes of rRNA (by type 16S, 23S, ...) and tRNA (by amino acid: Ala, Arg, Cys, ...) is a good solution.

We need to tweak a couple of things

lzy7071 commented 4 years ago

The suggestion made in the first point is implemented. I chose something very similar to the last bullet point of the listed possibilities.

jonrkarr commented 4 years ago

The last bullet (tRNA-Ala, ...) was my favorite. Thanks!
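
A rough sketch of how such group ids could be generated is below. Only the `tRNA-Ala` form is confirmed in this thread; the `rRNA-` prefix, the helper names, and the list of rRNA types (limited to those mentioned in this issue) are assumptions for illustration.

```python
# Three-letter codes for the 20 standard amino acids.
AMINO_ACIDS = [
    "Ala", "Arg", "Asn", "Asp", "Cys", "Gln", "Glu", "Gly", "His", "Ile",
    "Leu", "Lys", "Met", "Phe", "Pro", "Ser", "Thr", "Trp", "Tyr", "Val",
]

# Only the rRNA types named in this issue; a real list would be longer.
RRNA_TYPES = ["5.8S", "16S", "23S"]


def trna_group_id(amino_acid):
    """Group id for a tRNA class, e.g. tRNA-Ala (pattern from this thread)."""
    return f"tRNA-{amino_acid}"


def rrna_group_id(rrna_type):
    """Group id for an rRNA class. The 'rRNA-' prefix is an assumption."""
    return f"rRNA-{rrna_type}"


trna_ids = [trna_group_id(aa) for aa in AMINO_ACIDS]
rrna_ids = [rrna_group_id(t) for t in RRNA_TYPES]
print(trna_ids[:3], rrna_ids)
```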

jonrkarr commented 4 years ago

Regarding search, I can add definition to the fields, but adding this doesn't result in a search hit for tRNA. I tried putting quotes around the search phrase ("trna ala" or "trna alanine") to see if that would yield a hit, but it didn't.

What id pattern did you choose for non-coding RNA?

Can you give me an example of a search which includes one or more hits for rRNA or tRNA?

lzy7071 commented 4 years ago

The pattern is tRNA-Ala. It didn't show up in the top results because of how Elasticsearch scores results. You can see the score Elasticsearch gave the tRNA-Ala entry in rna_modification here: https://testapi.datanator.info/ftx/text_search/gene_ranked_by_ko/?query_message=trna-ala&from_=0&size=10&fields=definition This link returns the entry because only the rna_modification index contains the definition field.
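
As a minimal sketch, the search URL above can be rebuilt from its parameters like this. The helper name `text_search_url` is hypothetical, and passing several fields as repeated `fields=` parameters is an assumption; only the single `fields=definition` form appears in this thread.

```python
from urllib.parse import quote, urlencode

# Test server used in the search URL in this thread.
BASE_URL = "https://testapi.datanator.info"


def text_search_url(query_message, fields, start=0, size=10):
    """Build a gene_ranked_by_ko text search URL (hypothetical helper)."""
    params = {
        "query_message": query_message,
        "from_": start,
        "size": size,
        # Assumption: multiple fields are sent as repeated "fields" params.
        "fields": list(fields),
    }
    # doseq=True expands the list into one "fields=..." pair per entry.
    query = urlencode(params, doseq=True, quote_via=quote)
    return f"{BASE_URL}/ftx/text_search/gene_ranked_by_ko/?{query}"


search_url = text_search_url("trna-ala", ["definition"])
print(search_url)
```

Restricting `fields` to `definition` is what limits the hits to the rna_modification index in the example above.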

jonrkarr commented 4 years ago

I see the tRNA results now that I added definition to the search fields.

jonrkarr commented 4 years ago

I now see tRNA results in search results. I'm having trouble finding rRNA results. I tried the searches below, but couldn't find any results for your rRNA ortholog groups. Is there another field that I should add to the search?

jonrkarr commented 4 years ago

When searching for 16S and 23S, I do see results for individual genes, but not for your ortholog groups.

lzy7071 commented 4 years ago

Fixed. See https://github.com/KarrLab/datanator_rest_api/issues/130#issuecomment-706396923

jonrkarr commented 4 years ago

I see 5.8S in the search results now.