biothings / mygene.info

MyGene.info: A BioThings API for gene annotations
http://mygene.info

no records use the `kegg` key #90

Closed: andrewsu closed this issue 3 years ago

andrewsu commented 3 years ago

In the "available fields" table https://docs.mygene.info/en/latest/doc/data.html#available-fields, we list kegg as a top-level key. However, no gene records actually seem to include information under that key (http://mygene.info/v3/query?q=_exists_:kegg&species=all returns no results). Perhaps it is an older key that is no longer used, in which case we should remove it from the documentation? Or perhaps the parser that populates it is broken?

Incidentally, I'm not exactly sure what would have been stored under the top-level kegg key. Pathway information is found under pathway.kegg, and KEGG seems to use the Entrez Gene ID as its primary gene ID (e.g., https://www.genome.jp/dbget-bin/www_bget?hsa:1017).

newgene commented 3 years ago

The top-level kegg is a copy_to field: a virtual field that is used for indexing but does not actually appear in the gene object. It exists for query convenience, so that ?q=kegg:jak-stat can serve as a shorthand for ?q=pathway.kegg.name:jak-stat.

This field is defined here:

https://github.com/biothings/mygene.info/blob/92e5622c3dcabc3c876319b27410530b0c747720/src/hub/dataload/sources/cpdb/upload.py

Note that this query, ?q=kegg:jak-stat, does not work with the current mapping; it requires a quick fix in this line:

https://github.com/biothings/mygene.info/blob/92e5622c3dcabc3c876319b27410530b0c747720/src/hub/dataload/sources/cpdb/upload.py#L34
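One way to check the fix is to compare the shorthand query against the full field path. The sketch below uses only the Python standard library. The helper names are hypothetical, and it assumes the v3 `/query` endpoint returns JSON with a top-level `total` count.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://mygene.info/v3/query"


def build_query_url(q: str, species: str = "all") -> str:
    """Build a MyGene.info v3 query URL (hypothetical helper)."""
    return f"{BASE_URL}?{urlencode({'q': q, 'species': species})}"


def total_hits(q: str) -> int:
    """Return the 'total' field of the query response (needs network access)."""
    with urlopen(build_query_url(q), timeout=30) as resp:
        return json.load(resp)["total"]
```

Once the mapping is fixed, `total_hits("kegg:jak-stat")` and `total_hits("pathway.kegg.name:jak-stat")` would be expected to return the same count, because the shorthand field is filled from `pathway.kegg.name`.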

newgene commented 3 years ago

fixed in 81f471be50554dfcd780cae7226132606941331a