biothings / mygene.info

MyGene.info: A BioThings API for gene annotations
http://mygene.info

Consistently order data in lists #12

Closed stuppie closed 6 years ago

stuppie commented 7 years ago

Many Entrez genes match multiple genes in Ensembl. In that case each field under ensembl is a list, and the order of the values is not guaranteed to be the same from one field to the next. This makes it impossible to determine which values belong to which gene.

For example: this gene. Two Ensembl genes are linked: YBR181C and YPL090C (in that order). The genomic positions (chromosomes) are, in order, XVI and II. However, YBR181C is on chromosome II and YPL090C is on chromosome XVI. The order is not correct, and is not guaranteed to be consistent, so the Ensembl ID cannot be matched with the correct gene's genomic position.
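A minimal sketch of the mismatch, using only the values quoted above. The variable names are illustrative and are not part of the MyGene.info response; the lists are typed in by hand rather than fetched from the API.

```python
# Values as reported for the example gene, in the order they were returned.
ensembl_genes = ["YBR181C", "YPL090C"]
genomic_chrs = ["XVI", "II"]

# Known correct locations, as stated in the report.
true_chr = {"YBR181C": "II", "YPL090C": "XVI"}

# Pairing the two lists by position assumes they share an order.
paired = dict(zip(ensembl_genes, genomic_chrs))

# Collect every gene whose positionally paired chromosome is wrong.
mismatched = [gene for gene in ensembl_genes if paired[gene] != true_chr[gene]]
print(mismatched)
```

Here both genes end up paired with the wrong chromosome, which is why positional pairing cannot be relied on.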

stuppie commented 7 years ago

Another use case for this: there are many genes like this one, with several genomic positions and Ensembl entries, of which only one is the "main" Ensembl identifier (ENSG00000100197). It is the main identifier because it is the only one located on a chromosome, whereas the others are on haplotype chromosomes. It is also the ID that other databases (UniProt, for example) use to cross-reference Ensembl. But we can't determine which Ensembl ID to use from the mygene response. It would be great to fix this!

sirloon commented 6 years ago

Fixed as of 65abcb658802b1685c3f3ba44e565874099029f6: added "ensemblgene" under genomic_pos, so genomic positions can be mapped to the data under the "ensembl" key.
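With that change, a client could join the two lists on the gene ID instead of relying on list order. The following is a rough sketch, not the project's own code. The helper name `positions_by_ensembl_gene` is hypothetical, the sample document is hand-written from the yeast example earlier in this thread, and the "gene" and "chr" field names, as well as the possibility that genomic_pos is a single object rather than a list, are assumptions about the response shape.

```python
def positions_by_ensembl_gene(doc):
    """Map each Ensembl gene ID to its genomic_pos entry (hypothetical helper)."""
    positions = doc.get("genomic_pos", [])
    # Assumption: a gene with one position may carry a dict instead of a list.
    if isinstance(positions, dict):
        positions = [positions]
    # Skip entries that lack the "ensemblgene" key rather than failing on them.
    return {p["ensemblgene"]: p for p in positions if "ensemblgene" in p}


# Hand-written sample; the two lists are deliberately in different orders.
doc = {
    "ensembl": [{"gene": "YBR181C"}, {"gene": "YPL090C"}],
    "genomic_pos": [
        {"chr": "XVI", "ensemblgene": "YPL090C"},
        {"chr": "II", "ensemblgene": "YBR181C"},
    ],
}

by_gene = positions_by_ensembl_gene(doc)
for entry in doc["ensembl"]:
    print(entry["gene"], by_gene[entry["gene"]]["chr"])
```

Because the lookup is keyed on "ensemblgene", the result does not depend on the order in which either list is returned.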