biothings / mygene.info

MyGene.info: A BioThings API for gene annotations
http://mygene.info

Create a new NCBI data source to get complete gene summary from ASN dump #130

Open newgene opened 1 year ago

newgene commented 1 year ago

The gene summary data (the summary field) currently served by the MyGene.info API is extracted from RefSeq records (see the current refseq data source).

RefSeq does not appear to contain all of the gene summary text available from NCBI. For example, as reported in #129, the gene POLA2 has a summary at NCBI that is absent from its RefSeq record, so it is missing from the current MyGene.info API.
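
To make the gap easier to spot, here is a minimal Python sketch that builds a MyGene.info v3 query for a gene symbol and lists the hits that come back without a summary field. The helper names (build_summary_query, hits_missing_summary) are hypothetical, and the response shape (a "hits" list of documents with "_id") is assumed from the public query endpoint. The sketch makes no network call itself.

```python
from urllib.parse import urlencode

# Public MyGene.info v3 query endpoint.
MYGENE_QUERY_URL = "https://mygene.info/v3/query"


def build_summary_query(symbol, species="human"):
    """Build a query URL asking only for the symbol and summary fields.

    Hypothetical helper, not part of any MyGene.info client library.
    """
    params = {
        "q": f"symbol:{symbol}",
        "species": species,
        "fields": "symbol,summary",
    }
    return f"{MYGENE_QUERY_URL}?{urlencode(params)}"


def hits_missing_summary(response):
    """Return the _id of every hit that has no (or an empty) summary.

    Assumes `response` is the decoded JSON body of a query call,
    i.e. a dict with a "hits" list of gene documents.
    """
    return [
        hit.get("_id")
        for hit in response.get("hits", [])
        if not hit.get("summary")
    ]
```

Fetching the URL from build_summary_query("POLA2") and passing the decoded JSON to hits_missing_summary would then show whether the summary is still absent for that gene.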

As suggested by the NCBI support team (Case #: CAS-941135-X3W9H8, for the record), the complete gene summary text is available from NCBI's ASN.1 binary dump files. We can create a new ncbi_gene data source that extracts the gene summary text from those ASN.1 binary dump files.
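
A rough Python sketch of what the parsing step of such a data source might look like. It assumes the ASN.1 binary dump has already been converted to Entrezgene XML (for example with NCBI's gene2xml tool) and that the summary sits in the Entrezgene_summary element, with the gene ID under Entrezgene_track-info/Gene-track/Gene-track_geneid. The function name, the output document shape, and the sample XML are all made up for illustration and are not the project's actual implementation.

```python
import io
import xml.etree.ElementTree as ET

# Assumed element paths in the Entrezgene XML produced from the ASN.1 dump.
GENE_ID_PATH = "Entrezgene_track-info/Gene-track/Gene-track_geneid"
SUMMARY_TAG = "Entrezgene_summary"


def iter_gene_summaries(xml_source):
    """Yield {"_id": gene_id, "summary": text} for each gene with a summary.

    `xml_source` is a filename or a binary file-like object. The XML is
    streamed with iterparse so a large dump is never fully held in memory.
    """
    for _event, elem in ET.iterparse(xml_source, events=("end",)):
        if elem.tag != "Entrezgene":
            continue
        gene_id = elem.findtext(GENE_ID_PATH)
        summary = elem.findtext(SUMMARY_TAG)
        if gene_id and summary and summary.strip():
            yield {"_id": gene_id.strip(), "summary": summary.strip()}
        # Release the children of the finished record.
        elem.clear()


# Made-up sample: two records, only the first carries a summary.
SAMPLE_XML = b"""<Entrezgene-Set>
  <Entrezgene>
    <Entrezgene_track-info><Gene-track>
      <Gene-track_geneid>1</Gene-track_geneid>
    </Gene-track></Entrezgene_track-info>
    <Entrezgene_summary>Placeholder summary text.</Entrezgene_summary>
  </Entrezgene>
  <Entrezgene>
    <Entrezgene_track-info><Gene-track>
      <Gene-track_geneid>2</Gene-track_geneid>
    </Gene-track></Entrezgene_track-info>
  </Entrezgene>
</Entrezgene-Set>"""

docs = list(iter_gene_summaries(io.BytesIO(SAMPLE_XML)))
```

Yielding one small document per gene keyed by its gene ID would let the new source be merged with the existing gene documents, so that the summary field is filled even where the RefSeq record has none.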