arq5x / gemini

a lightweight db framework for exploring genetic variation.
http://gemini.readthedocs.org
MIT License

Missing gene information #793

Open ishengtsai opened 8 years ago

ishengtsai commented 8 years ago

Hi,

I tried to retrieve additional information from gene_summary, but information for some genes appears to be missing.

Working example:

gemini query -q "select synonym from gene_summary where gene='ARTEMIS'" db

Another example, where the gene is in variants but not in gene_summary:

gemini query -q "select synonym from gene_summary where gene='CTD-3193O13.9'" db

As a result, some variants disappear when the tables are joined. Ideally, if gene information is not present, the variants should still be shown, with NAs in the gene_summary fields. How can I get around this?

naumenko-sa commented 7 years ago

Hi! How did you load the information into GEMINI.db?

The first query works for me:

gemini query -q "select synonym from gene_summary where gene='ARTEMIS'" NA12878-1-ensemble.db 
A-SCID,FLJ11360,SNM1C,DCLRE1C,SCIDA
A-SCID,FLJ11360,SNM1C,DCLRE1C,SCIDA

For the second query, are you sure that this gene exists in the annotations?

http://useast.ensembl.org/Human/Search/Results?q=CTD-3193O13.9;site=ensembl;facet_species=Human;page=1;perpage=10

The only gene I see with a similar name in GEMINI.db is CTD-2021J15.2.

I hope this helps. Sergey
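
On the original question of variants disappearing when the tables are joined: a GEMINI database is an SQLite file, and in SQL an inner join drops rows that have no match, while a LEFT JOIN keeps them. The sketch below illustrates this on a toy in-memory database using Python's sqlite3 module. The two tables are simplified stand-ins with only the columns needed here (the real variants and gene_summary tables have many more), the rows are made up for illustration, and whether `gemini query` passes a LEFT JOIN through unchanged has not been verified here.

```python
import sqlite3

# Toy stand-in for a GEMINI database. Assumption: only a few columns are
# modelled, and the rows are invented for illustration.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE variants (variant_id INTEGER PRIMARY KEY, gene TEXT);
    CREATE TABLE gene_summary (gene TEXT, synonym TEXT);
    INSERT INTO variants VALUES (1, 'ARTEMIS'), (2, 'CTD-3193O13.9');
    INSERT INTO gene_summary VALUES ('ARTEMIS', 'A-SCID,FLJ11360,SNM1C,DCLRE1C,SCIDA');
""")

# Inner join: the variant whose gene has no gene_summary row is dropped.
inner_rows = conn.execute("""
    SELECT v.variant_id, v.gene, g.synonym
    FROM variants v
    JOIN gene_summary g ON v.gene = g.gene
    ORDER BY v.variant_id
""").fetchall()

# Left join: every variant is kept; a missing synonym is reported as 'NA'.
left_rows = conn.execute("""
    SELECT v.variant_id, v.gene, COALESCE(g.synonym, 'NA') AS synonym
    FROM variants v
    LEFT JOIN gene_summary g ON v.gene = g.gene
    ORDER BY v.variant_id
""").fetchall()

print(inner_rows)
print(left_rows)
```

Note that the ARTEMIS query above returned two identical rows, so gene_summary can hold more than one row per gene. A join against it may therefore repeat a variant, and SELECT DISTINCT may be needed to collapse the duplicates.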