arq5x / gemini

a lightweight db framework for exploring genetic variation.
http://gemini.readthedocs.org
MIT License

Fix, update and automate generation of detailed_gene and summary_gene tables #2 #914

Open pfpjs opened 5 years ago

pfpjs commented 5 years ago

My previous PR pulled Ensembl gene info from GRCh38 instead of GRCh37; this PR fixes that. The transcript_tsl / transcript_status column has been replaced by transcript_gencode_basic (more info here).

The regenerated detailed_gene_table_v95 and summary_gene_table_v95 can be found here: https://drive.google.com/drive/u/0/folders/1q5geCRd0EPqGJcJQ0j2D_V0IO7Z0Q3vq

Apologies for the oversight regarding GRCh37 vs. GRCh38, I'm always bugging people to check genome build versions, and now it was my turn to get burned!

Addresses #902 and #912.

gmteunisse commented 5 years ago

Great work! I just checked the updated gene tables on Google Drive, but the HGNC names do not seem to have been updated yet, i.e. they do not match the table downloaded from HGNC by hgnc.query.pl. Taking issue #902 as an example, CECR1 is now listed as a previous symbol for ADA2 in the downloaded table, yet in summary_gene_table_v95 it is still the approved symbol.
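One way to catch stale symbols like CECR1 is to build a previous-to-approved lookup from the HGNC download and run the gene column of the generated table through it. Below is a minimal sketch, assuming the HGNC export is tab-separated with `Approved symbol` and `Previous symbols` columns (previous symbols comma-separated). Those column names and the helper names `build_symbol_map` / `stale_symbols` are hypothetical and not taken from the PR or from hgnc.query.pl.

```python
import csv
import io


def build_symbol_map(hgnc_tsv):
    """Map each approved and previous symbol to its current approved symbol.

    Assumes (hypothetically) a tab-separated HGNC export with
    'Approved symbol' and 'Previous symbols' columns, where previous
    symbols are comma-separated.
    """
    mapping = {}
    reader = csv.DictReader(io.StringIO(hgnc_tsv), delimiter="\t")
    for row in reader:
        approved = row["Approved symbol"].strip()
        # An approved symbol always maps to itself, overriding any earlier
        # entry where it appeared as another gene's previous symbol.
        mapping[approved] = approved
        for prev in (row.get("Previous symbols") or "").split(","):
            prev = prev.strip()
            if prev:
                # Keep the first mapping seen; never overwrite an approved entry.
                mapping.setdefault(prev, approved)
    return mapping


def stale_symbols(gene_symbols, mapping):
    """Return (old, new) pairs for symbols that are no longer approved."""
    return [(s, mapping[s]) for s in gene_symbols
            if s in mapping and mapping[s] != s]


if __name__ == "__main__":
    hgnc = "Approved symbol\tPrevious symbols\nADA2\tCECR1\nBRCA1\t\n"
    symbols = build_symbol_map(hgnc)
    print(stale_symbols(["CECR1", "BRCA1", "XYZ"], symbols))
```

Symbols absent from the HGNC table (such as `XYZ` above) are simply skipped rather than flagged, so a separate check would be needed to find symbols HGNC does not know at all.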