Closed dhimmel closed 8 years ago
Here's the head of `genes.tsv`, which would become part of the Cognoma data release:
entrez_gene_id | symbol | description | chromosome | gene_type | synonyms | aliases | n_mutations | mutation_frequency | mean_expression | mutation | expression
---|---|---|---|---|---|---|---|---|---|---|---
1 | A1BG | alpha-1-B glycoprotein | 19 | protein-coding | A1B\|ABG\|GAB\|HYST2477 | alpha-1B-glycoprotein\|HEL-S-163pA\|epididymis secretory sperm binding protein Li 163pA | 30 | 0.004106 | 6.71 | 1 | 1
2 | A2M | alpha-2-macroglobulin | 12 | protein-coding | A2MD\|CPAMD5\|FWP007\|S863-7 | alpha-2-macroglobulin\|C3 and PZP-like alpha-2-macroglobulin domain-containing protein 5\|alpha-2-M | 130 | 0.01779 | 13.34 | 1 | 1
3 | A2MP1 | alpha-2-macroglobulin pseudogene 1 | 12 | pseudo | A2MP | pregnancy-zone protein pseudogene | 4 | 0.0005475 |  | 1 | 0
9 | NAT1 | N-acetyltransferase 1 | 8 | protein-coding | AAC1\|MNAT\|NAT-1\|NATI | arylamine N-acetyltransferase 1\|N-acetyltransferase 1 (arylamine N-acetyltransferase)\|N-acetyltransferase type 1\|arylamide acetylase 1\|monomorphic arylamine N-acetyltransferase | 17 | 0.002327 | 6.729 | 1 | 1
10 | NAT2 | N-acetyltransferase 2 | 8 | protein-coding | AAC2\|NAT-2\|PNAT | arylamine N-acetyltransferase 2\|N-acetyltransferase 2 (arylamine N-acetyltransferase)\|N-acetyltransferase type 2\|arylamide acetylase 2 | 26 | 0.003559 | 2.086 | 1 | 1
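For anyone consuming the file, here is a minimal sketch of reading rows like those above with the Python standard library. It assumes the file is tab-separated, that multi-valued fields such as `synonyms` are pipe-delimited, and that missing values are written as empty strings; these are inferences from the sample rows, not confirmed details of the release. The `read_genes` helper and the inline sample are hypothetical.

```python
import csv
import io

# Hypothetical excerpt with a subset of the columns shown above (tab-separated).
SAMPLE_TSV = (
    "entrez_gene_id\tsymbol\tsynonyms\tmean_expression\n"
    "1\tA1BG\tA1B|ABG|GAB|HYST2477\t6.71\n"
    "3\tA2MP1\tA2MP\t\n"
)


def read_genes(handle):
    """Parse gene rows, splitting pipe-delimited fields and mapping blanks to None."""
    genes = []
    for row in csv.DictReader(handle, delimiter="\t"):
        row["entrez_gene_id"] = int(row["entrez_gene_id"])
        # Assumption: multiple synonyms are joined with "|" inside one field.
        row["synonyms"] = row["synonyms"].split("|") if row["synonyms"] else []
        # Assumption: a gene without expression data has an empty field.
        value = row["mean_expression"]
        row["mean_expression"] = float(value) if value else None
        genes.append(row)
    return genes


genes = read_genes(io.StringIO(SAMPLE_TSV))
```

To read the real file, the same function could be given `open("genes.tsv", newline="")` in place of the in-memory sample, with further columns converted as needed.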
Do not review yet. I will update this after https://github.com/cognoma/genes/pull/1.
Closes https://github.com/cognoma/cancer-data/issues/23
This PR downloads the latest Entrez Gene information from the NCBI FTP site (updated daily). Obsoleted genes have missing values in the columns that come from Entrez Gene. It is unclear how we want to proceed with making backing/django-genes and cancer-data use the same gene data.
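The missing values for obsoleted genes follow from a left-join style lookup: every gene ID in the cancer data is kept, and the Entrez-derived columns are filled only when the ID still exists in the current Entrez Gene download. The sketch below illustrates that behavior with plain dictionaries. It is not the code in this PR; the `annotate` helper, the column subset, and the gene ID `99999` are hypothetical.

```python
# Hypothetical Entrez Gene lookup keyed by gene ID.
# Gene 99999 is deliberately absent to stand in for an obsoleted gene.
entrez_info = {
    1: {"symbol": "A1BG", "chromosome": "19", "gene_type": "protein-coding"},
    2: {"symbol": "A2M", "chromosome": "12", "gene_type": "protein-coding"},
}

# Columns that come from Entrez Gene (a subset, for illustration).
ENTREZ_COLUMNS = ("symbol", "chromosome", "gene_type")


def annotate(gene_ids, entrez_info):
    """Keep every gene ID; fill Entrez columns with None when the ID is unknown."""
    rows = []
    for gene_id in gene_ids:
        info = entrez_info.get(gene_id, {})
        row = {"entrez_gene_id": gene_id}
        for column in ENTREZ_COLUMNS:
            row[column] = info.get(column)  # None when the gene is missing
        rows.append(row)
    return rows


annotated = annotate([1, 2, 99999], entrez_info)
```

Here the row for `99999` keeps its ID but has `None` in every Entrez column, which corresponds to the empty cells described above.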