Similar to how we have sample information in samples.tsv, it would be nice to create a table with gene information. The primary identifier is entrez_gene_id. Additional columns could be:
- symbol
- name
- chromosome
- n_mutations: number of mutated samples
- median_expression: median gene expression
- mad_expression: median absolute deviation of gene expression
I'm leaning towards a combined dataset for mutation and expression genes. But I could be convinced that splitting the datasets would be better.
We should probably get this information from Entrez Gene, as @clairemcleod did in #12.
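To make the proposal concrete, here is a rough sketch of how the combined table could be assembled. Everything about the inputs is an assumption: the function name `summarize_genes`, the dict-of-dicts layout for the mutation and expression data, and the toy sample values are all made up for illustration and do not reflect how the project actually stores its matrices. Genes present in only one dataset keep a row, with the other dataset's columns left empty.

```python
from statistics import median


def summarize_genes(gene_info, mutation, expression):
    """Build one row per gene, covering mutation and expression genes together.

    Hypothetical input layout (an assumption, not the project's real format):
      gene_info:  {entrez_gene_id: {"symbol": ..., "name": ..., "chromosome": ...}}
      mutation:   {sample_id: {entrez_gene_id: 0 or 1}}
      expression: {sample_id: {entrez_gene_id: float}}
    """
    mutation_genes = {g for calls in mutation.values() for g in calls}
    expression_genes = {g for values in expression.values() for g in values}

    rows = []
    # The union keeps genes that appear in only one of the two datasets.
    for gene_id in sorted(mutation_genes | expression_genes):
        info = gene_info.get(gene_id, {})
        row = {
            "entrez_gene_id": gene_id,
            "symbol": info.get("symbol"),
            "name": info.get("name"),
            "chromosome": info.get("chromosome"),
            "n_mutations": None,
            "median_expression": None,
            "mad_expression": None,
        }
        if gene_id in mutation_genes:
            # Number of samples in which the gene is called mutated.
            row["n_mutations"] = sum(
                calls.get(gene_id, 0) for calls in mutation.values()
            )
        if gene_id in expression_genes:
            values = [v[gene_id] for v in expression.values() if gene_id in v]
            center = median(values)
            row["median_expression"] = center
            # Median absolute deviation: median of |x - median(x)|.
            row["mad_expression"] = median(abs(x - center) for x in values)
        rows.append(row)
    return rows


# Toy inputs: the gene annotations are real, the sample values are invented.
gene_info = {
    7157: {"symbol": "TP53", "name": "tumor protein p53", "chromosome": "17"},
    673: {"symbol": "BRAF", "name": None, "chromosome": "7"},
    3845: {"symbol": "KRAS", "name": None, "chromosome": "12"},
}
mutation = {
    "s1": {7157: 1, 673: 0},
    "s2": {7157: 1, 673: 1},
    "s3": {7157: 0, 673: 0},
}
expression = {
    "s1": {7157: 2.0, 3845: 5.0},
    "s2": {7157: 4.0, 3845: 5.5},
    "s3": {7157: 9.0, 3845: 7.0},
}

gene_table = summarize_genes(gene_info, mutation, expression)
```

The rows are plain dicts, so they could be written out as a TSV with `csv.DictWriter(..., delimiter="\t")`. If we decide to split the datasets instead, the same function would work by passing an empty dict for one of the two inputs.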
Labeling this issue a task awaiting a claimer.