Closed stephenshank closed 4 years ago
See #35.
I agree this is an important step, but I think we will want to do it slightly differently than in #35. I think we should do the processing in 2.TCGA-process.ipynb, where the mutation data starts out in a melted format. I also think we may want to add some additional columns, like mutation severity, which will be useful for the frontend in the future.
Until we sort these things out, can you use the workaround here for https://github.com/cognoma/core-service/pull/42 (which is a super high priority PR, so let's complete that ASAP):
    import bz2
    import csv

    path = 'mutation-matrix.tsv.bz2'
    # Close the file handle itself; csv.DictReader has no close() method.
    with bz2.open(path, 'rt') as read_file:
        reader = csv.DictReader(read_file, delimiter='\t')
        for row in reader:
            sample_id = row.pop('sample_id')
            for entrez_gene_id, mutation_status in row.items():
                if mutation_status == '1':
                    # Create mutation from entrez_gene_id, sample_id
                    pass
`bz2` module for the win! So simple this way!!
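For anyone who wants to try the workaround without the real file, here is a minimal sketch of the same loop wrapped in a generator. The names `iter_mutations` and `demo` and the tiny two-sample matrix are made up for illustration and are not part of core-service. It assumes the matrix has a `sample_id` column followed by one column per Entrez gene ID, with `'1'` marking a mutation, as in the snippet above.

```python
import bz2
import csv
import os
import tempfile


def iter_mutations(path):
    """Yield (sample_id, entrez_gene_id) for every cell equal to '1'.

    Hypothetical helper: it only restates the workaround loop as a generator.
    """
    with bz2.open(path, 'rt', newline='') as read_file:
        reader = csv.DictReader(read_file, delimiter='\t')
        for row in reader:
            sample_id = row.pop('sample_id')
            for entrez_gene_id, mutation_status in row.items():
                if mutation_status == '1':
                    yield sample_id, entrez_gene_id


def demo():
    """Write a made-up matrix to a temporary .tsv.bz2 file and read it back."""
    rows = [
        ['sample_id', '1', '2'],   # header: sample_id, then gene ID columns
        ['SAMPLE-A', '0', '1'],
        ['SAMPLE-B', '1', '1'],
    ]
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'mutation-matrix.tsv.bz2')
        with bz2.open(path, 'wt', newline='') as write_file:
            csv.writer(write_file, delimiter='\t').writerows(rows)
        # Consume the generator before the temporary directory is removed.
        return list(iter_mutations(path))


if __name__ == '__main__':
    for sample_id, entrez_gene_id in demo():
        print(sample_id, entrez_gene_id)
```

Each yielded pair is one row of the melted form, so the model-creation step could consume the generator directly instead of nesting the loops inline.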
The current format of the mutation matrix leads to some complications in the `core-service` repository. A more desirable format to work with for the purpose of populating the `core-service` mutation model would be of the form: