cognoma / cancer-data

TCGA data acquisition and processing for Project Cognoma
Other
20 stars 28 forks source link

Outsource Entrez Gene logic to cognoma/genes #32

Closed dhimmel closed 7 years ago

dhimmel commented 7 years ago

0.genes-download.ipynb is a notebook to download datasets from cognoma/genes. Update 2.TCGA-process.ipynb to use the gene mapping guidelines in cognoma/genes#1. Remove mapping/PANCAN-mutation/ since this mapping is now done in 2.TCGA-process.ipynb.

Closes #23. Closes #30 by exporting gene info files in 2.TCGA-process.ipynb.

dhimmel commented 7 years ago

Will update repo and commit info in 0.genes-download.ipynb once cognoma/genes#1 is merged.

dhimmel commented 7 years ago

@cgreene could you take a look at the additional changes.

cgreene commented 7 years ago

Ok - reviewed commit d15f4c10c13fc67adb4352898acaafffd5200303 - do you want to check that the genes are correlated before averaging? They probably are... but I know the Troyanskaya Lab used to have some logic to only calculate the mean for genes that agreed after symbol mapping. I think Matt Hibbs first wrote it if I recall correctly.

cgreene commented 7 years ago

Reviewed commit eeba83d7604f45d1ae7c698ae61d44b7a867deaf and it LGTM :+1:

cgreene commented 7 years ago

My feelings are: full PR LGTM :+1: - one question. I don't think you need to address it now. You may want to consider it (maybe it's worth an issue)?

dhimmel commented 7 years ago

do you want to check that the genes are correlated before averaging?

I agree that correlation is a good sanity check here. Given that only 39 genes had multiple expression measurements (see this diff), I'm not too worried about any issues.

If anyone is worried just reply here or open and issue and I'll be happy to look into it further.