Generate clustered expression heatmap

komalsrathi commented 4 years ago

@aadamk: Opening this as a new issue so we can track the two separately:

Also @komalsrathi, given the 4/10 deadline for the PNOC-0008 clinical reports, I'm attaching R code for generating a clustered expression heatmap. Some of the code was written for the structure of a dataset I worked with in the past, so some lines may not apply to your input files.

exp_heatmap_annotated.zip

Otherwise, the code provides a framework for selecting a subset of genes based on an input character vector (the genelist variable), re-scaling expression data to z-scores, generating a top annotation for the heatmap, and producing a clustered output.
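The attached R script isn't reproduced in this thread, so purely as an illustration, here is a minimal, dependency-free Python sketch of three of those steps: subsetting by a gene list, z-scoring each gene across samples, and clustering the samples. Average-linkage clustering on Euclidean distance is an assumed choice here; the R code may use a different distance or linkage. Gene and sample names are placeholders.

```python
from math import sqrt
from statistics import mean, pstdev


def subset_genes(expr, genelist):
    """Keep only the genes (rows) named in genelist, in genelist order."""
    return {g: expr[g] for g in genelist if g in expr}


def zscore_rows(expr):
    """Re-scale each gene's values across samples to z-scores."""
    scaled = {}
    for gene, values in expr.items():
        mu, sd = mean(values), pstdev(values)
        # A gene with zero variance cannot be scaled; map it to all zeros.
        scaled[gene] = [(v - mu) / sd if sd > 0 else 0.0 for v in values]
    return scaled


def euclidean(a, b):
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cluster_order(samples, expr):
    """Average-linkage agglomerative clustering of samples (columns).

    Returns sample names in the leaf order of the final merge tree,
    i.e. the column order to use for the heatmap.
    """
    # One vector per sample: its z-scores across all selected genes.
    vectors = [[row[i] for row in expr.values()] for i in range(len(samples))]

    def linkage(c1, c2):
        return mean(euclidean(vectors[i], vectors[j]) for i in c1 for j in c2)

    clusters = [[i] for i in range(len(samples))]
    while len(clusters) > 1:
        # Find and merge the closest pair of clusters.
        a, b = min(
            ((a, b) for a in range(len(clusters)) for b in range(a + 1, len(clusters))),
            key=lambda p: linkage(clusters[p[0]], clusters[p[1]]),
        )
        merged = clusters[a] + clusters[b]
        clusters = [c for k, c in enumerate(clusters) if k not in (a, b)] + [merged]
    return [samples[i] for i in clusters[0]]


if __name__ == "__main__":
    samples = ["S1", "S2", "S3", "S4"]
    expr = {
        "GENE_A": [1.0, 2.0, 10.0, 11.0],
        "GENE_B": [5.0, 6.0, 1.0, 0.0],
        "GENE_C": [3.0, 3.0, 3.0, 3.0],
    }
    genelist = ["GENE_A", "GENE_B", "GENE_D"]  # GENE_D is absent and skipped
    z = zscore_rows(subset_genes(expr, genelist))
    print(cluster_order(samples, z))
```

Because each merge concatenates two clusters, samples that cluster together stay adjacent in the returned order, which is what makes the order reusable for other tracks.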

Can you add the following to the expression plot?

1) A gene-specific CNV heatmap for the genes in the genelist of interest, shown alongside the expression plot (e.g. like panel c of this image) and clustered by expression.

2) Leverage OpenPBTA in this plot to contextualize the samples, as we discussed.

3) For a given clinical report, can you highlight the PNOC sample ID of interest within the heatmap? For example, in the report for sample 10, highlight the PNOC-0008-10 label or give it a different color to show where that sample falls among the others.

4) A top annotation of clinical variables (specifically race, gender, and disease subtype). I can point you to a clinical file if you need one.
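Items 1, 3 and 4 share one idea: every additional track (CNV heatmap, label colors, clinical annotation) has to follow the column order produced by clustering the expression data. A hedged Python sketch of that bookkeeping, assuming matrices are stored as dicts of gene to per-sample values; the clinical keys race, gender and subtype are hypothetical, not the clinical file's real column names, and every sample ID except PNOC-0008-10 is invented for illustration. The plotting itself is left to whatever heatmap library is used.

```python
def align_to_order(matrix, matrix_samples, order):
    """Reorder a gene x sample matrix (dict of rows) to a given sample order.

    Samples in `order` that are missing from `matrix_samples` get None,
    so the CNV heatmap keeps the same columns as the expression heatmap.
    """
    index = {s: i for i, s in enumerate(matrix_samples)}
    return {
        gene: [row[index[s]] if s in index else None for s in order]
        for gene, row in matrix.items()
    }


def label_colors(order, highlight, color="red", default="black"):
    """Give the report's sample of interest a distinct label color."""
    return {s: (color if s == highlight else default) for s in order}


def annotation_tracks(order, clinical, fields=("race", "gender", "subtype")):
    """Build one top-annotation track per clinical field, in column order.

    `fields` are hypothetical key names; missing values become "NA".
    """
    return {
        field: [clinical.get(s, {}).get(field, "NA") for s in order]
        for field in fields
    }


if __name__ == "__main__":
    # Hypothetical column order taken from the expression clustering.
    order = ["PNOC-0008-5", "PNOC-0008-10", "PNOC-0008-7"]
    cnv = {"GENE_A": [2, 0, -1]}
    cnv_samples = ["PNOC-0008-10", "PNOC-0008-2", "PNOC-0008-5"]
    print(align_to_order(cnv, cnv_samples, order))  # {'GENE_A': [-1, 2, None]}
    print(label_colors(order, "PNOC-0008-10"))
```

Filling missing samples with None rather than dropping them keeps the expression and CNV panels column-aligned, so the highlighted sample sits in the same position in both.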

Originally posted by @aadamk in https://github.com/d3b-center/OMPARE/issues/3#issuecomment-605144641

aadamk commented 4 years ago

Thanks @komalsrathi. Posting the relevant gene lists here for the record; they are for generating two separate plots per sample in the clinical reports:

1) A plot of pediatric high-grade glioma (HGG) relevant genes with a CNV frequency of at least 5% in pediatric HGG, taken from PedcBioPortal studies (AACR Project GENIE, PBTA, Herby Clinical Trial, ICR London, and CBTTC Provisional).
2) A plot of genes from the Cancer Gene Census previously identified as amplified or deleted in cancer, excluding any genes that overlap with list 1.
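The attached gene list file is the authoritative version. Only to make the two selection rules concrete, here is a small Python sketch; it assumes per-gene CNV frequencies are already available as fractions of samples, and the gene names and frequencies in the example are made up.

```python
def hgg_gene_list(cnv_frequency, threshold=0.05):
    """List 1: genes whose CNV frequency in pediatric HGG is at least `threshold`."""
    return sorted(g for g, freq in cnv_frequency.items() if freq >= threshold)


def census_gene_list(census_genes, hgg_genes):
    """List 2: Cancer Gene Census CNV genes, minus any gene already in list 1."""
    exclude = set(hgg_genes)
    return [g for g in census_genes if g not in exclude]


if __name__ == "__main__":
    freq = {"GENE_A": 0.12, "GENE_B": 0.04, "GENE_C": 0.05}  # made-up values
    list1 = hgg_gene_list(freq)
    print(list1)  # ['GENE_A', 'GENE_C']
    print(census_gene_list(["GENE_C", "GENE_D", "GENE_A", "GENE_E"], list1))  # ['GENE_D', 'GENE_E']
```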

2020-03-30_Glioma_GeneList.txt

komalsrathi commented 4 years ago

@aadamk I am almost done with the code (generating the copy number and expression matrices) but am stuck where I have to map WGS samples to RNA-seq samples for the PBTA dataset. To order the copy number heatmap by the expression heatmap clustering, I need a common, unique ID to map copy number to expression. The problem is that the same patient can have multiple copy number and RNA-seq samples. For example, sample_id 7316-1746 has 3 RNA-seq samples and 2 WGS samples; these five samples have different Kids_First_Biospecimen_IDs but the same sample_id and Kids_First_Participant_ID.

I would need some help here. cc: @yuankunzhu

For some patient IDs there is just one WGS and one RNA-seq sample per sample ID, so I am starting with those and will add the other samples once I have a mapping.
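A sketch of the interim approach described above: group biospecimens by sample_id, keep the sample_ids with exactly one WGS and one RNA-seq biospecimen as unambiguous pairs, and set the rest aside until a mapping is agreed. It assumes each record is a dict with the Kids_First_Biospecimen_ID and sample_id columns mentioned above plus an experimental_strategy column with the values "WGS" and "RNA-Seq"; that column name and its values are assumptions about the PBTA histology file, and the biospecimen IDs in the example are hypothetical.

```python
from collections import defaultdict


def map_wgs_to_rnaseq(records):
    """Split sample_ids into one-to-one WGS/RNA-seq pairs and ambiguous cases."""
    by_sample = defaultdict(lambda: {"WGS": [], "RNA-Seq": []})
    for r in records:
        strategy = r["experimental_strategy"]  # assumed column name
        if strategy in ("WGS", "RNA-Seq"):
            by_sample[r["sample_id"]][strategy].append(r["Kids_First_Biospecimen_ID"])

    unique, ambiguous = {}, {}
    for sample_id, ids in by_sample.items():
        wgs, rna = ids["WGS"], ids["RNA-Seq"]
        if len(wgs) == 1 and len(rna) == 1:
            unique[sample_id] = (wgs[0], rna[0])
        elif wgs and rna:
            # e.g. 2 WGS x 3 RNA-seq: needs an agreed rule before pairing.
            ambiguous[sample_id] = (wgs, rna)
        # sample_ids with only one data type cannot be paired and are skipped.
    return unique, ambiguous


if __name__ == "__main__":
    def rec(bs, sample, strategy):
        return {"Kids_First_Biospecimen_ID": bs, "sample_id": sample,
                "experimental_strategy": strategy}

    records = [rec("BS_W1", "sample_1", "WGS"), rec("BS_R1", "sample_1", "RNA-Seq"),
               rec("BS_W2", "sample_2", "WGS"), rec("BS_W3", "sample_2", "WGS"),
               rec("BS_R2", "sample_2", "RNA-Seq"), rec("BS_R3", "sample_2", "RNA-Seq")]
    unique, ambiguous = map_wgs_to_rnaseq(records)
    print(unique)  # {'sample_1': ('BS_W1', 'BS_R1')}
```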

aadamk commented 4 years ago

Closing, as @komalsrathi has completed the CNV + expression heatmap.

d3b-center / OMPARE

Generate clustered expression heatmap #4