Closed komalsrathi closed 4 years ago
Thanks @komalsrathi
Posting the relevant gene lists (just to have on record) for generation of two separate plots per sample in their clinical reports:
1) Plot of pediatric high-grade glioma relevant genes with a CNV frequency rate of at least 5% in pediatric HGG that I took from pedcbioportal studies (AACR Project GENIE, PBTA, Herby Clinical Trial, ICR London, and CBTTC Provisional).
2) Plot of genes from the Cancer Gene Census previously identified as amplified or deleted in cancer, excluding any genes that also appear in list 1.
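The two-list construction above can be sketched as follows. This is only an illustration under stated assumptions: the gene names and frequencies are made-up placeholders, and `build_gene_lists` is a hypothetical helper, not code from the OMPARE repository.

```python
def build_gene_lists(hgg_cnv_freq, census_cnv_genes, min_freq=0.05):
    """Return (list 1, list 2) as described in the comment above.

    hgg_cnv_freq: dict mapping gene -> CNV frequency in pediatric HGG (0..1).
    census_cnv_genes: genes reported as amplified or deleted in the census.
    """
    # List 1: genes altered in at least `min_freq` of pediatric HGG samples.
    list1 = sorted(g for g, freq in hgg_cnv_freq.items() if freq >= min_freq)
    list1_set = set(list1)
    # List 2: census genes, minus anything already plotted in list 1.
    list2 = sorted(g for g in census_cnv_genes if g not in list1_set)
    return list1, list2


# Placeholder inputs (hypothetical values, for illustration only).
hgg_cnv_freq = {"GENE_A": 0.12, "GENE_B": 0.03, "GENE_C": 0.05}
census_cnv_genes = ["GENE_A", "GENE_D", "GENE_E"]

hgg_genes, census_only_genes = build_gene_lists(hgg_cnv_freq, census_cnv_genes)
```

Keeping the two lists disjoint means no gene is drawn in both plots of the same report.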
@aadamk I am almost done with the code (generating copy number and expression matrices) but I am stuck on mapping WGS samples to RNA-seq samples for the PBTA dataset. To cluster the copy number heatmap using the expression heatmap clustering, I need a common, unique ID that maps copy number to expression. The problem is that the same patients have multiple copy number and RNA-seq samples:
For example, for sample ID `7316-1746` there are 3 RNA-seq samples and 2 WGS samples. These five samples have different `Kids_First_Biospecimen_ID` values but the same `sample_id` and `Kids_First_Participant_ID`.
I would need some help here. cc: @yuankunzhu
For some patient IDs there is just one sample ID corresponding to both WGS and RNA-seq, so I am starting with those and will add the other samples once I can get a mapping.
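A minimal sketch of the "start with the unambiguous samples" approach, assuming the clinical table carries an `experimental_strategy` column with values `WGS` and `RNA-Seq` (that column name and its values are assumptions here). The biospecimen IDs and the `SAMPLE_X` entry are hypothetical placeholders; only `7316-1746` and its 3 RNA-seq / 2 WGS counts come from the comment above.

```python
from collections import defaultdict


def one_to_one_pairs(rows):
    """Split sample_ids into unambiguous WGS/RNA-seq pairs and the rest."""
    by_sample = defaultdict(lambda: {"WGS": [], "RNA-Seq": []})
    for row in rows:
        strategy = row["experimental_strategy"]  # assumed column name
        if strategy in ("WGS", "RNA-Seq"):
            by_sample[row["sample_id"]][strategy].append(
                row["Kids_First_Biospecimen_ID"]
            )

    pairs, ambiguous = {}, {}
    for sample_id, ids in by_sample.items():
        if len(ids["WGS"]) == 1 and len(ids["RNA-Seq"]) == 1:
            # Exactly one biospecimen per assay: safe to map directly.
            pairs[sample_id] = (ids["WGS"][0], ids["RNA-Seq"][0])
        else:
            # Needs a manual or rule-based mapping before it can be used.
            ambiguous[sample_id] = ids
    return pairs, ambiguous


def make_row(biospecimen_id, sample_id, strategy):
    return {
        "Kids_First_Biospecimen_ID": biospecimen_id,
        "sample_id": sample_id,
        "experimental_strategy": strategy,
    }


# Hypothetical biospecimen IDs, for illustration only.
rows = [
    make_row("BS_RNA_1", "7316-1746", "RNA-Seq"),
    make_row("BS_RNA_2", "7316-1746", "RNA-Seq"),
    make_row("BS_RNA_3", "7316-1746", "RNA-Seq"),
    make_row("BS_WGS_1", "7316-1746", "WGS"),
    make_row("BS_WGS_2", "7316-1746", "WGS"),
    make_row("BS_X_WGS", "SAMPLE_X", "WGS"),
    make_row("BS_X_RNA", "SAMPLE_X", "RNA-Seq"),
]

pairs, ambiguous = one_to_one_pairs(rows)
```

Here `pairs` holds the samples that can be plotted right away, and `ambiguous` lists the ones (such as `7316-1746`) that still need a mapping decision.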
Closing as @komalsrathi completed CNV + expression heatmap.
@aadamk: Opening this as a new issue here so we can track both separately:
Also @komalsrathi, given the 4/10 deadline for the PNOC-0008 clinical reports, I'm attaching R code for generating a clustered expression heatmap. Some of the code was specific to the structure of a dataset I was working with in the past and thus some lines may not be relevant to the input files that you use.
exp_heatmap_annotated.zip
Otherwise, this code provides a framework for selecting a subset of genes based on an input character vector (`genelist` variable), re-scaling expression data to z-scores, generating a top annotation for the heatmap, and generating a clustered output.
Can you add the following properties to the expression plot?
1) A gene-specific CNV heatmap according to the `genelist` of interest to go along with the expression plot (e.g. like in panel c of this image?), clustered by expression?
2) Leverage OpenPBTA in this plot to contextualize the samples like we discussed?
3) For a given clinical report, can you highlight the PNOC sample ID of interest within the heatmap (e.g. for sample 10, highlight the PNOC-0008-10 label or give it a different color in its report to show where it falls among the other samples)?
4) Include a top annotation of clinical variables (specifically race, gender, disease subtype)? I can point you to a clinical file if you need.
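The requests above (z-score rescaling, a CNV heatmap ordered by the expression clustering, and a highlighted sample label) could be prototyped roughly as below. This is a language-neutral sketch in Python rather than the attached R code, which is not shown in this thread. All function names, the placeholder values, and the `sample_order` list (standing in for the column order returned by whatever clustering is run on the expression matrix) are assumptions.

```python
from statistics import mean, stdev


def zscore_rows(matrix):
    """Rescale each gene (row) to z-scores. matrix: {gene: {sample: value}}."""
    scaled = {}
    for gene, values in matrix.items():
        mu = mean(values.values())
        sd = stdev(values.values())
        scaled[gene] = {
            s: (v - mu) / sd if sd > 0 else 0.0 for s, v in values.items()
        }
    return scaled


def reorder_columns(matrix, sample_order):
    """Lay out each row in the given sample order (None if a sample is missing)."""
    return {
        gene: [values.get(s) for s in sample_order]
        for gene, values in matrix.items()
    }


def label_colors(sample_order, sample_of_interest,
                 highlight="red", default="black"):
    """One label color per column, highlighting the report's own sample."""
    return [highlight if s == sample_of_interest else default
            for s in sample_order]


# Placeholder matrices (hypothetical values, for illustration only).
expr = {"GENE_A": {"PNOC-0008-10": 1.0, "S2": 2.0, "S3": 3.0}}
cnv = {"GENE_A": {"PNOC-0008-10": 2, "S2": 3, "S3": 1}}

scaled = zscore_rows(expr)
# Assumed to come from clustering the z-scored expression matrix.
sample_order = ["S3", "PNOC-0008-10", "S2"]

expr_ordered = reorder_columns(scaled, sample_order)
cnv_ordered = reorder_columns(cnv, sample_order)  # same columns as expression
colors = label_colors(sample_order, "PNOC-0008-10")
```

Applying one shared `sample_order` to both matrices is what keeps the CNV panel aligned with the clustered expression panel; this only works once each column refers to an ID common to both assays.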
Originally posted by @aadamk in https://github.com/d3b-center/OMPARE/issues/3#issuecomment-605144641