AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

release V22 #1365

Closed jharenza closed 2 years ago

jharenza commented 2 years ago

Purpose/implementation Section

What scientific question is your analysis addressing?

This updates the histologies file so that MB WGS samples, which were previously missed, are labeled "To be classified".

What was your approach?

  1. Updated https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/master/analyses/molecular-subtyping-MB/04-no-RNA-samples.R so that subtypes say "MB, To be classified" instead of "To be classified"
  2. Updated https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/46a9d5c0656742b79aa472eadbd78f8bdd720fe4/analyses/molecular-subtyping-pathology/pathology_free_text-subtyping-lgat.Rmd to recode "LGG, subtype" --> "SEGA, subtype"
  3. Created base-histologies.tsv from v21 and reran molecular-subtype-integrate to get pbta-histologies.tsv.
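The two recodes in steps 1 and 2 can be sketched roughly as below. This is a minimal base R illustration on an invented toy table, not the code in the linked scripts: the sample IDs, the "Medulloblastoma" value for short_histology, the example subtype strings, and the `is_sega` flag (standing in for whatever the pathology free-text step uses to identify SEGA samples) are all assumptions made for the example.

```r
# Hypothetical toy table: rows, IDs, and the is_sega flag are invented.
histologies <- data.frame(
  sample_id = c("S1", "S2", "S3", "S4"),
  short_histology = c("Medulloblastoma", "Medulloblastoma", "LGAT", "LGAT"),
  molecular_subtype = c("To be classified", "MB, SHH",
                        "LGG, BRAF V600E", "LGG, To be classified"),
  is_sega = c(FALSE, FALSE, TRUE, FALSE),
  stringsAsFactors = FALSE
)

recode_subtypes <- function(df) {
  # Step 1: prefix unclassified MB samples with "MB, ".
  # which() drops NA comparisons so missing subtypes are left untouched.
  mb_rows <- which(df$short_histology == "Medulloblastoma" &
                     df$molecular_subtype == "To be classified")
  df$molecular_subtype[mb_rows] <- "MB, To be classified"

  # Step 2: for SEGA samples, swap the leading "LGG, " for "SEGA, ".
  sega_rows <- which(df$is_sega)
  df$molecular_subtype[sega_rows] <-
    sub("^LGG, ", "SEGA, ", df$molecular_subtype[sega_rows])

  df
}

recoded <- recode_subtypes(histologies)
print(recoded[, c("sample_id", "molecular_subtype")])
```

Samples that are neither unclassified MB nor flagged as SEGA keep their original subtype string.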

What GitHub issue does your pull request address?

1207

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

Notes:

# v21
v21 %>%
  filter(short_histology == "LGAT") %>%
  select(cancer_group, experimental_strategy) %>%
  table()
                                     experimental_strategy
cancer_group                          RNA-Seq WGS
  Diffuse fibrillary astrocytoma            0   1
  Low-grade glioma astrocytoma            244 234
  Pilocytic astrocytoma                     1   2
  Pleomorphic xanthoastrocytoma             2   1
  Subependymal Giant Cell Astrocytoma       4   3

# v22
histology %>%
  filter(short_histology == "LGAT") %>%
  select(cancer_group, experimental_strategy) %>%
  table()
                                     experimental_strategy
cancer_group                          RNA-Seq WGS
  Diffuse fibrillary astrocytoma            6   6
  Gliomatosis cerebri                       1   1
  Low-grade glioma astrocytoma             94  89
  Oligodendroglioma                         1   1
  Pilocytic astrocytoma                   126 121
  Pleomorphic xanthoastrocytoma            11  11
  Subependymal Giant Cell Astrocytoma      12  12

The idea behind this separation into cancer groups before was to visualize the smaller groups within the oncoprint. The main takeaway, though, is that because there were a handful of pilocytic and pleomorphic (PXA) samples not in the Low-grade glioma astrocytoma cancer_group, and there were some SEGA samples in the Low-grade glioma astrocytoma cancer_group, the analyses are not performed on the exact cohort of interest. This is therefore not an easy fix of simply recoding the v22 cancer_group back to v21. I also realized that Ganglioglioma is already its own cancer group and has a high enough N, so it is in many plots already, but it was missed in the survival LGG_group.

I suppose my thoughts from all of this are that if we have to remake figures anyway, it probably makes sense to keep the cancer group code as it was added by @kgaonkar6. We may have to make a few more colors in the palette and update survival to use the relevant cancer groups within LGG. 😭
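To see which individual samples moved between cancer groups, rather than only the per-group totals, the two releases could be joined on a sample identifier and cross-tabulated. The sketch below uses invented rows and assumes both tables share a Kids_First_Biospecimen_ID column. It uses base R merge() to stay dependency-free, although the same join could be written with dplyr.

```r
# Toy stand-ins for the v21 and v22 histologies tables (invented rows).
v21 <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_1", "BS_2", "BS_3"),
  cancer_group = c("Low-grade glioma astrocytoma",
                   "Low-grade glioma astrocytoma",
                   "Pilocytic astrocytoma"),
  stringsAsFactors = FALSE
)
v22 <- data.frame(
  Kids_First_Biospecimen_ID = c("BS_1", "BS_2", "BS_3"),
  cancer_group = c("Pilocytic astrocytoma",
                   "Low-grade glioma astrocytoma",
                   "Pilocytic astrocytoma"),
  stringsAsFactors = FALSE
)

# Join the two releases per sample; suffixes label each release's column.
both <- merge(v21, v22, by = "Kids_First_Biospecimen_ID",
              suffixes = c("_v21", "_v22"))

# Samples whose cancer_group differs between releases.
changed <- both[both$cancer_group_v21 != both$cancer_group_v22, ]

# Transition table: rows are v21 groups, columns are v22 groups.
transitions <- table(v21 = both$cancer_group_v21,
                     v22 = both$cancer_group_v22)

print(changed)
print(transitions)
```

Off-diagonal counts in the transition table correspond to samples that changed group, which is the set that would need attention if figures are remade.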

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

No, but we need to discuss next steps.

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Reproducibility Checklist

Documentation Checklist

jharenza commented 2 years ago

Ok, I perhaps need to rerun the LGAT subtyping module as well.

jharenza commented 2 years ago

@jaclyn-taroni I need some help. I tried rerunning molecular subtyping for LGAT, but I am running into errors. First, in ce8dbd2, I am updating the 01 script. It would get killed at the rbind step for the consensus and hotspot MAFs, so I reordered the code to pull LGAT samples out of these files upon reading so that they aren't so big. That worked, but 03 is now giving an error at chunk 7, when making the TxDb from the GTF for FGFR1. I saw some possibly related tickets suggesting this may be due to unstable RefSeq files? I am not sure what to do here.
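The reordering described above, subsetting each MAF to the samples of interest before combining, might look roughly like the following. This is only a sketch and not the code in ce8dbd2: the file contents, sample IDs, and the helper name `read_maf_subset` are hypothetical, and only the standard MAF columns Tumor_Sample_Barcode and Hugo_Symbol are assumed. Each file is still read in full here, so the saving comes from the rbind step operating on much smaller tables.

```r
# Hypothetical LGAT sample IDs and two tiny stand-in MAF files.
lgat_ids <- c("BS_A", "BS_C")

consensus_path <- tempfile(fileext = ".tsv")
hotspot_path <- tempfile(fileext = ".tsv")

write.table(
  data.frame(Tumor_Sample_Barcode = c("BS_A", "BS_B"),
             Hugo_Symbol = c("BRAF", "TP53")),
  consensus_path, sep = "\t", quote = FALSE, row.names = FALSE
)
write.table(
  data.frame(Tumor_Sample_Barcode = c("BS_C", "BS_D"),
             Hugo_Symbol = c("FGFR1", "H3F3A")),
  hotspot_path, sep = "\t", quote = FALSE, row.names = FALSE
)

# Read one MAF and keep only rows for the requested samples.
# comment.char = "#" skips a leading "#version" line if one is present.
read_maf_subset <- function(path, keep_ids) {
  maf <- read.delim(path, comment.char = "#", stringsAsFactors = FALSE)
  maf[maf$Tumor_Sample_Barcode %in% keep_ids, , drop = FALSE]
}

# Filter first, then combine the two (now small) tables.
combined <- rbind(
  read_maf_subset(consensus_path, lgat_ids),
  read_maf_subset(hotspot_path, lgat_ids)
)

print(combined)
```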

jharenza commented 2 years ago

Closing this; I will start fresh once some of the code updates are merged.