AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project

Pre-V22 PR summary #1389

Closed jharenza closed 2 years ago

jharenza commented 2 years ago
| PR | type | summary |
|---|---|---|
| #1376 - update/rerun MB subtyping (1 of N) | code update to recode "To be classified" --> "MB, To be classified" | 7 samples' subtypes recoded |
| NA | Rerun CRANIO subtyping | NA, not even HTML changes, so couldn't PR |
| #1377 - Rerun EPN subtyping (2 of N) | EPN module rerun | NA, only decimals from z-scoring changes |
| #1378 - rerun embryonal (3 of N) | embryonal module rerun | NA, only HTML changes |
| #1379 - rerun chordoma subtyping (4 of N) | chordoma module rerun | only HTML/jitter plot changes |
| #1380 - Rerun ews (5 of N) | EWS module rerun | NA, no changes initially, even HTML, so updated the RMD slightly to get it to run and show a change to PR |
| #1381 - Rerun neurocytoma (6 of N) | neurocytoma module rerun | NA, only HTML changes |
| #1382 - Rerun hgg subtyping (7 of N) | HGG module rerun | 7 samples removed - this is due to these RNA samples not being in the TP53 classifier scores file for the join |
| #1384 - Rerun LGG subtyping (8 of N) | LGG module rerun | NA, only HTML changes |
| #1385 - Rerun molecular subtyping pathology (9 of N) | pathology module rerun | The changes are expected based on changes seen in MB rerun and HGG rerun, though the HGG changes should not occur. Those are happening due to the samples not being in the classifier score output (#1383). We did not see expected LGG pathology free text cancer group/integrated dx being propagated. |
| #1386 - Rerun molecular subtype integrate (10 of N) | integrate module rerun | We see the MB samples being updated from #1376, so this is expected. We also see the HGG subtypes being removed (from #1382), which we do not want. |
| #1383 - rerun tp53 module using RNA file in /data | code update and rerun of TP53 module | This module had been grabbing a local copy of the collapsed stranded RNA-Seq file, which was 7 samples short - the same 7 HGG samples being removed from HGG subtyping. The ifelse looked ok to me, but I commented it out so the /data folder would be default and this brought back the 7 samples. |
| #1387 - update pathology free text for LGAT (1/2) | code updates for LGG pathology free text | The "Notes" field from the base histologies file was being retained. Since it is all NA, during the join with compiled subtypes, samples were being removed, so I removed that field. I also made the update to convert the SEGA molecular subtype from "LGG, subtype" --> "SEGA, subtype". |
| #1388 - Rerun subtype integrate post path update (2/2) | rerun of mol sub integrate after code updates from #1387 | Ran as a check, esp bc I noticed GNT recoding wasn't done til now. This new histologies file looks as expected given the updates in #1387, but we still also need to add the HGG and MB updates. |
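
The sample loss described for #1382 (RNA samples missing from the TP53 classifier scores file and so dropped by the join) is the kind of thing an anti-join check before the join can surface. Below is a minimal sketch in Python, not the module's actual code: the function name `missing_from_join` and the sample IDs are hypothetical, and the real modules read their IDs from files.

```python
# Hypothetical helper: list IDs that an inner join would silently drop.
def missing_from_join(left_ids, right_ids):
    """Return IDs in left_ids with no match in right_ids, keeping input order."""
    right = set(right_ids)
    return [sample_id for sample_id in left_ids if sample_id not in right]


# Made-up sample IDs for illustration only.
hgg_rna_samples = ["BS_A", "BS_B", "BS_C", "BS_D"]
tp53_scored_samples = ["BS_A", "BS_C"]

dropped = missing_from_join(hgg_rna_samples, tp53_scored_samples)
if dropped:
    print(f"{len(dropped)} sample(s) would be lost in the join: {dropped}")
```

Running a check like this before the join, and failing loudly when the list is non-empty, might catch a short input file earlier than a diff of the output would.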

Code updates for #1376, #1383, and #1387 need to go in, and then at least the following modules should be rerun:

- molecular-subtyping-HGG
- molecular-subtyping-pathology
- molecular-subtyping-integrate

The logic in #1383 needs to be checked: the change was a quick and dirty way to make the update, so it probably needs editing.
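
One possible shape for that logic, sketched in Python rather than taken from the module: make the shared data directory the default and use a local copy only when it is requested explicitly. The function name `resolve_rna_file`, its parameters, and the file name in the usage lines are all hypothetical.

```python
from pathlib import Path


# Hypothetical helper: the shared data directory is the default source.
def resolve_rna_file(data_dir, filename, local_override=None):
    """Return the input path, using a local copy only when one is passed explicitly."""
    if local_override is not None:
        return Path(local_override)
    return Path(data_dir) / filename


# Placeholder file name, for illustration only.
default_path = resolve_rna_file("data", "rna-file.rds")
override_path = resolve_rna_file("data", "rna-file.rds", local_override="scratch/rna-file.rds")
print(default_path, override_path)
```

Requiring an explicit override means a stale local copy cannot be picked up by accident, which is the failure described for the TP53 module.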

cc @jaclyn-taroni

jaclyn-taroni commented 2 years ago

I think this can be closed now.