AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project
Other
99 stars 66 forks source link

#1144 part2: Sample distribution multi-layer plot #1161

Closed kgaonkar6 closed 3 years ago

kgaonkar6 commented 3 years ago

Purpose/implementation Section

What scientific question is your analysis addressing?

Update sample distribution multi-layer plot centered on cancer_group.

What was your approach?

So as per requested multi-layer plot in #1144 I've added

What GitHub issue does your pull request address?

1144

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

Yes

Results

What types of results are included (e.g., table, figure)?

figures

What is your summary of the results?

multi-layer sample distribution plot, example vis:

Screen Shot 2021-08-25 at 5 26 28 PM

Reproducibility Checklist

Documentation Checklist

kgaonkar6 commented 3 years ago

@jharenza I tried to update the colors to the palettes but seems like I'm missing something.

For example, I updated the level 5 which is the RNA-Seq level

level5 <- tm %>%
  dplyr::filter(level == 5 ) %>%
  dplyr::mutate( color = dplyr::if_else(level5 == "RNA-Seq", "#CCCCCC", "#FFFFFF"))

the level should have a gray and white color, but it has a variety of other colors.

Screen Shot 2021-08-27 at 10 49 13 PM

Not sure what I'm missing, can you and/or @cbethell give me some pointers?

kgaonkar6 commented 3 years ago

Thanks to @jharenza I think I've made some major progress with the color assignment as of https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1161/commits/73a91b364cdfe612a2b0abad64975bb7b9610353! 🙌

We also discussed adding Kids_First_Biospecimen_ID because we wanted to specify tumor_descriptors, but now the assays are mutually exclusive. Maybe sample_id will help capture the multiple assays available per specimen ?

Also I think we need specific color codes for tumor_descriptor? [1] "Initial CNS Tumor" "Progressive"
[3] "Progressive Disease Post-Mortem" "Recurrence"
[5] "Second Malignancy"

jharenza commented 3 years ago

oh yes, that's right - we should do this by sample_id instead of BS_ID to capture DNA/RNA pairs!

For:

Also I think we need specific color codes for tumor_descriptor? [1] "Initial CNS Tumor" "Progressive" [3] "Progressive Disease Post-Mortem" "Recurrence" [5] "Second Malignancy"

I don't remember if we had this code here or maybe I am now mixing up OpenPBTA and OpenTargets - I would suggest we use the following since we don't have a palette yet (taken from the simpsons palette we are using for violin plots):

Can you add this palette to the figures section and pull it in and if we need to change, we do it there?

kgaonkar6 commented 3 years ago

In the latest commits I've updated the plot to use sample_id and added figures/palettes/tumor_descriptor_palette.tsv for the tumor_descriptor values as mentioned above.

kgaonkar6 commented 3 years ago

hmm, seems to error out in CI with testing but runs ok locally with testing data. Rerunning CI to see if it fixes it.

kgaonkar6 commented 3 years ago

Thanks @jharenza !

However, it seems the figures code to plot sample distribution errors out in CI I've removed Rscript figures/scripts/fig1-sample-distribution.R from CI . Since we will be coping over the sunburst pie plot from here to figures.

Does that sound ok @jaclyn-taroni ?

jaclyn-taroni commented 3 years ago

If we're not going to use figures/scripts/fig1-sample-distribution.R, which is what I understand from your comment, we could also just remove that script from the repository in addition to removing it from CI (that seems less confusing to me).

kgaonkar6 commented 3 years ago

ok removed figures/scripts/fig1-sample-distribution.R