AlexsLemonade / OpenPBTA-analysis

The analysis repository for the Open Pediatric Brain Tumor Atlas Project
Other
100 stars 67 forks source link

Re-run oncoprint-landscape module with V22 #1478

Closed sjspielman closed 2 years ago

sjspielman commented 2 years ago

This PR re-runs the oncoprint-landscape module with V22 via bash run-oncoprint.sh. Along the way I encountered a fatal path bug in the 04 script which I fixed to run the module.

There are several expected changes based on the new cancer groups in V22, including both results and plots. These plots do not go into the manuscript however, so I did not update them with cancer_display_group names as decided in #1401.

sjspielman commented 2 years ago

This PR now also uncomments three runs that were wrongly commented out in CI, including the oncoprint module itself.

Notably, one of those runs was the chromosomal-instability module which was updated with V22 in #1460. If that module has CI failures, we'll have to take a step back and fix that module likely by filing a fix to this PR's branch.

jharenza commented 2 years ago

@sjspielman do you think you can/would want to run or stack on top of this, the manuscript figure generation script here? I am concerned about the *goi* plots, which I think are what we were using in the paper, having the old bug we once had for deletions showing up across samples, which was fixed with ploidy along the way...

jharenza commented 2 years ago

@sjspielman do you think you can/would want to run or stack on top of this, the manuscript figure generation script here? I am concerned about the *goi* plots, which I think are what we were using in the paper, having the old bug we once had for deletions showing up across samples, which was fixed with ploidy along the way...

Oh my bad, I was looking at the wrong repo 🤦‍♀️

jharenza commented 2 years ago

Hi @sjspielman. I think I also found a bug and noticed we are missing count tables for "other" HGG/LGG/embryonal tumors since they are counting off of cancer group and not cancer group display. See if this makes sense to you.

In 00-prepare-goi-lists.R, line 30:

all_goi_df <- readr::read_csv(file.path("data", 
                                        "oncoprint-goi-lists-OpenPBTA.csv"))

I think this should be

all_goi_df <- readr::read_csv(file.path(data_dir, 
                                        "oncoprint-goi-lists-OpenPBTA.csv"))

Strange that the output of this script actually worked...since I see that there were changes to data/cancer_group_goi_list_mapping.tsv.

Also in 00-prepare-goi-lists.R, I think we need to update line 71 to add cancer_group_display so that the Ns in the final table for the paper reflect those being displayed via color.

  select(oncoprint_group, cancer_group) %>% 

Add:

  select(oncoprint_group, cancer_group, cancer_group_display) %>% 

Then in script 04-alteration-counts-by-cancer-group.R, modify line 131-135 to say cancer_group_display instead of cancer_group:

for (cg in cancer_group_goi_df$cancer_group_display) {

  # Filtered metadata by cancer group
  filtered_metadata_df <- metadata %>%
    filter(cancer_group_display == cg)

lines 150-152 update to cancer_group_display:

  cg_goi_path <- cancer_group_goi_df %>%
    filter(cancer_group_display == cg) %>%
    pull(goi_list_file)
sjspielman commented 2 years ago

:100: on this needing a fix -

all_goi_df <- readr::read_csv(file.path(data_dir, 
                                        "oncoprint-goi-lists-OpenPBTA.csv"))

When running via the run-oncoprint.sh script, it's all set to run from the analysis path, so this would have worked fine in CI, but it's a bug for sure since the script can't be run stand-alone without the change.

I updated to include cancer_group_display in the select() as well.

Regarding the issues in 04, this solution doesn't work because cancer_group_display is not in the metadata. I'm working on a fix for this now. Edit - I believe I've fixed this, which necessitated reading in the palette file.

sjspielman commented 2 years ago

Looks like this did not pass CI, but via local testing I confirmed it does run within the Docker with the full data. So it seems like something is up with the test data for this module. I will have to come back to this next week since I'm leaving office shortly for holiday weekend.

sjspielman commented 2 years ago

Noting that this closes #1501

sjspielman commented 2 years ago

In 2d91faf I have fixed the following -

I am still working towards fixing the palette.

jaclyn-taroni commented 2 years ago

I fixed the conflicts because the reason we had CI failure should be fixed in master.

sjspielman commented 2 years ago

@jharenza The palette has fixed here and correct cancer_group_display colors/names are used, so this is ready for another look. However, the cancer group legend order doesn't exactly match the cancer group order in the oncoprint itself (but colors are right!), and there are some circumstances where there are more colors in legend than actually are in the plot. I can fix this, but I also wonder if it's overkill since nothing is actually wrong or incorrect in the figures, and these figures aren't going into the manuscript. What do you think?

sjspielman commented 2 years ago

The build for 980552c failed, but it was a timeout, and the build made it past oncoprint!

jharenza commented 2 years ago

I also wonder if it's overkill since nothing is actually wrong or incorrect in the figures, and these figures aren't going into the manuscript. What do you think?

I think that is fine.

I think we are still missing the "Other high-grade glioma", "Other low-grade glioma", "Other embryonal" from the following tables, which need to now use cancer_group_display instead of cancer_group, right?

primary_only_n_per_cancer_group.tsv
primary-plus_n_per_cancer_group.tsv
sjspielman commented 2 years ago

I think we are still missing the "Other high-grade glioma", "Other low-grade glioma", "Other embryonal" from the following tables, which need to now use cancer_group_display instead of cancer_group, right?

Yes, thanks, good catch! I had checked that the tables for these new cancer groups were present in tables/cancer_group_counts/, but I didn't check this file.

sjspielman commented 2 years ago

@jharenza I've fixed these tables to use cancer_group_display. The files now have the header cancer_group_display instead of cancer_group (expected), and they contain rows for "Other" and NA groupings. How do we feel about that? I was thinking I could either drop NAs, or I could switch to cancer_display if the display value is NA. But, if I did that, then the header would not accurately reflect the content anymore.

jharenza commented 2 years ago

I was thinking I could either drop NAs, or I could switch to cancer_display if the display value is NA. But, if I did that, then the header would not accurately reflect the content anymore.

I think maybe dropping NAs should be fine.

sjspielman commented 2 years ago

@jharenza NAs now removed!