Closed sjspielman closed 2 years ago
This PR now also uncomments three runs that were wrongly commented out in CI, including the oncoprint module itself.
Notably, one of those runs was the chromosomal-instability
module which was updated with V22 in #1460. If that module has CI failures, we'll have to take a step back and fix that module likely by filing a fix to this PR's branch.
@sjspielman do you think you can/would want to run or stack on top of this, the manuscript figure generation script here? I am concerned about the *goi*
plots, which I think are what we were using in the paper, having the old bug we once had for deletions showing up across samples, which was fixed with ploidy along the way...
@sjspielman do you think you can/would want to run or stack on top of this, the manuscript figure generation script here? I am concerned about the
*goi*
plots, which I think are what we were using in the paper, having the old bug we once had for deletions showing up across samples, which was fixed with ploidy along the way...
Oh my bad, I was looking at the wrong repo 🤦♀️
Hi @sjspielman. I think I also found a bug and noticed we are missing count tables for "other" HGG/LGG/embryonal tumors since they are counting off of cancer group and not cancer group display. See if this makes sense to you.
In 00-prepare-goi-lists.R
, line 30:
all_goi_df <- readr::read_csv(file.path("data",
"oncoprint-goi-lists-OpenPBTA.csv"))
I think this should be
all_goi_df <- readr::read_csv(file.path(data_dir,
"oncoprint-goi-lists-OpenPBTA.csv"))
Strange that the output of this script actually worked...since I see that there were changes to data/cancer_group_goi_list_mapping.tsv
.
Also in 00-prepare-goi-lists.R
, I think we need to update line 71 to add cancer_group_display
so that the Ns in the final table for the paper reflect those being displayed via color.
select(oncoprint_group, cancer_group) %>%
Add:
select(oncoprint_group, cancer_group, cancer_group_display) %>%
Then in script 04-alteration-counts-by-cancer-group.R
, modify line 131-135 to say cancer_group_display
instead of cancer_group
:
for (cg in cancer_group_goi_df$cancer_group_display) {
# Filtered metadata by cancer group
filtered_metadata_df <- metadata %>%
filter(cancer_group_display == cg)
lines 150-152 update to cancer_group_display:
cg_goi_path <- cancer_group_goi_df %>%
filter(cancer_group_display == cg) %>%
pull(goi_list_file)
:100: on this needing a fix -
all_goi_df <- readr::read_csv(file.path(data_dir,
"oncoprint-goi-lists-OpenPBTA.csv"))
When running via the run-oncoprint.sh
script, it's all set to run from the analysis path, so this would have worked fine in CI, but it's a bug for sure since the script can't be run stand-alone without the change.
I updated to include cancer_group_display
in the select()
as well.
Regarding the issues in 04
, this solution doesn't work because cancer_group_display
is not in the metadata. I'm working on a fix for this now.
Edit - I believe I've fixed this, which necessitated reading in the palette file.
Looks like this did not pass CI, but via local testing I confirmed it does run within the Docker with the full data. So it seems like something is up with the test data for this module. I will have to come back to this next week since I'm leaving office shortly for holiday weekend.
Noting that this closes #1501
In 2d91faf I have fixed the following -
I am still working towards fixing the palette.
I fixed the conflicts because the reason we had CI failure should be fixed in master
.
@jharenza The palette has fixed here and correct cancer_group_display
colors/names are used, so this is ready for another look. However, the cancer group legend order doesn't exactly match the cancer group order in the oncoprint itself (but colors are right!), and there are some circumstances where there are more colors in legend than actually are in the plot. I can fix this, but I also wonder if it's overkill since nothing is actually wrong or incorrect in the figures, and these figures aren't going into the manuscript. What do you think?
The build for 980552c failed, but it was a timeout, and the build made it past oncoprint!
I also wonder if it's overkill since nothing is actually wrong or incorrect in the figures, and these figures aren't going into the manuscript. What do you think?
I think that is fine.
I think we are still missing the "Other high-grade glioma", "Other low-grade glioma", "Other embryonal" from the following tables, which need to now use cancer_group_display
instead of cancer_group
, right?
primary_only_n_per_cancer_group.tsv
primary-plus_n_per_cancer_group.tsv
I think we are still missing the "Other high-grade glioma", "Other low-grade glioma", "Other embryonal" from the following tables, which need to now use cancer_group_display instead of cancer_group, right?
Yes, thanks, good catch! I had checked that the tables for these new cancer groups were present in tables/cancer_group_counts/
, but I didn't check this file.
@jharenza I've fixed these tables to use cancer_group_display
. The files now have the header cancer_group_display
instead of cancer_group
(expected), and they contain rows for "Other"
and NA
groupings. How do we feel about that? I was thinking I could either drop NAs, or I could switch to cancer_display
if the display value is NA
. But, if I did that, then the header would not accurately reflect the content anymore.
I was thinking I could either drop NAs, or I could switch to cancer_display if the display value is NA. But, if I did that, then the header would not accurately reflect the content anymore.
I think maybe dropping NAs should be fine.
@jharenza NA
s now removed!
This PR re-runs the
oncoprint-landscape
module with V22 viabash run-oncoprint.sh
. Along the way I encountered a fatal path bug in the04
script which I fixed to run the module.There are several expected changes based on the new cancer groups in V22, including both results and plots. These plots do not go into the manuscript however, so I did not update them with
cancer_display_group
names as decided in #1401.