Closed jaclyn-taroni closed 4 years ago
Because of the extent of these breaking changes, I believe it's prudent to handle each module in a separate pull request as stated above. Unfortunately, that means that CI will fail for a bunch of these fixes until the last one goes in. So here is the general procedure I think we should follow:
This procedure has a significant weakness in that there may be changes introduced in any one fix that will cause CI to fail unexpectedly once the final fix goes in and this issue is closed. Once this issue gets closed, #569 should be updated such that it is in sync with master, and that will test all steps in CI (except fusion-summary, but that is tracked in #578). Any additional CI fixes can go into the AlexsLemonade:update-release-docs-v15 branch provided that they are small in scope.
So we can keep track of progress on these, I took @jaclyn-taroni's list above and made it into a checklist. We can claim items and then check things off as we fix them. I'll start by claiming the first item. I'll put the PR number next to it too when I get it filed.
[x] gene-set-enrichment-analysis - @cansavvy #585
[x] interaction-plots - @jashapiro #582
[x] molecular-subtyping-embryonal - @cbethell #591
[x] molecular-subtyping-EPN - @jashapiro #592
[x] molecular-subtyping-HGG - @cbethell #586
[x] sample-distribution-analysis - @jashapiro #584
[x] selection-strategy-comparison - deprecated in #589 @jashapiro
[x] molecular-subtyping-chordoma error - @cansavvy #590
[x] sv-analysis 02-shatterseek.R error - @cansavvy #587 (no actual changes were made; this was a false alarm. See comments below.)
I just realized that the sv-analysis failure may simply be due to the fact that I had commented out the first script in that module. So the scope of that fix may be to group those near each other in .circleci/config.yml or to add a shell script to that module.
Okay. Well I just started working on it now, I'll see if that's it.
@jaclyn-taroni you were right. It is fine if the first script is run. #587
Okay. I would love to see those organized such that the step that was failing comes immediately after the step that it depends on (perhaps after v15 is out). I think that would have increased the chances I noticed that immediately.
I was about to just make this change when I had the branch open, but I didn't know if there were particular sequential orders to some of the other tests and didn't want to throw another possible wrench into our testing here. But yeah, we may even want to have a bash script that calls both and make it one CircleCI test.
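A minimal sketch of what such a wrapper could look like, assuming the two sv-analysis scripts only need to run in order and that a failure in the first should stop the second. The helper name run_steps and the first script's file name are placeholders for illustration; the thread does not name the first script.

```shell
#!/bin/bash
# Hypothetical helper (not from the repository): run each command string in
# the order given and stop at the first one that fails.
run_steps() {
  for step in "$@"; do
    echo "Running: $step" >&2
    # Explicit return so the stop-on-failure behavior does not depend on `set -e`
    sh -c "$step" || return 1
  done
}

# Possible invocation from a single CircleCI step (commented out here; the
# first file name is a placeholder, only 02-shatterseek.R appears in the thread):
# run_steps \
#   "Rscript analyses/sv-analysis/01-FIRST-SCRIPT.R" \
#   "Rscript analyses/sv-analysis/02-shatterseek.R"
```

Calling both scripts from one entry point keeps the dependency between them explicit, so commenting out one CI step cannot silently break the next.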
@jaclyn-taroni In regards to selection-strategy-comparison and your comment:
We may want to just deprecate this analysis at this point rather than try to maintain it?
I don't know enough about this analysis module to make an informed decision on this. Do we want to retire it though?
@jashapiro - what do you think, time to retire selection-strategy-comparison?
I think it can be deprecated. Will do that now.
We did it, everyone! Changes incorporated to master in #569
To quote the release notes being added in #569, we're changing the names of well-enough-used columns in the clinical file:
I know this change to pbta-histologies.tsv will break a number of things. The purpose of this issue is to track what will need to be changed as a result. Not only will the column names need to be updated, but we will also need to rerun any notebooks, change documentation, etc.
Anticipated issues
Here I'll list what I know needs to change in modules that are not deprecated.
Some of the modeling steps of gene-set-enrichment-analysis use disease_type_new: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/gene-set-enrichment-analysis/02-model-gsea.Rmd#L123 Luckily the gsva_anova_tukey function is already flexible! https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/gene-set-enrichment-analysis/util/hallmark_models.R#L35
The first step of interaction-plots uses the disease_type_new column to generate lists of samples: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/interaction-plots/scripts/01-disease-specimen-lists.R#L97 Documentation associated with that option will also need to change.
We filter out ATRT and MB samples in molecular-subtyping-embryonal using disease_type_old https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/molecular-subtyping-embryonal/01-samples-to-subset.Rmd#L128 and check disease_type_new as well: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/molecular-subtyping-embryonal/01-samples-to-subset.Rmd#L136 Also in this module, we use both of the disease_type columns quite a bit in subtyping and in generating the final tables, starting around https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/molecular-subtyping-embryonal/04-table-prep.Rmd#L325 The README for this module needs to change as well + this documentation: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/molecular-subtyping-embryonal/02-generate-subset-files.R#L6
The subset files step of molecular-subtyping-EPN uses disease_type_new https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/molecular-subtyping-EPN/00-subset-for-EPN.R#L55
In molecular-subtyping-HGG, we use disease_type_new quite a bit for classification based on defining lesions, here's just one example from the first notebook: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/molecular-subtyping-HGG/01-HGG-molecular-subtyping-defining-lesions.Rmd#L59
disease_type_new is one of the "layers" associated with all of the plotting in sample-distribution-analysis: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/sample-distribution-analysis/01-filter-across-types.R, https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/sample-distribution-analysis/02-multilayer-plots.R and is used in the tables generated in 03-tumor-descriptor-and-assay-count: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/sample-distribution-analysis/03-tumor-descriptor-and-assay-count.Rmd#L206
selection-strategy-comparison includes consideration of disease_type_new: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/selection-strategy-comparison/01-selection-strategies.rmd#L169 We may want to just deprecate this analysis at this point rather than try to maintain it?
Issues that have arisen as part of #576
molecular-subtyping-chordoma fails with the following:
That's from this chunk: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd#L166 I suspect what is actually happening is that there are no chordoma samples in the expression data used in CI and this step https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/molecular-subtyping-chordoma/01-Subtype-chordoma.Rmd#L147 We may want to take an approach that is similar to other subtyping modules and have the first step be a script that generates files that consist only of chordoma samples that are committed to the repository.
The Add Shatterseek step of sv-analysis, which is Rscript analyses/sv-analysis/02-shatterseek.R, fails with:
analyses/sv-analysis/02-shatterseek.R uses an independent specimen file, which is included in its entirety in CI, to read in files: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/analyses/sv-analysis/02-shatterseek.R#L48
The step that would have generated scratch/sv-vcf/BS_K07KNTFY_withoutYandM.tsv comes prior to this one in CI: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/286ff25022930024bb9812e3cfad5410a2cf49c8/.circleci/config.yml#L142
So it will only have access to the subsetted Manta file. See https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/449#issuecomment-576021400 and https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/449#issuecomment-580441012 for more context. The sv-analysis module should be more robust to "missing" samples.
Next steps
@cansavvy @cbethell @jashapiro I'd recommend splitting this up such that modifications to each module are in separate pull requests so you can go through and make sure you catch any documentation stuff I may not have come across.
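One possible way to catch references that were not listed here is to search a module for the old column names before opening its pull request. The sketch below assumes the only names of interest are disease_type_new and disease_type_old, as mentioned in this issue; the function name find_column_refs is made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not from the repository): list every file under a
# directory that still mentions one of the old column names.
find_column_refs() {
  # -r: recurse, -l: print file names only, -E: extended regex
  grep -rlE 'disease_type_(new|old)' "$1" | sort
}

# Possible usage from the repository root (commented out here):
# find_column_refs analyses/molecular-subtyping-embryonal
```

Because this matches any file type, it will also flag READMEs and rendered notebooks, which may be useful given that documentation and notebook output need updating too.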