jaclyn-taroni closed this 2 years ago
Quick clarification here:
- We run scripts/generate-analysis-files-for-release.sh, which should generate all the analysis files for a release and put them in scratch/analysis_files_for_release, and commit any changes to files that are included in the repository. (PR with this shell script coming very soon; https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1412 is an example of the commit any changes part!)
- We add all the analysis files to the release.
Are the files that are added to the release those in modules, or those in `scratch/analysis_files_for_release`? My reading here is that those are the same files, but they are placed into `scratch/` for data release support. Am I reading this right?
They are in both places but compiled into `scratch/analysis_files_for_release` for convenience (to support data release, as you say).
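To make the "both places" arrangement concrete, compiling amounts to copying each release-bound output out of its module into one staging directory, leaving the original where it is. This is a minimal sketch, assuming the repository root is passed in and the files are given as paths relative to it; `compile_release_files` is a hypothetical helper, not the contents of `scripts/generate-analysis-files-for-release.sh`.

```shell
#!/bin/bash
# Hypothetical sketch: gather release-bound module outputs into
# scratch/analysis_files_for_release. The originals stay in their modules.
set -euo pipefail

# compile_release_files BASE_DIR FILE...
#   BASE_DIR: repository root (assumption)
#   FILE...:  paths relative to BASE_DIR, e.g. analyses/<module>/results/<file>
compile_release_files() {
  local base_dir="$1"
  shift
  local release_dir="${base_dir}/scratch/analysis_files_for_release"
  mkdir -p "${release_dir}"
  local f
  for f in "$@"; do
    # Copy (not move), so the module keeps its own copy of the file
    cp "${base_dir}/${f}" "${release_dir}/"
  done
}
```

Because files are copied flat into one directory, two modules that produce the same file name would overwrite each other; the sketch assumes names are unique across modules.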
collapse-rnaseq
* Uses metadata. **Changes may need to be made.** ⚠️
  * Actually, something seems off here. In the current `scripts/run-for-subtyping.sh`, I see this run as `OPENPBTA_BASE_SUBTYPING=1 ../analyses/collapse-rnaseq/run-collapse-rnaseq.sh`, but I do not see a corresponding way to accept this variable in the analysis script.
Yeah, the current `scripts/run-for-subtyping.sh` is wrong. The only place the metadata gets used in this module is in this file: https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/e72c11dbebed2377072df43150fd15f2ffa0262a/analyses/collapse-rnaseq/00-create-rsem-files.R
That file is not run via the shell script (https://github.com/AlexsLemonade/OpenPBTA-analysis/blob/e72c11dbebed2377072df43150fd15f2ffa0262a/analyses/collapse-rnaseq/run-collapse-rnaseq.sh) because we now distribute one RSEM file for each selection strategy. I think that Rscript may be used upstream, which is to say that I am afraid to touch it, really.
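For comparison, one way a module runner could honor the `OPENPBTA_BASE_SUBTYPING` variable that `scripts/run-for-subtyping.sh` sets is to pick the histologies file from it. This is a minimal sketch, assuming `1` means "use the base file", any other value (or an unset variable) means the full file, and both files sit in the same data directory; `select_histologies` is hypothetical and is not part of `run-collapse-rnaseq.sh`.

```shell
#!/bin/bash
# Hypothetical sketch: choose the histologies file from the
# OPENPBTA_BASE_SUBTYPING environment variable.
set -euo pipefail

# select_histologies DATA_DIR
#   Prints the path to the histologies file to use.
select_histologies() {
  local data_dir="$1"
  # Default to 0 (full file) when the variable is unset or empty
  local base_subtyping="${OPENPBTA_BASE_SUBTYPING:-0}"
  if [[ "${base_subtyping}" == "1" ]]; then
    echo "${data_dir}/pbta-histologies-base.tsv"
  else
    echo "${data_dir}/pbta-histologies.tsv"
  fi
}

# Example (hypothetical) use inside a module runner:
#   HISTOLOGIES=$(select_histologies ../../data)
#   echo "Using ${HISTOLOGIES}"
```

Reading the variable with a default keeps the script runnable on its own, while a caller like `run-for-subtyping.sh` can still switch it to the base file by prefixing `OPENPBTA_BASE_SUBTYPING=1`.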
Splitting up changes described in https://github.com/AlexsLemonade/OpenPBTA-analysis/issues/1399#issuecomment-1124329628.
There are a number of analysis files that are included in data downloads now. If you look at the current version of `scripts/run-for-subtyping.sh`, you will see that some of these are currently run in that script. However, I will split up what is required for generating analysis files for release from what is required for subtyping in subsequent pull requests.

All of the changes included here pertain to modules that will be run for generating analysis files. (Subtyping modules, where possible, will use `data/` [see: #1413, #1414, #1415, and #1418].) Because the analysis file generation will happen prior to subtyping, these steps still need to use the `pbta-histologies-base.tsv` file. The logic I am adding or modifying will allow us to do that in subsequent PR(s).

This might not be super clear at this point, so I'll go ahead and outline how data releases will work going forward after everything goes through:
- We run `scripts/generate-analysis-files-for-release.sh`, which should generate all the analysis files for a release and put them in `scratch/analysis_files_for_release`, and commit any changes to files that are included in the repository. (PR with this shell script coming very soon; #1412 is an example of the commit any changes part!)
- We add all the analysis files to the release.
- We run `scripts/run-for-subtyping.sh` and commit any changes to files that are included in the repository. (PR coming soon!)
- We add `pbta-histologies.tsv` to the release.
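The scripted parts of the release steps outlined above could be chained by a small wrapper. This is a minimal sketch, assuming it runs from the repository root, that both scripts exist at the paths named in the outline (the first is still an upcoming PR), and that "commit any changes to files that are included in the repository" means committing modifications to files git already tracks. `commit_tracked_changes`, `run_release_steps`, and the commit messages are hypothetical.

```shell
#!/bin/bash
# Hypothetical driver for the release ordering described in this PR.
# Assumes it is run from the repository root.
set -euo pipefail

# commit_tracked_changes MESSAGE
#   Commit changes to files git already tracks; do nothing when the
#   working tree and index match HEAD. Untracked (new) files are not added.
commit_tracked_changes() {
  local message="$1"
  if ! git diff --quiet HEAD; then
    git commit --quiet --all --message "${message}"
  fi
}

run_release_steps() {
  # Analysis files first: these steps use pbta-histologies-base.tsv
  bash scripts/generate-analysis-files-for-release.sh
  commit_tracked_changes "Update analysis files for release"
  # Adding scratch/analysis_files_for_release/* to the release is
  # outside the scope of this sketch.

  # Subtyping runs only after analysis file generation
  bash scripts/run-for-subtyping.sh
  commit_tracked_changes "Update subtyping results"
  # Adding pbta-histologies.tsv to the release is also not scripted here.
}
```

Only `commit_tracked_changes` does anything on its own; `run_release_steps` is defined but not invoked, so sourcing the sketch has no side effects.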