rerun tp53 module using RNA file in `/data`

jharenza commented 2 years ago

Purpose/implementation Section

What scientific question is your analysis addressing?

There are some samples missing from the results of #1382.

What was your approach?

I commented out the else branch of the if/else in the run script to force the module to run using the source file in the /data folder.

Background:

These 7 samples are all in pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds, which is the input for the classifier. However, they are not in pbta-gene-expression-rsem-fpkm-collapsed.stranded_classifier_scores.tsv, the results of the classifier.

However, the file at OpenPBTA-analysis/analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds is short exactly these 7 samples. The run script has an if/else that should use the /data folder when the module is not run for subtyping, and the logic reads as OK to me, but the module appears to have been using the results file from collapse-rnaseq instead.
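For reference, the branch I am describing looks roughly like the sketch below. This is not copied from the run script: the flag name `RUN_FOR_SUBTYPING`, the helper `select_input`, the variable `collapsed_stranded`, and the exact relative paths are assumptions for illustration only.

```shell
#!/bin/sh
# Hypothetical sketch of the input-selection logic; all names and paths are assumptions.
set -eu

# Assumed flag: 1 when the module is run as part of subtyping, 0 otherwise.
RUN_FOR_SUBTYPING="${RUN_FOR_SUBTYPING:-0}"

select_input() {
  if [ "$1" -eq 0 ]; then
    # Not run for subtyping: read the source file from the data folder.
    echo "data/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds"
  else
    # Run for subtyping: read the collapse-rnaseq results file.
    echo "analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds"
  fi
}

collapsed_stranded="$(select_input "$RUN_FOR_SUBTYPING")"
echo "Using input: ${collapsed_stranded}"
```

Printing the selected path at the start of a run, as in the last line, would make it easy to see from the logs which file the classifier actually read.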

What GitHub issue does your pull request address?

NA

Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.

Which areas should receive a particularly close look?

Am I reading the logic wrong? Why was it using the results folder file?

Is there anything that you want to discuss further?

Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?

yes

Results

What types of results are included (e.g., table, figure)?

What is your summary of the results?

Reproducibility Checklist

[ ] The dependencies required to run the code in this pull request have been added to the project Dockerfile.
[ ] This analysis has been added to continuous integration.

Documentation Checklist

[ ] This analysis module has a README and it is up to date.
[ ] This analysis is recorded in the table in analyses/README.md and the entry is up to date.
[ ] The analytical code is documented and contains comments.

jharenza commented 2 years ago

@jaclyn-taroni the missing samples seem to be back in the results file with https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1383/commits/61f9ba56e4a58b24a910da6a7aa275f968385124

> missing <- c("BS_SB12W1XT",
+              "BS_FXJY0MNH",
+              "BS_HE0WJRW6",
+              "BS_D7XRFE0R",
+              "BS_KABQQA0T",
+              "BS_FN07P04C",
+              "BS_SHJA4MR0")
> 
> tp53_strand <- read_tsv("~/Documents/GitHub/OpenPBTA-analysis/analyses/tp53_nf1_score/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded_classifier_scores.tsv")

── Column specification ──────────────────────────────────────────────────────────────────────────────────
cols(
  sample_id = col_character(),
  ras_score = col_double(),
  tp53_score = col_double(),
  nf1_score = col_double(),
  ras_shuffle = col_double(),
  tp53_shuffle = col_double(),
  nf1_shuffle = col_double()
)

> setdiff(missing, tp53_strand$sample_id)
character(0)

jharenza commented 2 years ago

I am not entirely sure why there are 1014 samples in the old file; my guess is that it was last run when we had added those "polya+stranded" samples. The output now has the expected 977 stranded samples, all of which are in the v21 histologies file.
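A quick way to check the sample count from the command line, as a sketch: it assumes `sample_id` is the first column of the tab-separated scores file (as in the column specification above), and `count_samples` is a hypothetical helper, not something in the repository.

```shell
# Hypothetical helper: count distinct sample IDs in the first column of a TSV.
count_samples() {
  # Skip the header row, keep column 1, de-duplicate, then count lines.
  tail -n +2 "$1" | cut -f 1 | sort -u | wc -l | tr -d ' '
}

# Usage (path is a placeholder for the classifier scores file):
# count_samples path/to/classifier_scores.tsv
```

Running it on both the old and the new scores file would show the two counts side by side.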
