Closed by jharenza 2 years ago
@jaclyn-taroni the missing samples seem to be back in the results file with https://github.com/AlexsLemonade/OpenPBTA-analysis/pull/1383/commits/61f9ba56e4a58b24a910da6a7aa275f968385124
> missing <- c("BS_SB12W1XT",
+ "BS_FXJY0MNH",
+ "BS_HE0WJRW6",
+ "BS_D7XRFE0R",
+ "BS_KABQQA0T",
+ "BS_FN07P04C",
+ "BS_SHJA4MR0")
>
> tp53_strand <- read_tsv("~/Documents/GitHub/OpenPBTA-analysis/analyses/tp53_nf1_score/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded_classifier_scores.tsv")
── Column specification ──────────────────────────────────────────────────────────────────────────────────
cols(
sample_id = col_character(),
ras_score = col_double(),
tp53_score = col_double(),
nf1_score = col_double(),
ras_shuffle = col_double(),
tp53_shuffle = col_double(),
nf1_shuffle = col_double()
)
> setdiff(missing, tp53_strand$sample_id)
character(0)
I am not entirely sure why there are 1014 samples in the old file. My guess is that it was last run when we had added those "polya+stranded" samples. In any case, the output now has the expected 977 stranded samples that are in the v21 histologies file.
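For anyone who wants to double-check the sample counts, here is a minimal sketch of the comparison. The helper name `check_scored_samples` is hypothetical, and the column names `Kids_First_Biospecimen_ID` and `RNA_library` are my assumption about the histologies file, so adjust them if they differ. The toy data frame is made up for illustration only.

```r
# Hypothetical helper: compare the scored sample IDs against the IDs expected
# from a histologies table. The default column names are assumptions.
check_scored_samples <- function(histologies, scored_ids,
                                 id_col = "Kids_First_Biospecimen_ID",
                                 library_col = "RNA_library",
                                 library = "stranded") {
  # Keep rows whose library type matches, ignoring missing values
  keep <- !is.na(histologies[[library_col]]) &
    histologies[[library_col]] == library
  expected <- unique(histologies[[id_col]][keep])
  scored <- unique(scored_ids)
  list(
    n_expected = length(expected),
    n_scored   = length(scored),
    not_scored = setdiff(expected, scored),  # expected but absent from results
    unexpected = setdiff(scored, expected)   # in results but not expected
  )
}

# Toy illustration with made-up IDs (not real data)
toy_hist <- data.frame(
  Kids_First_Biospecimen_ID = c("S1", "S2", "S3", "S4"),
  RNA_library = c("stranded", "stranded", "poly-A", NA),
  stringsAsFactors = FALSE
)
res <- check_scored_samples(toy_hist, c("S1", "S2", "S3"))
print(res)
```

With the real files, `histologies` would be the v21 histologies table and `scored_ids` would be `tp53_strand$sample_id` from the session above. A non-empty `unexpected` would point at leftover samples like the ones I am guessing explain the old count.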
Purpose/implementation Section
What scientific question is your analysis addressing?
There are some samples missing from the results of #1382.
What was your approach?
Commented out the else part of the ifelse in the run script to force the run to use the /data folder source file.
Background:
These 7 samples are all in pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds, which is the input for the classifier. However, they are not in pbta-gene-expression-rsem-fpkm-collapsed.stranded_classifier_scores.tsv, the results of the classifier.
However, the copy at OpenPBTA-analysis/analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds is short by exactly these 7 samples. There is an ifelse meant to run from the /data folder when the module is not run for subtyping, and the logic reads as OK to me, but the module seems to have been using the results file from collapse-rnaseq.
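One way to confirm which input the classifier actually used is to check which copy's sample set matches the scored samples exactly. This is only a sketch: the helper name `which_input_was_used` is hypothetical, the toy IDs are made up, and the commented-out lines assume that samples are the columns of the collapsed matrix and that the /data copy sits at `data/` under the repository root.

```r
# Hypothetical helper: report which candidate input has exactly the same
# sample set as the classifier output.
which_input_was_used <- function(data_ids, module_ids, scored_ids) {
  c(
    data_folder     = setequal(scored_ids, data_ids),
    collapse_rnaseq = setequal(scored_ids, module_ids)
  )
}

# Toy illustration with made-up IDs (not real data):
# the module copy is one sample short, and so is the output.
data_ids   <- c("S1", "S2", "S3")
module_ids <- c("S1", "S2")
scored_ids <- c("S1", "S2")
used <- which_input_was_used(data_ids, module_ids, scored_ids)
print(used)

# With the real files (paths and matrix layout are assumptions):
# data_ids   <- colnames(readRDS(
#   "data/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds"))
# module_ids <- colnames(readRDS(
#   "analyses/collapse-rnaseq/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded.rds"))
# scored_ids <- readr::read_tsv(
#   "analyses/tp53_nf1_score/results/pbta-gene-expression-rsem-fpkm-collapsed.stranded_classifier_scores.tsv"
# )$sample_id
```

If `collapse_rnaseq` comes back `TRUE` and `data_folder` comes back `FALSE` on the old results, that would support the idea that the module was reading the collapse-rnaseq results file.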
What GitHub issue does your pull request address?
NA
Directions for reviewers. Tell potential reviewers what kind of feedback you are soliciting.
Which areas should receive a particularly close look?
Am I reading the logic wrong? Why was it using the results folder file?
Is there anything that you want to discuss further?
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
yes
Results
What types of results are included (e.g., table, figure)?
What is your summary of the results?
Reproducibility Checklist
Documentation Checklist
README and it is up to date.
analyses/README.md and the entry is up to date.