Closed sjspielman closed 2 years ago
Just noting to double check the Ns here, per this comment
Looking more into the N situation, I think I found the discrepancy (code/figure updated in last commit).
@jharenza calculated the Ns as shown in this comment:
> matched_participants <- v21 %>%
+ filter(experimental_strategy != "RNA-Seq") %>%
+ group_by(Kids_First_Participant_ID) %>%
+ summarize(strategies = paste0(experimental_strategy, collapse = ",")) %>%
+ filter(grepl("WXS", strategies) & grepl("WGS", strategies)) %>%
+ pull(Kids_First_Participant_ID)
> # Get the biospecimen IDs for these participants.
> bs <- v21 %>%
+ filter(Kids_First_Participant_ID %in% matched_participants) %>%
+ filter(experimental_strategy != "RNA-Seq" & !is.na(pathology_diagnosis)) %>%
+ select(Kids_First_Participant_ID, Kids_First_Biospecimen_ID, tumor_descriptor, experimental_strategy) %>%
+ unique() %>%
+ mutate(pt_desc = paste(Kids_First_Participant_ID, tumor_descriptor, sep = "_")) %>%
+ group_by(pt_desc, experimental_strategy) %>%
+ tally()
> sum(bs$n)
[1] 52
> as.data.frame(table(bs$pt_desc))
Var1 Freq
1 PT_0MXPTTM3_Initial CNS Tumor 2
2 PT_1E3E6GMF_Initial CNS Tumor 2
3 PT_9GKVQ9QS_Initial CNS Tumor 2
4 PT_HGM20MW7_Initial CNS Tumor 2
5 PT_KBFM551M_Initial CNS Tumor 2
6 PT_KBFM551M_Progressive Disease Post-Mortem 1
7 PT_KTRJ8TFY_Initial CNS Tumor 2
8 PT_KTRJ8TFY_Progressive Disease Post-Mortem 1
9 PT_KZ56XHJT_Initial CNS Tumor 2
10 PT_KZ56XHJT_Progressive 2
11 PT_KZ56XHJT_Progressive Disease Post-Mortem 1
12 PT_M23Q0DC3_Initial CNS Tumor 2
13 PT_M9XXJ4GR_Initial CNS Tumor 2
14 PT_NK8A49X5_Initial CNS Tumor 2
15 PT_NK8A49X5_Progressive 2
16 PT_QA9WJ679_Initial CNS Tumor 2
17 PT_VPEMAQBN_Initial CNS Tumor 2
18 PT_WGVEF96B_Initial CNS Tumor 2
In the figures/scripts/supp-snv-callers-panels.R script, the following code is used, adapted directly from the original notebook.
# Retrieve all the participant IDs for participants that have both WGS and WXS data.
matched_participants <- metadata %>%
filter(experimental_strategy != "RNA-Seq") %>%
group_by(Kids_First_Participant_ID) %>%
summarize(strategies = paste0(experimental_strategy, collapse = ",")) %>%
filter(grepl("WXS", strategies) & grepl("WGS", strategies)) %>%
pull(Kids_First_Participant_ID)
# Get the biospecimen IDs for these participants.
biospecimens <- metadata %>%
filter(Kids_First_Participant_ID %in% matched_participants) %>%
pull(Kids_First_Biospecimen_ID)
# below is now gone from code but showing here to compare
#n_participants <- length(matched_participants) # 13
#n_samples <- length(biospecimens) # 98 ## BUT THIS DISAGREES - MY CALC IS IN THE WRONG PLACE!
# Set up the Lancet data from the SQL database and only keep the biospecimens we identified.
lancet <- tbl(con, "lancet") %>%
select(
join_cols, "VAF" #matches `cols_to_keep` in original notebook
) %>%
inner_join(
select(
tbl(con, "samples"),
Tumor_Sample_Barcode = Kids_First_Biospecimen_ID,
experimental_strategy,
short_histology,
Kids_First_Participant_ID
)
) %>%
filter(Tumor_Sample_Barcode %in% biospecimens) %>%
as.data.frame()
## THIS SHOULD BE WHERE I CALC!! Now the code has this!!
n_participants <- length(unique(lancet$Kids_First_Participant_ID)) # 13
n_samples <- length(unique(lancet$Tumor_Sample_Barcode)) # 52!! matches!
Conclusion: 52 samples from 13 patients. Updating this
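To illustrate why the place where the counts are computed matters, here is a minimal base R sketch on toy data. The IDs and the small `lancet` stand-in table are hypothetical, and the explanation it encodes (that the unfiltered biospecimen list still contains RNA-Seq samples that never appear in the Lancet table) is an assumption about the cause of the 98 vs. 52 gap, not something confirmed above. It also uses `%in%` rather than the `grepl()` approach in the script.

```r
# Toy metadata with hypothetical IDs (not taken from the real dataset).
metadata <- data.frame(
  Kids_First_Participant_ID = c("PT_A", "PT_A", "PT_A", "PT_B", "PT_B"),
  Kids_First_Biospecimen_ID = c("BS_1", "BS_2", "BS_3", "BS_4", "BS_5"),
  experimental_strategy     = c("WGS", "WXS", "RNA-Seq", "WGS", "RNA-Seq"),
  stringsAsFactors = FALSE
)

# Participants that have both WGS and WXS data (RNA-Seq rows ignored).
dna <- metadata[metadata$experimental_strategy != "RNA-Seq", ]
has_both <- tapply(
  dna$experimental_strategy,
  dna$Kids_First_Participant_ID,
  function(x) all(c("WGS", "WXS") %in% x)
)
matched_participants <- names(has_both)[has_both]

# Counting here is too early: this keeps every biospecimen of a matched
# participant, including the RNA-Seq one.
biospecimens <- metadata$Kids_First_Biospecimen_ID[
  metadata$Kids_First_Participant_ID %in% matched_participants
]
n_samples_early <- length(biospecimens)

# Hypothetical stand-in for the Lancet table: one row per variant call,
# DNA samples only.
lancet <- data.frame(
  Tumor_Sample_Barcode      = c("BS_1", "BS_1", "BS_2", "BS_4"),
  Kids_First_Participant_ID = c("PT_A", "PT_A", "PT_A", "PT_B"),
  stringsAsFactors = FALSE
)
lancet <- lancet[lancet$Tumor_Sample_Barcode %in% biospecimens, ]

# Counting after the filter reflects only samples that are actually plotted.
n_participants <- length(unique(lancet$Kids_First_Participant_ID))
n_samples      <- length(unique(lancet$Tumor_Sample_Barcode))

print(c(early = n_samples_early, plotted = n_samples, participants = n_participants))
```

In this toy case the early count is larger than the plotted count for the same reason suspected above: the early list is never restricted to the samples that survive the join and filter.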
Whoops, @jharenza, just realized I forgot to actually request the review here :)
The S2 figure panel showing Lancet WXS/WGS is updated to reflect sample sizes with a new plot subtitle: 98 samples from 13 patients. This information can then be reproducibly included in the manuscript.
Is the analysis in a mature enough form that the resulting figure(s) and/or table(s) are ready for review?
Yes
Questions for reviewers
Results
What types of results are included (e.g., table, figure)?
Figure with added subtitle
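As a rough sketch of how the subtitle text could be derived from the computed counts rather than typed by hand, the snippet below builds the string in base R. The variable name `plot_subtitle` is hypothetical, and the two counts are hard-coded here to the values from the conclusion in the thread (52 samples, 13 patients); in the script they would come from the filtered `lancet` data frame.

```r
# Counts as reported in the thread's conclusion; in the real script these
# would be computed from the filtered lancet data frame.
n_samples      <- 52
n_participants <- 13

# Build the subtitle from the counts so the text cannot drift from the data.
plot_subtitle <- paste(n_samples, "samples from", n_participants, "patients")
print(plot_subtitle)

# With ggplot2, the string could then be attached to a plot, e.g.:
#   some_plot + ggplot2::labs(subtitle = plot_subtitle)
# (`some_plot` is a placeholder, not an object from the script.)
```

Deriving the string from the counts means a later change to the filtering would update the figure subtitle automatically.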