Open rjcorb opened 3 months ago
Quick question: why specifically only those two samples? It seems there are a total of 9 participants who have only metastatic samples, with 11 samples among them (4 Targeted Sequencing and 7 WGS):
# Filter to only DNA samples from tumors, where composition is not "Derived Cell Line" and "PDX", and are not metastatic
tumor_samples <- histology_df %>%
  dplyr::filter(sample_type == "Tumor",
                !composition %in% c("Derived Cell Line", "PDX"),
                is.na(RNA_library),
                experimental_strategy %in% c("WGS", "WXS", "Targeted Sequencing"),
                !grepl("Metastatic secondary tumors", pathology_diagnosis))
# Filter to participants with only Metastatic samples
tumor_samples_only_met <- histology_df %>%
  dplyr::filter(sample_type == "Tumor",
                !composition %in% c("Derived Cell Line", "PDX"),
                is.na(RNA_library),
                experimental_strategy %in% c("WGS", "WXS", "Targeted Sequencing"),
                grepl("Metastatic secondary tumors", pathology_diagnosis))
tumor_samples_only_met <- tumor_samples_only_met %>%
  filter(!Kids_First_Participant_ID %in% tumor_samples$Kids_First_Participant_ID)
> unique(tumor_samples_only_met$Kids_First_Participant_ID) %>% length()
[1] 9
> unique(tumor_samples_only_met$Kids_First_Participant_ID)
[1] "PT_EV71W1JW" "PT_S9M3JJVB" "PT_E6BGSP51" "PT_XN1P30ZC" "PT_087EW14F" "PT_QH6X1C3A" "PT_YCMH9SQP" "PT_GKHDNKMW" "PT_KXQR3GS4"
# check the tumor_samples_only_met biospecimens in histology file
histology_df %>%
  filter(Kids_First_Biospecimen_ID %in% tumor_samples_only_met$Kids_First_Biospecimen_ID) %>%
  dplyr::select(Kids_First_Participant_ID, Kids_First_Biospecimen_ID, sample_type,
                composition, experimental_strategy, RNA_library, pathology_diagnosis) %>%
  arrange(experimental_strategy)
# there are 4 Targeted Sequencing and 7 WGS samples
   Kids_First_Participant_ID Kids_First_Biospecimen_ID sample_type composition   experimental_strategy RNA_library pathology_diagnosis
   <chr>                     <chr>                     <chr>       <chr>         <chr>                 <chr>       <chr>
 1 PT_E6BGSP51               BS_NW8WV3D1               Tumor       Solid Tissue  Targeted Sequencing   NA          Metastatic secondary tumors
 2 PT_087EW14F               BS_X91E07CQ               Tumor       Not Available Targeted Sequencing   NA          Metastatic secondary tumors
 3 PT_S9M3JJVB               BS_BH45SCWY               Tumor       Not Available Targeted Sequencing   NA          Metastatic secondary tumors
 4 PT_GKHDNKMW               BS_H1E7ZSYG               Tumor       Not Available Targeted Sequencing   NA          Metastatic secondary tumors
 5 PT_EV71W1JW               BS_YETTZ1NC               Tumor       Solid Tissue  WGS                   NA          Metastatic secondary tumors
 6 PT_S9M3JJVB               BS_233JPDBD               Tumor       Solid Tissue  WGS                   NA          Metastatic secondary tumors
 7 PT_E6BGSP51               BS_7F3V5AKH               Tumor       Solid Tissue  WGS                   NA          Metastatic secondary tumors
 8 PT_XN1P30ZC               BS_5VEEG4JT               Tumor       Solid Tissue  WGS                   NA          Metastatic secondary tumors
 9 PT_QH6X1C3A               BS_083RF2ZE               Tumor       Solid Tissue  WGS                   NA          Metastatic secondary tumors
10 PT_YCMH9SQP               BS_A70G7S2W               Tumor       Solid Tissue  WGS                   NA          Metastatic secondary tumors
11 PT_KXQR3GS4               BS_551ZH7EV               Tumor       Solid Tissue  WGS                   NA          Metastatic secondary tumors
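For reference, the same "metastatic-only participants" check could probably be done in a single grouped pass instead of building two data frames. This is only a rough sketch on a made-up toy table (`toy_histology` and its `PT_A`/`BS_1`-style IDs are hypothetical, not real data); it assumes the shared tumor/DNA filters have already been applied, and it may not be how the independent-specimens module does it.

```r
library(dplyr)

# Hypothetical toy stand-in for the already-filtered histology_df (not real data)
toy_histology <- data.frame(
  Kids_First_Participant_ID = c("PT_A", "PT_A", "PT_B", "PT_B", "PT_C"),
  Kids_First_Biospecimen_ID = c("BS_1", "BS_2", "BS_3", "BS_4", "BS_5"),
  pathology_diagnosis = c("Metastatic secondary tumors",
                          "Metastatic secondary tumors",
                          "Metastatic secondary tumors",
                          "Ependymoma",
                          "Ependymoma"),
  stringsAsFactors = FALSE
)

# Keep a participant only if every one of their samples is metastatic
met_only <- toy_histology %>%
  group_by(Kids_First_Participant_ID) %>%
  filter(all(grepl("Metastatic secondary tumors", pathology_diagnosis, fixed = TRUE))) %>%
  ungroup()

# Participants and samples are counted separately
n_participants <- n_distinct(met_only$Kids_First_Participant_ID)
n_samples <- nrow(met_only)
print(c(participants = n_participants, samples = n_samples))
```

In the toy table only `PT_A` qualifies, so this gives 1 participant and 2 samples. Counting the two separately is what distinguishes the 9 participants from the 11 samples above.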
@komalsrathi I think only those two samples were included in a germline cohort we are working with, so I didn't realize there were others falling into the same category.
Oh okay, then I think it is acceptable to include the 11 samples listed above? If you agree with it then I'll create a PR with the changes.
Hey @komalsrathi - @rjcorb and I were chatting about this, since we did not realize these were mets. I would like Jenn to look into these before we add them. I am not sure it makes sense to add them if their initial tumors may not have been in the brain (e.g., some are osteo, nbl), so I want to hear back from her about the initial diagnoses first. If they are initial solid tumors, we could possibly shift them out of the PBTA cohort and into the appropriate cohorts, or figure out some other way to handle them, while making sure they are still in the independent specimen list. So let's just pause on this one.
Ok sure, I did rerun and push the results to a new branch but I will hold off on creating a PR.
What data file(s) does this issue pertain to?
independent-specimens.wgswxspanel.primary-plus.prefer.wgs.tsv
What release are you using?
v15
Put your question or report your issue here.
The following DNA-seq BS IDs should be added to patients, since they are the only DNA-seq samples available:
PT_YCMH9SQP: BS_A70G7S2W
PT_EV71W1JW: BS_YETTZ1NC