d3b-center / OpenPedCan-analysis

The analysis repository for the Open Pediatric Cancer Project
https://d3b-center.github.io/OpenPedCan-analysis/

Update WGS & WXS independent specimens #597

Open · rjcorb opened 3 months ago

rjcorb commented 3 months ago

What data file(s) does this issue pertain to?

independent-specimens.wgswxspanel.primary-plus.prefer.wgs.tsv

What release are you using?

v15

Put your question or report your issue here.

The following DNA-seq biospecimen (BS) IDs should be added for these participants, since they are the only DNA-seq samples available for them:

- PT_YCMH9SQP: BS_A70G7S2W
- PT_EV71W1JW: BS_YETTZ1NC

komalsrathi commented 3 months ago

Quick question: why only those two samples? It seems there are 9 participants in total that have only metastatic samples (11 samples: 4 Targeted Sequencing and 7 WGS):

```r
# Assumes dplyr is installed and histology_df (the histology data frame) is already loaded
library(dplyr)

# Filter to DNA tumor samples whose composition is neither "Derived Cell Line" nor "PDX",
# excluding metastatic samples
tumor_samples <- histology_df %>%
  dplyr::filter(sample_type == "Tumor",
                !composition %in% c("Derived Cell Line", "PDX"),
                is.na(RNA_library),
                experimental_strategy %in% c("WGS", "WXS", "Targeted Sequencing"),
                !grepl("Metastatic secondary tumors", pathology_diagnosis))

# Same filter, but keep only the metastatic samples
tumor_samples_only_met <- histology_df %>%
  dplyr::filter(sample_type == "Tumor",
                !composition %in% c("Derived Cell Line", "PDX"),
                is.na(RNA_library),
                experimental_strategy %in% c("WGS", "WXS", "Targeted Sequencing"),
                grepl("Metastatic secondary tumors", pathology_diagnosis))

# Drop participants that also have a non-metastatic DNA tumor sample,
# leaving participants with only metastatic samples
tumor_samples_only_met <- tumor_samples_only_met %>%
  dplyr::filter(!Kids_First_Participant_ID %in% tumor_samples$Kids_First_Participant_ID)
```

```r
> unique(tumor_samples_only_met$Kids_First_Participant_ID) %>% length()
[1] 9

> unique(tumor_samples_only_met$Kids_First_Participant_ID)
[1] "PT_EV71W1JW" "PT_S9M3JJVB" "PT_E6BGSP51" "PT_XN1P30ZC" "PT_087EW14F" "PT_QH6X1C3A" "PT_YCMH9SQP" "PT_GKHDNKMW" "PT_KXQR3GS4"
```

```r
# Check the tumor_samples_only_met biospecimens in the histology file
histology_df %>%
  filter(Kids_First_Biospecimen_ID %in% tumor_samples_only_met$Kids_First_Biospecimen_ID) %>%
  dplyr::select(Kids_First_Participant_ID, Kids_First_Biospecimen_ID, sample_type, composition, experimental_strategy, RNA_library, pathology_diagnosis) %>%
  arrange(experimental_strategy)
```

```
# 11 samples from 9 participants: 4 Targeted Sequencing and 7 WGS

   Kids_First_Participant_ID Kids_First_Biospecimen_ID sample_type composition   experimental_strategy RNA_library pathology_diagnosis        
   <chr>                     <chr>                     <chr>       <chr>         <chr>                 <chr>       <chr>                      
 1 PT_E6BGSP51               BS_NW8WV3D1               Tumor       Solid Tissue  Targeted Sequencing   NA          Metastatic secondary tumors
 2 PT_087EW14F               BS_X91E07CQ               Tumor       Not Available Targeted Sequencing   NA          Metastatic secondary tumors
 3 PT_S9M3JJVB               BS_BH45SCWY               Tumor       Not Available Targeted Sequencing   NA          Metastatic secondary tumors
 4 PT_GKHDNKMW               BS_H1E7ZSYG               Tumor       Not Available Targeted Sequencing   NA          Metastatic secondary tumors
 5 PT_EV71W1JW               BS_YETTZ1NC               Tumor       Solid Tissue  WGS                   NA          Metastatic secondary tumors
 6 PT_S9M3JJVB               BS_233JPDBD               Tumor       Solid Tissue  WGS                   NA          Metastatic secondary tumors
 7 PT_E6BGSP51               BS_7F3V5AKH               Tumor       Solid Tissue  WGS                   NA          Metastatic secondary tumors
 8 PT_XN1P30ZC               BS_5VEEG4JT               Tumor       Solid Tissue  WGS                   NA          Metastatic secondary tumors
 9 PT_QH6X1C3A               BS_083RF2ZE               Tumor       Solid Tissue  WGS                   NA          Metastatic secondary tumors
10 PT_YCMH9SQP               BS_A70G7S2W               Tumor       Solid Tissue  WGS                   NA          Metastatic secondary tumors
11 PT_KXQR3GS4               BS_551ZH7EV               Tumor       Solid Tissue  WGS                   NA          Metastatic secondary tumors
```
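The counts in the table above can be double-checked with a small base R sketch. It hard-codes the 11 rows listed above instead of reading the histology file, so `met_only` is a hypothetical stand-in for `tumor_samples_only_met`. The one-per-participant pick at the end assumes a simple "prefer WGS" rule (WGS over WXS over Targeted Sequencing); that ranking and the names `priority`, `ranked` and `one_per_participant` are illustrative assumptions, not the independent-specimens module's actual code.

```r
# Hypothetical stand-in for tumor_samples_only_met, built from the 11 rows listed above
met_only <- data.frame(
  Kids_First_Participant_ID = c("PT_E6BGSP51", "PT_087EW14F", "PT_S9M3JJVB", "PT_GKHDNKMW",
                                "PT_EV71W1JW", "PT_S9M3JJVB", "PT_E6BGSP51", "PT_XN1P30ZC",
                                "PT_QH6X1C3A", "PT_YCMH9SQP", "PT_KXQR3GS4"),
  Kids_First_Biospecimen_ID = c("BS_NW8WV3D1", "BS_X91E07CQ", "BS_BH45SCWY", "BS_H1E7ZSYG",
                                "BS_YETTZ1NC", "BS_233JPDBD", "BS_7F3V5AKH", "BS_5VEEG4JT",
                                "BS_083RF2ZE", "BS_A70G7S2W", "BS_551ZH7EV"),
  experimental_strategy = c(rep("Targeted Sequencing", 4), rep("WGS", 7)),
  stringsAsFactors = FALSE
)

# Number of biospecimens vs. number of distinct participants
n_samples <- nrow(met_only)
n_participants <- length(unique(met_only$Kids_First_Participant_ID))

# Biospecimens per experimental strategy
strategy_counts <- table(met_only$experimental_strategy)

# Assumed "prefer WGS" rule: rank strategies, sort within each participant,
# then keep the first (best-ranked) biospecimen per participant
priority <- c("WGS" = 1, "WXS" = 2, "Targeted Sequencing" = 3)
ranked <- met_only[order(met_only$Kids_First_Participant_ID,
                         priority[met_only$experimental_strategy]), ]
one_per_participant <- ranked[!duplicated(ranked$Kids_First_Participant_ID), ]

print(c(samples = n_samples, participants = n_participants))
print(strategy_counts)
print(one_per_participant)
```

Under that assumed rule, the 11 biospecimens collapse to 9, one per participant: the two participants with both a Targeted Sequencing and a WGS sample (PT_S9M3JJVB and PT_E6BGSP51) keep their WGS biospecimen.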
rjcorb commented 3 months ago

@komalsrathi I think only those two samples were included in a germline cohort we are working with, so I didn't realize there were others that fall into the same category.

komalsrathi commented 3 months ago

Oh okay. Then is it acceptable to include the 11 samples listed above? If you agree, I'll create a PR with the changes.

jharenza commented 3 months ago


Hey @komalsrathi, @rjcorb and I were chatting about this since we did not realize these were mets. I would like Jenn to look into these before we add them: I am not sure it makes sense to add them if their initial tumors may not have been in the brain. For example, some are osteosarcoma or neuroblastoma, so I want to hear back from her about the initial diagnoses first. If the initial tumors were non-brain solid tumors, we could possibly shift these samples out of the PBTA cohort and into the appropriate cohorts, or find some other way to handle them, while making sure they are still in the independent specimen list. So let's pause on this one.

komalsrathi commented 3 months ago

OK, sure. I reran the analysis and pushed the results to a new branch, but I will hold off on creating a PR.