Bioconductor / GenomicDataCommons

Provide R access to the NCI Genomic Data Commons portal.
http://bioconductor.github.io/GenomicDataCommons/

TCGA-OV: Normal controls visible on portal but not through bioconductor query #76

Closed: chartl closed this issue 4 years ago

chartl commented 4 years ago

On the Exploration tab of the GDC portal, I can identify 71 TCGA samples that are solid tissue normal samples from ovarian tissue:

[Screenshot of the GDC portal Exploration tab]

However, the corresponding GenomicDataCommons query returns only primary and recurrent tumors, and the counts are close but do not match exactly:

table(sapply((GenomicDataCommons::files() %>%
    GenomicDataCommons::filter(analysis.workflow_type == 'HTSeq - FPKM-UQ') %>%
    GenomicDataCommons::filter(cases.project.program.name == 'TCGA') %>%
    GenomicDataCommons::filter(cases.project.project_id == 'TCGA-OV') %>%
    GenomicDataCommons::select(c("cases.samples.sample_type")) %>%
    GenomicDataCommons::results_all())$cases, function(x) { x[[1]][[1]][[1]] }))

  Primary Tumor Recurrent Tumor 
            374               5 
LiNk-NY commented 4 years ago

The two queries do not look the same. Why not use the cases() endpoint?

library(GenomicDataCommons)
cases() %>% 
    filter(files.experimental_strategy == "RNA-Seq" &
        project.project_id == "TCGA-OV" & 
        samples.sample_type == "solid tissue normal") %>% 
    results_all()
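A minimal sketch for checking the counts from R, assuming the package's `facet()` and `aggregations()` helpers behave as documented. It asks the GDC API to bucket TCGA-OV cases by `samples.sample_type` on the server side. That is likely closer to what the portal's Exploration tab reports than tabulating one value per file. The variable names `agg` and `by_type` are only illustrative, and the call needs network access to the GDC API.

```r
library(GenomicDataCommons)

# Ask the GDC API to count TCGA-OV cases per sample type (server-side facet).
# A case with samples of several types is counted once in each matching bucket,
# so the bucket counts can add up to more than the number of cases.
agg <- cases() %>%
  filter(project.project_id == "TCGA-OV") %>%
  facet("samples.sample_type") %>%
  aggregations()

# aggregations() returns a named list with one data.frame (doc_count, key) per facet
by_type <- agg[["samples.sample_type"]]
print(by_type)
```

If a "Solid Tissue Normal" bucket appears here but not in the file-level table, the normals exist at the case level but are not linked to files from that particular workflow.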