Bioconductor / GenomicDataCommons

Provide R access to the NCI Genomic Data Commons portal.
http://bioconductor.github.io/GenomicDataCommons/

extract clinical data from previous research #52

Open bioinfo-dirty-jobs opened 6 years ago

bioinfo-dirty-jobs commented 6 years ago

I want to download all the clinical data for the selected RNA-seq data:



expands = c("diagnoses","annotations",
            "demographic","exposures")
clinResults = cases() %>%

  GenomicDataCommons::select(filter( ~ cases.project.project_id == 'TCGA-OV' &
                                       type == 'gene_expression' &
                                       analysis.workflow_type == 'HTSeq - Counts') ) %>%
  GenomicDataCommons::expand(expands) %>%
  results(size=300)
str(clinResults,list.len=10)
write.table(clinResults,"Clinical_results.csv",sep="\t",row.names = FALSE)

seandavi commented 6 years ago

You'll need to do this in two steps.

  1. Do your files query and include the files.cases.case_id.
  2. Use the case_ids from query 1 as input to gdc_clinical.

Give it a try and let me know if you need more direction. Great question!
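
The two steps above can be sketched roughly as follows. This is only a minimal sketch, not the maintainer's own write-up. The helper names `extract_case_ids` and `clinical_for_ov_counts` are hypothetical. It assumes that the files endpoint exposes the case identifier as `cases.case_id`, and that the nested `cases` element of the result flattens to plain UUID strings when only that field is selected. The workflow type string is copied from the question and may differ in current GDC data releases. Calls are namespace-qualified because `filter` and `select` are often masked by dplyr.

```r
# Pure helper (hypothetical name): flatten the nested `cases` element of a
# files() result into a vector of unique case UUIDs.
# Assumes only `cases.case_id` was selected, so nothing else is nested there.
extract_case_ids <- function(res) {
  unique(unlist(res$cases, use.names = FALSE))
}

# Steps 1 and 2 together. Needs network access and the GenomicDataCommons
# package; nothing is contacted until the function is called.
clinical_for_ov_counts <- function() {
  # Step 1: query the *files* endpoint and ask for the case behind each file.
  q <- GenomicDataCommons::files()
  q <- GenomicDataCommons::filter(
    q,
    ~ cases.project.project_id == "TCGA-OV" &
      type == "gene_expression" &
      analysis.workflow_type == "HTSeq - Counts"  # value taken from the thread
  )
  q <- GenomicDataCommons::select(q, "cases.case_id")
  res <- GenomicDataCommons::results_all(q)
  case_ids <- extract_case_ids(res)

  # Step 2: hand the case UUIDs to gdc_clinical().
  GenomicDataCommons::gdc_clinical(case_ids)
}

# Usage:
# clin <- clinical_for_ov_counts()
# names(clin)  # inspect which tables came back before writing anything out
```

As far as I understand the package documentation, `gdc_clinical()` returns a list of data frames rather than one flat table, so each element would be written out separately, for example with `write.table(clin$diagnoses, ...)`.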

bioinfo-dirty-jobs commented 6 years ago

@seandavi Thanks so much. I tried to figure it out, but I am missing something. Here is what I have so far:

q = cases() %>%
    filter(~ project.project_id=='TCGA-OV'  &
             files.analysis.workflow_type == 'HTSeq - FPKM-UQ')
q %>% count()

file_ids = q %>% facet('files.cases.case_id') %>% response_all() %>%
  ids()

So I suppose I now have the IDs in file_ids. How can I retrieve all the diagnosis data? And if I have the bcr_patient_uuid, how can I download the expression data? Could you please give me an example? Thanks so much for your help and patience.
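
For the second question (going from known case UUIDs to the expression files), one possible direction is sketched below. It is a sketch under assumptions, not a confirmed answer from the maintainers. `expression_manifest` is a hypothetical helper name. It assumes that the case UUIDs can be matched against `cases.case_id` on the files endpoint, and it reuses the workflow type string from the query above.

```r
# Hypothetical helper: build a manifest of expression files that belong to
# the given case UUIDs. Needs network access when it reaches the query.
expression_manifest <- function(case_ids) {
  # Fail early on empty or non-character input, before any request is made.
  stopifnot(is.character(case_ids), length(case_ids) > 0)

  q <- GenomicDataCommons::files()
  # The formula captures this function's environment, so `case_ids`
  # refers to the argument above.
  q <- GenomicDataCommons::filter(
    q,
    ~ cases.case_id %in% case_ids &
      analysis.workflow_type == "HTSeq - FPKM-UQ"  # value taken from the thread
  )
  GenomicDataCommons::manifest(q)
}

# Usage:
# man   <- expression_manifest(case_ids)
# paths <- GenomicDataCommons::gdcdata(man$id)  # downloads into the local cache
```

Looking at the manifest first (number of rows, total size) before calling `gdcdata()` avoids starting a large download by accident.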

seandavi commented 6 years ago

I'll write something up, but it may take me a few days--sorry for the delay. I really appreciate you working through this with us.

bioinfo-dirty-jobs commented 6 years ago

Dear @seandavi, I am still trying to resolve the problem, but I am missing something. For cases I found this field: grep('files.cases.case_id',available_fields('cases'),value=TRUE), but it is not among the expand fields:

q = cases() %>%
  GenomicDataCommons::filter(~ project.project_id=='TCGA-OV'  &
                               files.analysis.workflow_type == 'HTSeq - FPKM-UQ') %>% 
  GenomicDataCommons::expand("diagnoses") %>% facet()
q %>% results()
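
Part of the confusion may be that selectable fields and expandable groups are two separate lists: `select()` takes individual field names, while `expand()` takes names of whole nested groups. A small sketch for comparing the two is below. `find_fields` is a hypothetical convenience wrapper around `grep`; the two lookup calls in the usage lines need network access, and their output is not shown here because it depends on the current GDC schema.

```r
# Hypothetical wrapper: return the entries of a character vector of field
# names that match a pattern.
find_fields <- function(fields, pattern) {
  grep(pattern, fields, value = TRUE)
}

# Usage:
# Individual fields that can be passed to select() on each endpoint:
# find_fields(GenomicDataCommons::available_fields("files"), "case_id")
# find_fields(GenomicDataCommons::available_fields("cases"), "case_id")
#
# Nested groups that can be passed to expand() on the cases endpoint:
# GenomicDataCommons::available_expand("cases")
```

If a name shows up in `available_fields()` but not in `available_expand()`, it would be requested with `select()` rather than `expand()`.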