Bioconductor / GenomicDataCommons

Provide R access to the NCI Genomic Data Commons portal.
http://bioconductor.github.io/GenomicDataCommons/

Clinical data are empty #70

Open bioinfo-dirty-jobs opened 5 years ago

bioinfo-dirty-jobs commented 5 years ago

I use GenomicDataCommons::status():

$commit
[1] "e588f035feefee17f562b3a1bc2816c49a2b2b19"
$data_release
[1] "Data Release 16.0 - March 26, 2019"
$status
[1] "OK"
$tag
[1] "1.20.0"
$version
[1] 1

I use this script to retrieve some clinical information:


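library(GenomicDataCommons)

id <- "TCGA-LAML"

# File UUIDs of HTSeq FPKM-UQ expression files from primary blood-derived
# cancer (peripheral blood) samples in the project
dati2b <- files() %>%
  filter(~ cases.project.project_id == id &
           data_type == "Gene Expression Quantification" &
           analysis.workflow_type == "HTSeq - FPKM-UQ" &
           cases.samples.sample_type == "Primary Blood Derived Cancer - Peripheral Blood") %>%
  ids()

# Download the files
gdcdata(dati2b, progress = TRUE)

# dati2 (file metadata including file_name) comes from the previous question and is not defined here
nome <- paste("lista_download", id, "uidd.csv", sep = "_")
write.csv2(as.data.frame(dati2), nome, row.names = FALSE)
fnames <- dati2$file_name
# Look up the file UUID (and barcode) for each file name
List_files <- filenameToBarcode(fnames)
lnome <- paste("lista_download", id, "uidd_download.csv", sep = "_")
write.csv2(as.data.frame(List_files), lnome, row.names = FALSE)

# Map file UUIDs to case UUIDs, then fetch the clinical tables for those cases
case_id <- UUIDtoUUID(List_files$file_id, to_type = "case_id")
GBM <- gdc_clinical(case_id$cases.case_id)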
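To see at a glance which clinical fields came back empty, a small base-R helper can count the non-missing values in every column of every table returned by `gdc_clinical()`. This is a minimal sketch: `clinical_coverage()` is a hypothetical name, not a GenomicDataCommons function, and it assumes only that `gdc_clinical()` returns a named list of data frames (such as `exposures`).

```r
# Hypothetical helper, NOT part of GenomicDataCommons.
# Assumes `clin` is a named list of data frames, e.g. the value of gdc_clinical().
# Returns one row per (table, column) with the count of non-NA values.
clinical_coverage <- function(clin) {
  rows <- lapply(names(clin), function(tbl) {
    df <- clin[[tbl]]
    if (ncol(df) == 0) return(NULL)  # skip tables with no columns
    data.frame(
      table = tbl,
      column = names(df),
      non_missing = vapply(df, function(col) sum(!is.na(col)), integer(1)),
      n_rows = nrow(df),
      stringsAsFactors = FALSE,
      row.names = NULL
    )
  })
  do.call(rbind, rows)  # rbind() drops the NULL entries
}
```

For example, `clinical_coverage(GBM)` would show whether `height` and `weight` in the `exposures` table hold any values at all for these cases, which helps separate "missing in the API response" from "lost somewhere in the script".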

Often the height and weight fields are empty. Why? If I use the GDC portal, I do find this information. What am I doing wrong?

LiNk-NY commented 5 years ago

Hi @bioinfo-dirty-jobs, you will have to provide a minimal, reproducible example in order to get help. Please see this post: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example

Best regards, Marcel

bioinfo-dirty-jobs commented 5 years ago

@LiNk-NY Sorry, I updated the code! It was linked to the previous question.

LiNk-NY commented 5 years ago

Hi @bioinfo-dirty-jobs, I would reduce the code to a few lines for reproducibility:

library(GenomicDataCommons)
#> Loading required package: magrittr
#> 
#> Attaching package: 'GenomicDataCommons'
#> The following object is masked from 'package:stats':
#> 
#>     filter
caseids <- c("558a239b-fe8b-4b56-9137-4cacf8324995", "02e4f2da-9977-4251-81da-9e9f3a2310de", 
             "3997c824-8b85-4e1d-b4b1-d32c26155296", "80017c88-e07f-4bf6-ad00-87f3e5473d6d", 
             "f58f22e9-76b0-441f-951f-6bc795f2b7bc", "e54de1c4-6cdc-4106-ab5d-1d7c745e690f"
)
laml <- gdc_clinical(caseids)
laml$exposures[, c('height', 'weight')]
#> # A tibble: 6 x 2
#>   height weight
#>   <lgl>  <lgl> 
#> 1 NA     NA    
#> 2 NA     NA    
#> 3 NA     NA    
#> 4 NA     NA    
#> 5 NA     NA    
#> 6 NA     NA
sessionInfo()
#> R version 3.6.0 RC (2019-04-19 r76406)
#> Platform: x86_64-pc-linux-gnu (64-bit)
#> Running under: Ubuntu 18.04.2 LTS
#> 
#> Matrix products: default
#> BLAS:   /usr/lib/x86_64-linux-gnu/blas/libblas.so.3.7.1
#> LAPACK: /usr/lib/x86_64-linux-gnu/lapack/liblapack.so.3.7.1
#> 
#> locale:
#>  [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
#>  [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
#>  [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=en_US.UTF-8   
#>  [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
#>  [9] LC_ADDRESS=C               LC_TELEPHONE=C            
#> [11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       
#> 
#> attached base packages:
#> [1] stats     graphics  grDevices utils     datasets  methods   base     
#> 
#> other attached packages:
#> [1] GenomicDataCommons_1.7.3 magrittr_1.5            
#> 
#> loaded via a namespace (and not attached):
#>  [1] Rcpp_1.0.1                  compiler_3.6.0             
#>  [3] pillar_1.3.1                GenomeInfoDb_1.19.3        
#>  [5] highr_0.8                   XVector_0.23.2             
#>  [7] bitops_1.0-6                tools_3.6.0                
#>  [9] zlibbioc_1.29.0             digest_0.6.18              
#> [11] jsonlite_1.6                evaluate_0.13              
#> [13] tibble_2.1.1                lattice_0.20-38            
#> [15] pkgconfig_2.0.2             rlang_0.3.4                
#> [17] Matrix_1.2-17               cli_1.1.0                  
#> [19] DelayedArray_0.9.9          curl_3.3                   
#> [21] yaml_2.2.0                  parallel_3.6.0             
#> [23] xfun_0.6                    GenomeInfoDbData_1.2.1     
#> [25] xml2_1.2.0                  httr_1.4.0                 
#> [27] stringr_1.4.0               dplyr_0.8.0.1              
#> [29] knitr_1.22                  hms_0.4.2                  
#> [31] rappdirs_0.3.1              S4Vectors_0.21.23          
#> [33] IRanges_2.17.5              tidyselect_0.2.5           
#> [35] stats4_3.6.0                grid_3.6.0                 
#> [37] glue_1.3.1                  Biobase_2.43.1             
#> [39] R6_2.4.0                    fansi_0.4.0                
#> [41] BiocParallel_1.17.19        rmarkdown_1.12             
#> [43] readr_1.3.1                 purrr_0.3.2                
#> [45] htmltools_0.3.6             matrixStats_0.54.0         
#> [47] BiocGenerics_0.29.2         GenomicRanges_1.35.1       
#> [49] assertthat_0.2.1            SummarizedExperiment_1.13.0
#> [51] utf8_1.1.4                  stringi_1.4.3              
#> [53] RCurl_1.95-4.12             crayon_1.3.4

Created on 2019-04-24 by the reprex package (v0.2.1)
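The `<lgl>` column type in the output above is itself a hint: in R a column made up entirely of missing values is typically stored as logical, so height and weight were likely absent for all six cases rather than only some of them. A minimal sketch to list such all-`NA` columns; `all_missing_columns()` is a hypothetical helper, not a GenomicDataCommons function.

```r
# Hypothetical helper, NOT part of GenomicDataCommons.
# Returns the names of the columns of `df` in which every value is NA.
# Note: for a zero-row data frame every column is reported.
all_missing_columns <- function(df) {
  names(df)[vapply(df, function(col) all(is.na(col)), logical(1))]
}
```

Applied to the reprex, `all_missing_columns(laml$exposures)` would include `height` and `weight`, which points at the data returned for these case IDs rather than at the download script.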

@seandavi any insight on this? Thanks!

Best, Marcel