BioinformaticsFMRP / TCGAbiolinks

TCGAbiolinks
http://bioconductor.org/packages/devel/bioc/vignettes/TCGAbiolinks/inst/doc/index.html

Clinical data for CTSP-DLBCL1 does not work #622

Open akauko opened 3 months ago

akauko commented 3 months ago

I was trying to access clinical data for the project CTSP-DLBCL1 without success. I suspect a bug.

GDCquery() worked and returned 37 hits:

query.1 <- GDCquery(
  project = "CTSP-DLBCL1",
  data.category = "Clinical",
  data.type = "Clinical Supplement",
  access = "open"
)

The download worked only partially. When I downloaded a single file per project, GDCdownload() created the proper directory structure, but when I downloaded all files in the project, I had to move the files to the proper directory manually.

GDCdownload(query.1)
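
The manual move can be scripted. Below is a minimal sketch, assuming the files ended up flat in one directory and that the later steps expect one subdirectory per file ID. The helper name `move_to_gdc_layout` and the target layout `GDCdata/CTSP-DLBCL1/Clinical/Clinical_Supplement/<file_id>/<file_name>` are my assumptions, not something I have confirmed from the package documentation. The demo runs on dummy files in a temporary directory.

```r
# Hypothetical helper: move flat downloads into <root_dir>/<file_id>/<file_name>.
# `files` is a data frame with character columns file_id and file_name.
move_to_gdc_layout <- function(files, from_dir, root_dir) {
  moved <- character(0)
  for (i in seq_len(nrow(files))) {
    src <- file.path(from_dir, files$file_name[i])
    if (!file.exists(src)) next  # skip files that are not in the flat directory
    dest_dir <- file.path(root_dir, files$file_id[i])
    dir.create(dest_dir, recursive = TRUE, showWarnings = FALSE)
    dest <- file.path(dest_dir, files$file_name[i])
    if (file.rename(src, dest)) moved <- c(moved, dest)
  }
  moved  # paths of the files that were moved successfully
}

# Demo with two dummy files; the directory layout below is an assumption.
from_dir <- file.path(tempdir(), "flat_download")
root_dir <- file.path(tempdir(), "GDCdata", "CTSP-DLBCL1",
                      "Clinical", "Clinical_Supplement")
dir.create(from_dir, recursive = TRUE, showWarnings = FALSE)
demo_files <- data.frame(
  file_id = c("id-0001", "id-0002"),
  file_name = c("a.json", "b.json"),
  stringsAsFactors = FALSE
)
file.create(file.path(from_dir, demo_files$file_name))
moved <- move_to_gdc_layout(demo_files, from_dir, root_dir)
```

With a real query, the file IDs and names would presumably come from the query results rather than from a hand-written table.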

GDCprepare_clinic() failed. It seems to expect XML, but the clinical data in the CTSP-DLBCL1 project is in JSON format.

GDCprepare_clinic(query.1, "patient")

The error:

in read_xml.character(xmlfile) : 
Start tag expected, '<' not found [4]
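
As a possible workaround until this is resolved, the downloaded JSON files can be read directly with jsonlite. This is only a sketch: the helper `read_clinical_json` is hypothetical, and I am assuming each file holds one flat JSON object per case, which may not match the real structure of the CTSP-DLBCL1 supplements. The demo writes two small made-up files so that it runs without network access.

```r
library(jsonlite)

# Hypothetical helper: read one flat JSON object per file and stack the
# records into a single data frame, filling missing fields with NA.
read_clinical_json <- function(paths) {
  records <- lapply(paths, function(p) {
    as.data.frame(fromJSON(p, flatten = TRUE), stringsAsFactors = FALSE)
  })
  all_cols <- unique(unlist(lapply(records, names)))
  records <- lapply(records, function(df) {
    for (col in setdiff(all_cols, names(df))) df[[col]] <- NA
    df[all_cols]  # same column order in every record
  })
  do.call(rbind, records)
}

# Demo input: made-up records, not real CTSP-DLBCL1 data.
demo_dir <- file.path(tempdir(), "clinical_json_demo")
dir.create(demo_dir, showWarnings = FALSE)
writeLines('{"case_id": "case-1", "age_at_diagnosis": 61}',
           file.path(demo_dir, "a.json"))
writeLines('{"case_id": "case-2", "gender": "female"}',
           file.path(demo_dir, "b.json"))

json_paths <- list.files(demo_dir, pattern = "\\.json$", full.names = TRUE)
clinical <- read_clinical_json(json_paths)
```

Nested fields would need extra handling before `as.data.frame()`, so this should be read as a starting point rather than a replacement for GDCprepare_clinic().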

I also tried GDCquery_clinic(), but it did not work either.

GDCquery_clinic("CTSP-DLBCL1", type="clinical")

The error:

Error in data.frame(..., check.names = FALSE) : 
  arguments imply differing number of rows: 37, 38
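
The message is the one base R raises when data.frame() receives columns of unequal length. The snippet below only reproduces that mechanism with made-up vectors. The 37 matches the number of hits from GDCquery(), but I have not traced where the 38th element comes from inside GDCquery_clinic().

```r
# Columns of length 37 and 38 cannot be combined into one data frame;
# tryCatch() captures the error message instead of stopping the script.
msg <- tryCatch(
  data.frame(id = seq_len(37), value = seq_len(38), check.names = FALSE),
  error = function(e) conditionMessage(e)
)
print(msg)
```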

Other projects of interest ("CGCI-HTMCP-DLBCL", "TCGA-DLBC", "CGCI-BLGSP") worked. Their clinical data is in XML format, so the failure may be related to this difference in format.

I would appreciate any help.

> sessionInfo()
R version 4.3.2 (2023-10-31 ucrt)
Platform: x86_64-w64-mingw32/x64 (64-bit)
Running under: Windows 10 x64 (build 19044)

Matrix products: default

locale:
[1] LC_COLLATE=Finnish_Finland.utf8  LC_CTYPE=Finnish_Finland.utf8    LC_MONETARY=Finnish_Finland.utf8 LC_NUMERIC=C                    
[5] LC_TIME=Finnish_Finland.utf8    

time zone: Europe/Helsinki
tzcode source: internal

attached base packages:
[1] stats     graphics  grDevices datasets  utils     methods   base     

other attached packages:
[1] TCGAbiolinks_2.30.0 dplyr_1.1.4        

loaded via a namespace (and not attached):
 [1] tidyselect_1.2.1            blob_1.2.4                  filelock_1.0.3              Biostrings_2.70.2           bitops_1.0-7               
 [6] fastmap_1.1.1               RCurl_1.98-1.14             BiocFileCache_2.10.1        XML_3.99-0.16.1             digest_0.6.35              
[11] lifecycle_1.0.4             KEGGREST_1.42.0             RSQLite_2.3.5               magrittr_2.0.3              compiler_4.3.2             
[16] rlang_1.1.3                 progress_1.2.3              tools_4.3.2                 utf8_1.2.4                  yaml_2.3.8                 
[21] data.table_1.15.2           knitr_1.45                  prettyunits_1.2.0           S4Arrays_1.2.1              bit_4.0.5                  
[26] curl_5.2.1                  DelayedArray_0.28.0         plyr_1.8.9                  xml2_1.3.6                  abind_1.4-5                
[31] withr_3.0.0                 purrr_1.0.2                 BiocGenerics_0.48.1         grid_4.3.2                  stats4_4.3.2               
[36] fansi_1.0.6                 colorspace_2.1-0            ggplot2_3.4.4               scales_1.3.0                biomaRt_2.58.2             
[41] SummarizedExperiment_1.32.0 cli_3.6.2                   crayon_1.5.2                generics_0.1.3              rstudioapi_0.15.0          
[46] httr_1.4.7                  tzdb_0.4.0                  DBI_1.2.2                   cachem_1.0.8                stringr_1.5.1              
[51] zlibbioc_1.48.0             rvest_1.0.4                 AnnotationDbi_1.64.1        TCGAbiolinksGUI.data_1.22.0 BiocManager_1.30.22        
[56] XVector_0.42.0              matrixStats_1.2.0           vctrs_0.6.5                 Matrix_1.6-5                jsonlite_1.8.8             
[61] IRanges_2.36.0              hms_1.1.3                   S4Vectors_0.40.2            bit64_4.0.5                 tidyr_1.3.1                
[66] glue_1.7.0                  stringi_1.8.3               gtable_0.3.4                GenomeInfoDb_1.38.6         GenomicRanges_1.54.1       
[71] munsell_0.5.0               tibble_3.2.1                pillar_1.9.0                rappdirs_0.3.3              GenomeInfoDbData_1.2.11    
[76] R6_2.5.1                    dbplyr_2.4.0                lattice_0.22-5              Biobase_2.62.0              readr_2.1.5                
[81] png_0.1-8                   memoise_2.0.1               renv_1.0.5                  Rcpp_1.0.12                 SparseArray_1.2.4          
[86] downloader_0.4              xfun_0.42                   MatrixGenerics_1.14.0       pkgconfig_2.0.3