Bioconductor / RaggedExperiment

Matrix-like representations of mutation and CN data
https://bioconductor.org/packages/RaggedExperiment

Error in colnames<- #21

Closed by lgeistlinger 5 years ago

lgeistlinger commented 5 years ago

I'm pulling a RaggedExperiment from TCGA:

> gbm <- curatedTCGAData::curatedTCGAData("GBM", "CNVSNP", FALSE)
> gbm.ra <- gbm[[1]]
> gbm.ra
class: RaggedExperiment 
dim: 17818 154 
assays(2): Num_Probes Segment_Mean
rownames: NULL
colnames(154): TCGA-02-0047-01A-01D-0182-01
  TCGA-02-0055-01A-01D-0182-01 ...
  TCGA-76-4931-01A-01D-1479-01 TCGA-28-2510-01A-01D-1694-01
colData names(0):

Looks like there is a small bug in the colnames<- replacement method:

> colnames(gbm.ra) <- TCGAutils::TCGAbarcode(colnames(gbm.ra), sample=TRUE)
Error in .local(x, ..., value = value) : 
  154 rows in value to replace 1104rows
> sessionInfo()
R Under development (unstable) (2019-01-07 r75958)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Debian GNU/Linux 9 (stretch)

Matrix products: default
BLAS/LAPACK: /usr/lib/libopenblasp-r0.2.19.so

locale:
 [1] LC_CTYPE=en_US.UTF-8       LC_NUMERIC=C              
 [3] LC_TIME=en_US.UTF-8        LC_COLLATE=en_US.UTF-8    
 [5] LC_MONETARY=en_US.UTF-8    LC_MESSAGES=C             
 [7] LC_PAPER=en_US.UTF-8       LC_NAME=C                 
 [9] LC_ADDRESS=C               LC_TELEPHONE=C            
[11] LC_MEASUREMENT=en_US.UTF-8 LC_IDENTIFICATION=C       

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils    
[7] datasets  methods   base     

other attached packages:
 [1] TCGAutils_1.3.5                        
 [2] org.Hs.eg.db_3.7.0                     
 [3] TxDb.Hsapiens.UCSC.hg19.knownGene_3.2.2
 [4] GenomicFeatures_1.35.4                 
 [5] AnnotationDbi_1.45.0                   
 [6] curatedTCGAData_1.5.6                  
 [7] CNVRanger_0.99.6                       
 [8] RaggedExperiment_1.7.1                 
 [9] MultiAssayExperiment_1.9.2             
[10] SummarizedExperiment_1.13.0            
[11] DelayedArray_0.9.6                     
[12] BiocParallel_1.17.6                    
[13] matrixStats_0.54.0                     
[14] Biobase_2.43.1                         
[15] GenomicRanges_1.35.1                   
[16] GenomeInfoDb_1.19.1                    
[17] IRanges_2.17.4                         
[18] S4Vectors_0.21.10                      
[19] BiocGenerics_0.29.1                    

loaded via a namespace (and not attached):
 [1] httr_1.4.0                    jsonlite_1.6                 
 [3] bit64_0.9-7                   AnnotationHub_2.15.4         
 [5] shiny_1.2.0                   assertthat_0.2.0             
 [7] interactiveDisplayBase_1.21.0 BiocManager_1.30.4           
 [9] blob_1.1.1                    GenomeInfoDbData_1.2.0       
[11] Rsamtools_1.35.1              yaml_2.2.0                   
[13] progress_1.2.0                pillar_1.3.1                 
[15] RSQLite_2.1.1                 lattice_0.20-38              
[17] glue_1.3.0                    digest_0.6.18                
[19] promises_1.0.1                XVector_0.23.0               
[21] rvest_0.3.2                   htmltools_0.3.6              
[23] httpuv_1.4.5.1                Matrix_1.2-15                
[25] XML_3.98-1.16                 pkgconfig_2.0.2              
[27] biomaRt_2.39.2                zlibbioc_1.29.0              
[29] purrr_0.2.5                   xtable_1.8-3                 
[31] later_0.7.5                   tibble_2.0.1                 
[33] magrittr_1.5                  crayon_1.3.4                 
[35] mime_0.6                      memoise_1.1.0                
[37] xml2_1.2.0                    tools_3.6.0                  
[39] prettyunits_1.0.2             hms_0.4.2                    
[41] stringr_1.3.1                 bindrcpp_0.2.2               
[43] Biostrings_2.51.2             compiler_3.6.0               
[45] rlang_0.3.1                   grid_3.6.0                   
[47] GenomicDataCommons_1.7.3      RCurl_1.95-4.11              
[49] rstudioapi_0.9.0              rappdirs_0.3.1               
[51] bitops_1.0-6                  ExperimentHub_1.9.1          
[53] DBI_1.0.0                     curl_3.3                     
[55] R6_2.3.0                      GenomicAlignments_1.19.1     
[57] dplyr_0.7.8                   rtracklayer_1.43.1           
[59] bit_1.1-14                    bindr_0.1.1                  
[61] readr_1.3.1                   stringi_1.2.4                
[63] Rcpp_1.0.0                    tidyselect_0.2.5

LiNk-NY commented 5 years ago

This could be a cache issue. Try clearing your ExperimentHub cache. I get a GBM object with more columns (1104 instead of 154):

gbm <- curatedTCGAData::curatedTCGAData("GBM", "CNVSNP", FALSE)
#> snapshotDate(): 2019-01-15
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> downloading 0 resources
#> loading from cache 
#>     '/home/mr148//.ExperimentHub/670'
#> Loading required package: RaggedExperiment
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> downloading 0 resources
#> loading from cache 
#>     '/home/mr148//.ExperimentHub/671'
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> downloading 0 resources
#> loading from cache 
#>     '/home/mr148//.ExperimentHub/674'
#> see ?curatedTCGAData and browseVignettes('curatedTCGAData') for documentation
#> downloading 0 resources
#> loading from cache 
#>     '/home/mr148//.ExperimentHub/685'
#> harmonizing input:
#>   removing 6776 sampleMap rows not in names(experiments)
#>   removing 7 colData rownames not in sampleMap 'primary'
gbm.ra <- gbm[[1]]
gbm.ra
#> class: RaggedExperiment 
#> dim: 146852 1104 
#> assays(2): Num_Probes Segment_Mean
#> rownames: NULL
#> colnames(1104): TCGA-02-0001-01C-01D-0182-01
#>   TCGA-02-0001-10A-01D-0182-01 ... TCGA-RR-A6KC-01A-31D-A33S-01
#>   TCGA-RR-A6KC-10A-01D-A33V-01
#> colData names(0):

Created on 2019-01-23 by the reprex package (v0.2.1)

lgeistlinger commented 5 years ago

You're right. Thanks!

lgeistlinger commented 5 years ago

More on that: this is how I actually ran into the error:

> gbm <- curatedTCGAData::curatedTCGAData("GBM", c("CNVSNP", "RNASeq2GeneNorm"), FALSE)
> gbm <- MultiAssayExperiment::intersectColumns(gbm)
> gbm <- TCGAutils::splitAssays(gbm)
> gbm.ra <- gbm[[1]]
> colnames(gbm.ra) <- TCGAutils::TCGAbarcode(colnames(gbm.ra), sample=TRUE)
Error in .local(x, ..., value = value) : 
  154 rows in value to replace 1104rows

LiNk-NY commented 5 years ago

Ah yes, this looks like an index issue after subsetting. I'll look into it.
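
For readers following along, here is a minimal base-R sketch of how an index issue after subsetting could produce an error of this shape. It is a toy model, not the RaggedExperiment implementation: the assumption is that subsetting only narrows an index of visible columns while the stored data keeps every column, and that the setter checks the new names against the stored columns rather than the visible ones. All names (new_toy, subset_cols, toy_colnames, set_colnames_naive, set_colnames_indexed) are hypothetical.

```r
# Toy container that subsets lazily: the full data is kept and only an
# index of the visible columns changes. Hypothetical names throughout;
# this is not the RaggedExperiment API.
new_toy <- function(sample_names) {
  list(data = setNames(as.list(seq_along(sample_names)), sample_names),
       colidx = seq_along(sample_names))
}

# Subsetting only narrows the index; the stored data keeps every column.
subset_cols <- function(x, keep) {
  x$colidx <- x$colidx[keep]
  x
}

# Visible column names are the stored names filtered through the index.
toy_colnames <- function(x) names(x$data)[x$colidx]

# Setter that ignores the index: it compares the new names against all
# stored columns, so it fails on a subsetted object.
set_colnames_naive <- function(x, value) {
  if (length(value) != length(x$data))
    stop(length(value), " names supplied to replace ", length(x$data))
  names(x$data) <- value
  x
}

# Setter that respects the index: it renames only the visible columns.
set_colnames_indexed <- function(x, value) {
  stopifnot(length(value) == length(x$colidx))
  names(x$data)[x$colidx] <- value
  x
}

full <- new_toy(paste0("sample", 1:10))
sub <- subset_cols(full, 1:3)  # 3 visible columns, 10 stored

# The naive setter raises a length-mismatch error on the subset.
res <- tryCatch(set_colnames_naive(sub, c("a", "b", "c")),
                error = function(e) conditionMessage(e))
print(res)

# The index-aware setter succeeds.
fixed <- set_colnames_indexed(sub, c("a", "b", "c"))
print(toy_colnames(fixed))
```

In the report above, the analogous counts would be 154 visible columns after intersectColumns and 1104 stored columns, but whether the package's setter fails for this exact reason is an assumption of the sketch.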