PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License

subsetMaf error #177

Closed Eirinits closed 6 years ago

Eirinits commented 6 years ago

I am trying to subset a maf file according to tumor barcodes but I get this as an error:

> mut.CMS1 = subsetMaf(mut.query, tsb = Sample.IDs, mafObj = TRUE)
Error in subsetMaf(mut.query, tsb = Sample.IDs, mafObj = TRUE) : 
  trying to get slot "maf.silent" from an object (class "tbl_df") that is not an S4 object 

where

> str(Sample.IDs)
'data.frame':   84 obs. of  1 variable:
 $ x: chr  "TCGA-AA-3815-01A-01R-1022-07" "TCGA-AA-A02R-01A-01R-A00A-07" "TCGA-CK-4951-01A-01R-1410-07" "TCGA-AA-3518-01A-02R-0826-07" ...

I am not sure what the maf.silent slot is, so I could solve the error. Thanks in advance for your help!

PoisonAlien commented 6 years ago

Hi, your Sample.IDs is a data frame. It should be a plain character vector.

Try this and let me know if it works.

> samples = as.character(Sample.IDs$x)
> mut.CMS1 = subsetMaf(mut.query, tsb = samples, mafObj = TRUE)
Eirinits commented 6 years ago

Hi

I tried it, but it returns the same error again.

> samples =  as.character(Sample.IDs$x)
> str(samples)
 chr [1:84] "TCGA-AA-3815-01A-01R-1022-07" "TCGA-AA-A02R-01A-01R-A00A-07" "TCGA-CK-4951-01A-01R-1410-07" "TCGA-AA-3518-01A-02R-0826-07" ...
> mut.CMS1 = subsetMaf(mut.query, tsb = samples, mafObj = TRUE)
Error in subsetMaf(mut.query, tsb = samples, mafObj = TRUE) : 
  trying to get slot "maf.silent" from an object (class "tbl_df") that is not an S4 object 
PoisonAlien commented 6 years ago

Okay, I see. Is mut.query an MAF object? You can check with class(mut.query); it should show something like the following:

> class(mut.query)
[1] "MAF"
attr(,"package")
[1] "maftools"
ShixiangWang commented 6 years ago

It seems the input is a tibble, not an MAF object.
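
For anyone puzzled by the "maf.silent" message: judging from the error text, subsetMaf reads slots of an S4 MAF object, and slot access with @ fails on an ordinary data frame or tibble. A minimal base R sketch of that behaviour follows. The data frame and its column are made up for illustration, and the exact wording of the error varies between R versions.

```r
# Hypothetical stand-in for a table that is not an MAF object.
df <- data.frame(Hugo_Symbol = "TP53", stringsAsFactors = FALSE)

# A plain data frame is an S3 object, not S4.
is_s4 <- isS4(df)

# Slot access with @ on a non-S4 object raises an error;
# capture its message instead of stopping the script.
slot_error <- tryCatch(
  df@maf.silent,
  error = function(e) conditionMessage(e)
)

print(is_s4)
print(slot_error)
```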

Eirinits commented 6 years ago

It is indeed not an MAF object.

> class(mut.query)
[1] "tbl_df" "tbl" "data.frame"

PoisonAlien commented 6 years ago

How did you generate mut.query? You should start with read.maf to read your MAF file, and use the resulting MAF object as the input to every function in maftools.

Eirinits commented 6 years ago

With GDCquery_Maf from TCGAbiolinks. I should do that, thanks for your tip.

I turned it into a MAF object

mut.maf = read.maf(maf = mut.query, clinicalData = clinical, isTCGA = TRUE)

I tried subsetting again, but I get a new error:

> mut.CMS1 = subsetMaf(mut.maf, tsb = samples, mafObj = TRUE)
Error in dcast.data.table(data = vc, formula = Tumor_Sample_Barcode ~ :
  Can not cast an empty data.table

Does that mean that the IDs I want to subset with are not included in the MAF?

PoisonAlien commented 6 years ago

Could you post your sessionInfo?

Eirinits commented 6 years ago
> sessionInfo()
R version 3.5.0 (2018-04-23)
Platform: x86_64-apple-darwin15.6.0 (64-bit)
Running under: macOS High Sierra 10.13.6

Matrix products: default
BLAS: /System/Library/Frameworks/Accelerate.framework/Versions/A/Frameworks/vecLib.framework/Versions/A/libBLAS.dylib
LAPACK: /Library/Frameworks/R.framework/Versions/3.5/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] parallel  stats4    stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
 [1] fuzzyjoin_0.1.4             maftools_1.6.15             TCGAutils_1.1.19            SummarizedExperiment_1.10.1 DelayedArray_0.6.2         
 [6] BiocParallel_1.14.2         matrixStats_0.54.0          Biobase_2.40.0              GenomicRanges_1.32.6        GenomeInfoDb_1.16.0        
[11] IRanges_2.14.10             S4Vectors_0.18.3            BiocGenerics_0.26.0         TCGAbiolinks_2.9.2          gaia_2.24.0                
[16] data.table_1.11.4          

loaded via a namespace (and not attached):
  [1] changepoint_2.2.2           backports_1.1.2             circlize_0.4.4              aroma.light_3.10.0          NMF_0.21.0                 
  [6] plyr_1.8.4                  selectr_0.4-1               ConsensusClusterPlus_1.44.0 lazyeval_0.2.1              splines_3.5.0              
 [11] gridBase_0.4-7              ggplot2_3.0.0               sva_3.28.0                  digest_0.6.15               foreach_1.4.4              
 [16] fansi_0.2.3                 magrittr_1.5                memoise_1.1.0               BSgenome_1.48.0             cluster_2.0.7-1            
 [21] doParallel_1.0.11           limma_3.36.2                ComplexHeatmap_1.18.1       Biostrings_2.48.0           readr_1.1.1                
 [26] annotate_1.59.1             wordcloud_2.5               R.utils_2.6.0               prettyunits_1.0.2           colorspace_1.3-2           
 [31] blob_1.1.1                  rvest_0.3.2                 rappdirs_0.3.1              ggrepel_0.8.0               dplyr_0.7.6                
 [36] crayon_1.3.4                RCurl_1.95-4.11             jsonlite_1.5                genefilter_1.62.0           bindr_0.1.1                
 [41] VariantAnnotation_1.26.1    survival_2.42-6             zoo_1.8-3                   iterators_1.0.10            glue_1.3.0                 
 [46] survminer_0.4.2             GenomicDataCommons_1.4.1    registry_0.5                gtable_0.2.0                zlibbioc_1.26.0            
 [51] XVector_0.20.0              GetoptLong_0.1.7            shape_1.4.4                 scales_0.5.0                DESeq_1.32.0               
 [56] rngtools_1.3.1              DBI_1.0.0                   edgeR_3.22.3                bibtex_0.4.2                ggthemes_4.0.0             
 [61] Rcpp_0.12.18                xtable_1.8-2                progress_1.2.0              cmprsk_2.2-7                mclust_5.4.1               
 [66] bit_1.1-14                  matlab_1.0.2                km.ci_0.5-2                 httr_1.3.1                  RColorBrewer_1.1-2         
 [71] pkgconfig_2.0.1             XML_3.98-1.12               R.methodsS3_1.7.1           locfit_1.5-9.1              utf8_1.1.4                 
 [76] reshape2_1.4.3              tidyselect_0.2.4            rlang_0.2.1                 AnnotationDbi_1.43.1        munsell_0.5.0              
 [81] tools_3.5.0                 downloader_0.4              cli_1.0.0                   RSQLite_2.1.1               broom_0.5.0                
 [86] stringr_1.3.1               yaml_2.1.19                 knitr_1.20                  bit64_0.9-7                 survMisc_0.5.5             
 [91] purrr_0.2.5                 bindrcpp_0.2.2              EDASeq_2.14.1               nlme_3.1-137                slam_0.1-43                
 [96] R.oo_1.22.0                 xml2_1.2.0                  biomaRt_2.36.1              compiler_3.5.0              rstudioapi_0.7             
[101] curl_3.2                    tibble_1.4.2                geneplotter_1.58.0          stringi_1.2.4               GenomicFeatures_1.32.0     
[106] lattice_0.20-35             Matrix_1.2-14               KMsurv_0.1-5                pillar_1.3.0                GlobalOptions_0.1.0        
[111] cowplot_0.9.3               bitops_1.0-6                rtracklayer_1.40.3          R6_2.2.2                    latticeExtra_0.6-28        
[116] hwriter_1.3.2               ShortRead_1.38.0            gridExtra_2.3               codetools_0.2-15            assertthat_0.2.0           
[121] pkgmaker_0.27               rjson_0.2.20                withr_2.1.2                 GenomicAlignments_1.16.0    Rsamtools_1.32.2           
[126] GenomeInfoDbData_1.1.0      mgcv_1.8-24                 hms_0.4.2                   MultiAssayExperiment_1.6.0  grid_3.5.0                 
[131] tidyr_0.8.1                 ggpubr_0.1.7   
PoisonAlien commented 6 years ago

Okay, thanks. Everything seems fine. The only thing I can see going wrong is that no samples in the MAF match your query. You can check by setting mafObj = FALSE:

mut.CMS1 = subsetMaf(mut.maf, tsb = samples, mafObj = FALSE)
#check how many rows
nrow(mut.CMS1)

If the above results in zero rows, I think you should double-check the sample names you're querying for.
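
One way to do that double-check is to compare the queried barcodes against the barcodes stored in the MAF. This is a base R sketch on toy vectors. The maf_barcodes values are hypothetical placeholders; in a real session they would come from the Tumor_Sample_Barcode column of the MAF.

```r
# Hypothetical barcodes as they might be stored in the MAF object.
maf_barcodes <- c("TCGA-AA-3815", "TCGA-AA-A02R", "TCGA-CK-4951")

# Barcodes used in the query (taken from the str() output in this thread).
samples <- c("TCGA-AA-3815-01A-01R-1022-07",
             "TCGA-AA-A02R-01A-01R-A00A-07")

# Count the queried barcodes that appear verbatim in the MAF.
n_matched <- sum(samples %in% maf_barcodes)

# List the queried barcodes that have no exact match.
missing <- setdiff(samples, maf_barcodes)

print(n_matched)
print(missing)
```

If n_matched is zero, the subset will be empty, which is consistent with the "Can not cast an empty data.table" error.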

Eirinits commented 6 years ago

Yes, it is a zero! I will do that.

Thank you so much for your help!

ShixiangWang commented 6 years ago

This is caused by the sample IDs having two different lengths: your query uses full-length barcodes, while the MAF stores shorter ones.

> samples =  as.character(Sample.IDs$x)
> str(samples)
 chr [1:84] "TCGA-AA-3815-01A-01R-1022-07" "TCGA-AA-A02R-01A-01R-A00A-07" "TCGA-CK-4951-01A-01R-1410-07" "TCGA-AA-3518-01A-02R-0826-07" ...

maftools uses 12-character IDs for TCGA data.

samples = substr(samples, 1, 12)

may help you.

Am I right?
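
As a small illustration of the truncation, here is a base R sketch using three barcodes from the str() output above. Wrapping the result in unique() is my own addition, on the assumption that several aliquots from one patient would otherwise collapse to duplicate IDs.

```r
# Full-length barcodes copied from the str() output in this thread.
samples <- c("TCGA-AA-3815-01A-01R-1022-07",
             "TCGA-AA-A02R-01A-01R-A00A-07",
             "TCGA-CK-4951-01A-01R-1410-07")

# Length of each full barcode, in characters.
full_lengths <- nchar(samples)

# Keep the first 12 characters (project, site and participant)
# and drop any repeats that the truncation creates.
short_ids <- unique(substr(samples, 1, 12))

print(full_lengths)
print(short_ids)
```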

PoisonAlien commented 6 years ago

@ShixiangWang Ahh! Yes, you're right. @Eirinits, try subsetting with the isTCGA argument set to TRUE.

mut.CMS1 = subsetMaf(mut.maf, tsb = samples, mafObj = FALSE, isTCGA = TRUE)
#check how many rows
nrow(mut.CMS1)
Eirinits commented 6 years ago

@PoisonAlien @ShixiangWang isTCGA set to TRUE solved everything! Thanks both of you!!