PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License

Error applying colors to continuous clinicalFeatures #1050

Open juferban opened 1 month ago

juferban commented 1 month ago

Describe the issue

Hello,

I am having an issue trying to add continuous clinical features to my oncoplot.

If I add more than one clinicalFeature with continuous values, the colors applied to the values seem to be mixed up and do not match the values they are supposed to represent.

More specifically, I have an oncoplot to which I wanted to add two clinical features that represent two different ways of measuring response. If I add only one of the features, the color gradient is applied correctly, but if I add both, most samples show the correct colors while a few seemingly random samples show colors that do not match their values. In my test, I specified the sample order with the sampleOrder argument of the oncoplot command, and that order follows the first clinical feature, so the gradient should run from lowest to highest (which it does when only the first clinical feature is added). As soon as I add the second clinical feature, some samples get a seemingly random color.

The command does not throw any error.

Thanks for a great package!

Command

oncoplot(maf = maf_object, 
          removeNonMutated = FALSE, 
          fill = TRUE, 
          clinicalFeatures = c('Treatment_Group','Response_IRC','Treatment_Duration'),
          sampleOrder = sorted_samples,
          showTitle = TRUE,
          titleFontSize = 1.5,
          legendFontSize = 1,
          annotationFontSize = 1,
          SampleNamefontSize = 0.7,
          fontSize = 0.7,
          showTumorSampleBarcodes = TRUE,
          barcode_mar = 4,
          gene_mar = 6,
          legend_height = 4,
          anno_height = 1.5,
          annoBorderCol = "white",
          annotationColor = annotationColor
        )

Session info

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /mnt/disks/monoceros_nfs/software/R-4.3.2/lib/R/lib/libRblas.so 
LAPACK: /mnt/disks/monoceros_nfs/software/R-4.3.2/lib/R/lib/libRlapack.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] grid      stats     graphics  grDevices utils     datasets  methods  
[8] base     

other attached packages:
 [1] circlize_0.4.16    maftools_2.18.0    RColorBrewer_1.1-3 statmod_1.5.0     
 [5] ggrepel_0.9.5      edgeR_4.0.16       limma_3.58.1       reshape2_1.4.4    
 [9] openxlsx_4.2.5.2   lubridate_1.9.3    forcats_1.0.0      stringr_1.5.1     
[13] dplyr_1.1.4        purrr_1.0.2        readr_2.1.5        tidyr_1.3.1       
[17] tibble_3.2.1       ggplot2_3.5.0      tidyverse_2.0.0    data.table_1.15.4 
[21] optparse_1.7.5     monoceRos_1.0.5   

loaded via a namespace (and not attached):
 [1] gtable_0.3.4        shape_1.4.6.1       GlobalOptions_0.1.2
 [4] lattice_0.22-6      tzdb_0.4.0          Cairo_1.6-2        
 [7] vctrs_0.6.5         tools_4.3.2         generics_0.1.3     
[10] getopt_1.20.4       fansi_1.0.6         pkgconfig_2.0.3    
[13] Matrix_1.6-5        uuid_1.2-0          lifecycle_1.0.4    
[16] compiler_4.3.2      munsell_0.5.1       repr_1.1.7         
[19] getPass_0.2-4       htmltools_0.5.8.1   pillar_1.9.0       
[22] crayon_1.5.2        tidyselect_1.2.1    locfit_1.5-9.9     
[25] zip_2.3.1           digest_0.6.35       stringi_1.8.3      
[28] splines_4.3.2       fastmap_1.1.1       colorspace_2.1-0   
[31] cli_3.6.2           magrittr_2.0.3      base64enc_0.1-3    
[34] survival_3.5-8      utf8_1.2.4          IRdisplay_1.1      
[37] withr_3.0.0         scales_1.3.0        IRkernel_1.3.2     
[40] timechange_0.3.0    pbdZMQ_0.3-11       hms_1.1.3          
[43] DNAcopy_1.76.0      evaluate_0.23       rlang_1.1.3        
[46] Rcpp_1.0.13         glue_1.7.0          jsonlite_1.8.8     
[49] R6_2.5.1            plyr_1.8.9

biosunsci commented 1 month ago

Hi @juferban, could you post some of the data that triggers the bug so that we can reproduce it?

juferban commented 1 month ago

Hi, Thanks for your quick response. I will generate a couple of files and will upload them so you can use them for testing. I will upload them soon.

Thanks a lot.

juferban commented 1 month ago

Hi @biosunsci

I am attaching the example files needed to reproduce my problem. The code I used for testing is as follows:

## Load MAF files
maf_object = read.maf(maf = "mutations_filtered.maf", 
                      clinicalData = "sample_annot_for_maf.txt", isTCGA = FALSE)

## Make sure the continuous variables are treated as continuous
maf_object@clinical.data$Response = as.numeric(maf_object@clinical.data$Response)
maf_object@clinical.data$Volume_Change = as.numeric(maf_object@clinical.data$Volume_Change)
maf_object@clinical.data$Treatment_Duration = as.numeric(maf_object@clinical.data$Treatment_Duration)

# Sort the clinical data by multiple variables, as I want to make sure I use my predefined sample sorting
sorted_clinical_data <- maf_object@clinical.data[order(
  maf_object@clinical.data$Gender,
  maf_object@clinical.data$Treatment_Group,

  # Handle NAs: NA values are set to 1000 so they appear first
  dplyr::desc(ifelse(is.na(maf_object@clinical.data$Response ), 1000, as.numeric(maf_object@clinical.data$Response ))),

  # Handle NAs: NA values are set to 1000 so they appear first
  dplyr::desc(ifelse(is.na(maf_object@clinical.data$Volume_Change ), 1000, as.numeric(maf_object@clinical.data$Volume_Change ))),

  # Handle NAs: NA values are set to 1000 so they appear last (this column is sorted in ascending order)
  ifelse(is.na(maf_object@clinical.data$Treatment_Duration), 1000, as.numeric(maf_object@clinical.data$Treatment_Duration))
), ]

maf_object@clinical.data <- sorted_clinical_data

# Extract the sorted sample names
sorted_samples <- sorted_clinical_data$Tumor_Sample_Barcode

## Create the oncoplot
oncoplot(maf = maf_object, 
         removeNonMutated = FALSE, 
         fill = TRUE, 
         clinicalFeatures = c('Gender','Treatment_Group','Response','Volume_Change','Treatment_Duration'),
         sampleOrder = sorted_samples,
         showTitle = TRUE,
         titleFontSize = 1.5,
         legendFontSize = 1,
         annotationFontSize = 1,
         SampleNamefontSize = 0.5,
         fontSize = 0.7,
         showTumorSampleBarcodes = TRUE,
         barcode_mar = 3,
         gene_mar = 5,
         legend_height = 4,
         anno_height = 1.5,
         annoBorderCol = "white",
         drawRowBar = TRUE,
         genesToIgnore = 'KRAS',
         numericAnnoCol = TRUE,
         showPct = TRUE,
         rightBarLims = c(0, 100),
         leftBarLims = c(0, 100)
)

If I only use the clinical variables 'Gender', 'Treatment_Group' and 'Response' with Response being the only continuous variable, the coloring is correctly applied. As soon as I incorporate the other two continuous variables the coloring gets mixed up.

Thanks a lot,

oncoplots_examples.zip

This is my session info:

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS:   /mnt/disks/monoceros_nfs/software/R-4.3.2/lib/R/lib/libRblas.so 
LAPACK: /mnt/disks/monoceros_nfs/software/R-4.3.2/lib/R/lib/libRlapack.so;  LAPACK version 3.11.0

locale:
 [1] LC_CTYPE=C.UTF-8       LC_NUMERIC=C           LC_TIME=C.UTF-8       
 [4] LC_COLLATE=C.UTF-8     LC_MONETARY=C.UTF-8    LC_MESSAGES=C.UTF-8   
 [7] LC_PAPER=C.UTF-8       LC_NAME=C              LC_ADDRESS=C          
[10] LC_TELEPHONE=C         LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C   

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] grid      stats4    stats     graphics  grDevices utils     datasets 
[8] methods   base     

other attached packages:
 [1] pROC_1.18.5          trackViewer_1.38.2   GenomicRanges_1.54.1
 [4] GenomeInfoDb_1.38.8  IRanges_2.36.0       S4Vectors_0.40.2    
 [7] BiocGenerics_0.48.1  maftools_2.18.0      pheatmap_1.0.12     
[10] survminer_0.4.9      ggpubr_0.6.0         survival_3.5-8      
[13] RColorBrewer_1.1-3   statmod_1.5.0        ggrepel_0.9.5       
[16] edgeR_4.0.16         limma_3.58.1         reshape2_1.4.4      
[19] openxlsx_4.2.5.2     lubridate_1.9.3      forcats_1.0.0       
[22] stringr_1.5.1        dplyr_1.1.4          purrr_1.0.2         
[25] readr_2.1.5          tidyr_1.3.1          tibble_3.2.1        
[28] ggplot2_3.5.0        tidyverse_2.0.0      data.table_1.15.4   
[31] optparse_1.7.5       monoceRos_1.0.5     

loaded via a namespace (and not attached):
  [1] splines_4.3.2               pbdZMQ_0.3-11              
  [3] BiocIO_1.12.0               bitops_1.0-7               
  [5] filelock_1.0.3              graph_1.80.0               
  [7] XML_3.99-0.16.1             rpart_4.1.23               
  [9] lifecycle_1.0.4             rstatix_0.7.2              
 [11] ensembldb_2.26.0            lattice_0.22-6             
 [13] backports_1.4.1             magrittr_2.0.3             
 [15] Hmisc_5.1-2                 rmarkdown_2.26             
 [17] plotrix_3.8-4               yaml_2.3.8                 
 [19] zip_2.3.1                   Gviz_1.46.1                
 [21] DBI_1.2.2                   abind_1.4-5                
 [23] zlibbioc_1.48.2             AnnotationFilter_1.26.0    
 [25] biovizBase_1.50.0           RCurl_1.98-1.14            
 [27] nnet_7.3-19                 VariantAnnotation_1.48.1   
 [29] rappdirs_0.3.3              GenomeInfoDbData_1.2.11    
 [31] KMsurv_0.1-5                grImport_0.9-7             
 [33] codetools_0.2-20            getopt_1.20.4              
 [35] DelayedArray_0.28.0         xml2_1.3.6                 
 [37] DNAcopy_1.76.0              tidyselect_1.2.1           
 [39] matrixStats_1.3.0           BiocFileCache_2.10.2       
 [41] base64enc_0.1-3             GenomicAlignments_1.38.2   
 [43] jsonlite_1.8.8              Formula_1.2-5              
 [45] tools_4.3.2                 progress_1.2.3             
 [47] strawr_0.0.91               Rcpp_1.0.13                
 [49] glue_1.7.0                  gridExtra_2.3              
 [51] SparseArray_1.2.4           xfun_0.43                  
 [53] MatrixGenerics_1.14.0       IRdisplay_1.1              
 [55] withr_3.0.0                 fastmap_1.1.1              
 [57] rhdf5filters_1.14.1         latticeExtra_0.6-30        
 [59] fansi_1.0.6                 digest_0.6.35              
 [61] timechange_0.3.0            R6_2.5.1                   
 [63] colorspace_2.1-0            Cairo_1.6-2                
 [65] jpeg_0.1-10                 dichromat_2.0-0.1          
 [67] biomaRt_2.58.2              RSQLite_2.3.6              
 [69] utf8_1.2.4                  generics_0.1.3             
 [71] rtracklayer_1.62.0          InteractionSet_1.30.0      
 [73] prettyunits_1.2.0           httr_1.4.7                 
 [75] htmlwidgets_1.6.4           S4Arrays_1.2.1             
 [77] pkgconfig_2.0.3             gtable_0.3.4               
 [79] blob_1.2.4                  XVector_0.42.0             
 [81] survMisc_0.5.6              htmltools_0.5.8.1          
 [83] carData_3.0-5               ProtGenerics_1.34.0        
 [85] scales_1.3.0                Biobase_2.62.0             
 [87] png_0.1-8                   knitr_1.46                 
 [89] km.ci_0.5-6                 rstudioapi_0.16.0          
 [91] tzdb_0.4.0                  rjson_0.2.21               
 [93] uuid_1.2-0                  checkmate_2.3.1            
 [95] curl_5.2.1                  rhdf5_2.46.1               
 [97] repr_1.1.7                  cachem_1.0.8               
 [99] zoo_1.8-12                  parallel_4.3.2             
[101] foreign_0.8-86              AnnotationDbi_1.64.1       
[103] restfulr_0.0.15             pillar_1.9.0               
[105] vctrs_0.6.5                 car_3.1-2                  
[107] dbplyr_2.5.0                xtable_1.8-4               
[109] cluster_2.1.6               htmlTable_2.4.2            
[111] Rgraphviz_2.46.0            evaluate_0.23              
[113] GenomicFeatures_1.54.4      cli_3.6.2                  
[115] locfit_1.5-9.9              compiler_4.3.2             
[117] Rsamtools_2.18.0            rlang_1.1.3                
[119] crayon_1.5.2                ggsignif_0.6.4             
[121] interp_1.1-6                getPass_0.2-4              
[123] plyr_1.8.9                  stringi_1.8.3              
[125] deldir_2.0-4                BiocParallel_1.36.0        
[127] munsell_0.5.1               Biostrings_2.70.3          
[129] lazyeval_0.2.2              Matrix_1.6-5               
[131] IRkernel_1.3.2              BSgenome_1.70.2            
[133] hms_1.1.3                   bit64_4.0.5                
[135] Rhdf5lib_1.24.2             KEGGREST_1.42.0            
[137] SummarizedExperiment_1.32.0 broom_1.0.5                
[139] memoise_2.0.1               bit_4.0.5

PoisonAlien commented 1 month ago

Hi,

Thank you for the files. I have fixed the issue. You should also be able to define your own color codes for each continuous variable.

Just specify any of the sequential color palettes from the RColorBrewer package and it should do the trick.

oncoplot(
  maf = maf_object,
  removeNonMutated = FALSE,
  fill = TRUE,
  clinicalFeatures = c('Treatment_Duration', 'Treatment_Group', 'Response', 'Volume_Change', 'Gender'),
  sortByAnnotation = TRUE,
  anno_height = 3,
  annotationColor = list(Gender = c("M" = "black", 'F' = "pink"),
    Treatment_Group = c("Treatment1" = "royalblue", "Treatment2" = "maroon"),
    Treatment_Duration = "Blues", Response = "Reds",Volume_Change = "Purples"),
  annoBorderCol = 'black')

If not provided, it will randomly select from the available palettes.
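
For readers unsure which names count as sequential palettes, the sketch below lists them from RColorBrewer and maps a few numeric values onto one. The helper `value_to_color()` and the sample values are made up for illustration; this is not the code maftools uses internally, only a way to see what a light-to-dark mapping should look like.

```r
library(RColorBrewer)

# Names of the sequential palettes shipped with RColorBrewer
seq_palettes <- rownames(brewer.pal.info[brewer.pal.info$category == "seq", ])
print(seq_palettes)

# Hypothetical helper (illustration only): bin numeric values into
# n_bins equal-width intervals and return one palette color per value.
value_to_color <- function(x, palette = "Blues", n_bins = 9) {
  cols <- brewer.pal(n_bins, palette)  # ordered from light to dark
  rng <- range(x, na.rm = TRUE)
  if (diff(rng) == 0) {
    # All values identical: use the lightest color, keep NAs as NA
    bins <- ifelse(is.na(x), NA_integer_, 1L)
  } else {
    bins <- cut(x,
                breaks = seq(rng[1], rng[2], length.out = n_bins + 1),
                include.lowest = TRUE, labels = FALSE)
  }
  cols[bins]
}

# Made-up values: the smallest should get the lightest color,
# the largest the darkest, and NA should stay NA.
example_values <- c(1, 5, 10, NA)
example_colors <- value_to_color(example_values, palette = "Blues")
print(data.frame(value = example_values, color = example_colors))
```

Comparing such a table against the annotation track of the plot is a quick way to spot samples whose color does not match their value.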

Please let me know if this fixes the issue.

juferban commented 1 month ago

Thanks a lot for the quick fix. Really appreciate it.

I will give it a try on my analysis and will report back if I still have any issues.

Thanks again,

Julio

Zhongqige commented 1 month ago

I had a similar issue. I think it happens when sampleOrder is applied: the continuous clinical feature then no longer matches the ordered samples.
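
A way to test this hypothesis outside of maftools is to align the clinical table to the requested sample order by barcode and confirm that the values follow along. The sketch below uses base R only; the data frame `clin` and the helper `align_to_order()` are hypothetical and are not part of maftools.

```r
# Hypothetical clinical table (illustrative values, not from the attached files)
clin <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S2", "S3", "S4"),
  Response = c(10, 40, 20, 30),
  stringsAsFactors = FALSE
)

# Hypothetical helper: reorder the rows of `df` so they follow `sample_order`.
# Matching is done by barcode, never by row position.
align_to_order <- function(df, sample_order, id_col = "Tumor_Sample_Barcode") {
  idx <- match(sample_order, df[[id_col]])
  if (anyNA(idx)) {
    stop("Samples missing from clinical data: ",
         paste(sample_order[is.na(idx)], collapse = ", "))
  }
  df[idx, , drop = FALSE]
}

# Sample order derived from Response, lowest to highest
sorted_samples <- clin$Tumor_Sample_Barcode[order(clin$Response)]
aligned <- align_to_order(clin, sorted_samples)

# After alignment, barcodes and values are in the plotted order
stopifnot(identical(aligned$Tumor_Sample_Barcode, sorted_samples))
stopifnot(!is.unsorted(aligned$Response))
print(aligned)
```

If colors are assigned from the unaligned table while the columns are drawn in `sorted_samples` order, a mismatch of the kind described here would be expected.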

PoisonAlien commented 1 month ago

Hi @Zhongqige ,

This is fixed in the recent commit. Could you please try a fresh installation from GitHub and let me know if it works?

BiocManager::install("PoisonAlien/maftools")

Zhongqige commented 1 month ago

tcga_test_w_sampleOrder.pdf tcga_test_wo_sampleOrder.pdf

Hi, thanks for the quick response! However, I just tested with @biosunsci's TCGA data and attached the results with and without the parameter sampleOrder = sorted_samples. It seems the same sample still gets a different Response value.

PoisonAlien commented 1 month ago

Hi @Zhongqige ,

I am having trouble reproducing the issue. The function respects the sample order and the corresponding variables. Could you post the complete set of commands that you used? Please make sure that you have updated the package from GitHub and restarted your R session so that the changes take effect.
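
A quick way to confirm which version a fresh session actually sees is to compare the installed version against a minimum. The helper `has_min_version()` below is a hypothetical convenience function, and the example call uses the base package stats only so that it runs anywhere; for maftools, the minimum version to compare against is an assumption that would have to be taken from the repository's DESCRIPTION file.

```r
# Hypothetical helper: TRUE if `pkg` is installed at `min_version` or newer.
# system.file() is used so the package is not loaded as a side effect.
has_min_version <- function(pkg, min_version) {
  if (!nzchar(system.file(package = pkg))) {
    return(FALSE)
  }
  utils::packageVersion(pkg) >= package_version(min_version)
}

# Example with a base package; replace the arguments with "maftools"
# and the version that contains the fix.
has_min_version("stats", "3.0.0")
```

Running this right after restarting R, before loading anything else, rules out a stale copy of the package lingering in an old session.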

Zhongqige commented 1 month ago

@PoisonAlien I did install the latest version, 2.21.1, and restarted my R session. Below are my commands (basically @juferban's code):

## Load MAF files
maf_object = read.maf(maf = "./oncoplots_examples/mutations_filtered.maf", 
                      clinicalData = "./oncoplots_examples/sample_annot_for_maf.txt", isTCGA = FALSE)

## Make sure the continuous variables are treated as continuous
maf_object@clinical.data$Response = as.numeric(maf_object@clinical.data$Response)
maf_object@clinical.data$Volume_Change = as.numeric(maf_object@clinical.data$Volume_Change)
maf_object@clinical.data$Treatment_Duration = as.numeric(maf_object@clinical.data$Treatment_Duration)

# Sort the clinical data by multiple variables, as I want to make sure I use my predefined sample sorting
sorted_clinical_data <- maf_object@clinical.data[order(
  maf_object@clinical.data$Gender,
  maf_object@clinical.data$Treatment_Group,

  # Handle NAs: NA values are set to 1000 so they appear first
  dplyr::desc(ifelse(is.na(maf_object@clinical.data$Response ), 1000, as.numeric(maf_object@clinical.data$Response ))),

  # Handle NAs: NA values are set to 1000 so they appear first
  dplyr::desc(ifelse(is.na(maf_object@clinical.data$Volume_Change ), 1000, as.numeric(maf_object@clinical.data$Volume_Change ))),

  # Handle NAs: NA values are set to 1000 so they appear last (this column is sorted in ascending order)
  ifelse(is.na(maf_object@clinical.data$Treatment_Duration), 1000, as.numeric(maf_object@clinical.data$Treatment_Duration))
), ]

maf_object@clinical.data <- sorted_clinical_data

# Extract the sorted sample names
sorted_samples <- sorted_clinical_data$Tumor_Sample_Barcode

pdf("./tcga_test_wo_sampleOrder.pdf", 12, 8)
oncoplot(maf = maf_object, 
         removeNonMutated = FALSE, 
         fill = TRUE, 
         clinicalFeatures = c('Gender','Treatment_Group','Response'), #, 'Volume_Change','Treatment_Duration'
         #sampleOrder = sorted_samples,
         annotationColor = list(Gender = c("F" = "deeppink", "M" = "dodgerblue"),
                                Treatment_Group = c("Treatment1" = "salmon", "Treatment2" = "yellowgreen"),
                                Response = "Blues"

         ),
         showTitle = TRUE,
         titleFontSize = 1.5,
         legendFontSize = 1,
         annotationFontSize = 1,
         SampleNamefontSize = 0.5,
         fontSize = 0.7,
         showTumorSampleBarcodes = TRUE,
         barcode_mar = 3,
         gene_mar = 5,
         legend_height = 4,
         anno_height = 1.5,
         annoBorderCol = "white",
         drawRowBar = TRUE,
         genesToIgnore = 'KRAS',
         numericAnnoCol = TRUE,
         showPct = TRUE,
         rightBarLims = c(0, 100),
         leftBarLims = c(0, 100)
)
dev.off()

pdf("./tcga_test_w_sampleOrder.pdf", 12, 8)
oncoplot(maf = maf_object, 
         removeNonMutated = FALSE, 
         fill = TRUE, 
         clinicalFeatures = c('Gender','Treatment_Group','Response'), #, 'Volume_Change','Treatment_Duration'
         sampleOrder = sorted_samples,
         annotationColor = list(Gender = c("F" = "deeppink", "M" = "dodgerblue"),
                                Treatment_Group = c("Treatment1" = "salmon", "Treatment2" = "yellowgreen"),
                                Response = "Blues"

         ),
         showTitle = TRUE,
         titleFontSize = 1.5,
         legendFontSize = 1,
         annotationFontSize = 1,
         SampleNamefontSize = 0.5,
         fontSize = 0.7,
         showTumorSampleBarcodes = TRUE,
         barcode_mar = 3,
         gene_mar = 5,
         legend_height = 4,
         anno_height = 1.5,
         annoBorderCol = "white",
         drawRowBar = TRUE,
         genesToIgnore = 'KRAS',
         numericAnnoCol = TRUE,
         showPct = TRUE,
         rightBarLims = c(0, 100),
         leftBarLims = c(0, 100)
)
dev.off()

> sessionInfo()
R version 4.2.2 (2022-10-31)
Platform: aarch64-apple-darwin20 (64-bit)
Running under: macOS Ventura 13.7

Matrix products: default
LAPACK: /Library/Frameworks/R.framework/Versions/4.2-arm64/Resources/lib/libRlapack.dylib

locale:
[1] en_US.UTF-8/en_US.UTF-8/en_US.UTF-8/C/en_US.UTF-8/en_US.UTF-8

attached base packages:
[1] stats     graphics  grDevices utils     datasets  methods   base     

other attached packages:
[1] maftools_2.21.1

loaded via a namespace (and not attached):
 [1] DNAcopy_1.72.3      rstudioapi_0.14     magrittr_2.0.3      splines_4.2.2       tidyselect_1.2.0   
 [6] lattice_0.20-45     R6_2.5.1            rlang_1.0.6         fansi_1.0.4         dplyr_1.1.0        
[11] tools_4.2.2         grid_4.2.2          pkgbuild_1.4.0      data.table_1.14.6   utf8_1.2.3         
[16] cli_3.6.0           withr_2.5.0         remotes_2.5.0       survival_3.4-0      rprojroot_2.0.3    
[21] tibble_3.1.8        lifecycle_1.0.3     crayon_1.5.2        Matrix_1.5-3        processx_3.8.0     
[26] BiocManager_1.30.19 RColorBrewer_1.1-3  callr_3.7.3         vctrs_0.5.2         ps_1.7.2           
[31] curl_5.0.0          glue_1.6.2          compiler_4.2.2      pillar_1.8.1        desc_1.4.2         
[36] generics_0.1.3      prettyunits_1.1.1   pkgconfig_2.0.3

juferban commented 2 weeks ago

@PoisonAlien

Hi, sorry for the delay with the additional testing. I am seeing the same issue reported by @Zhongqige when testing the code after updating with BiocManager::install("PoisonAlien/maftools"). The samples still get their colors assigned in a seemingly random way even though the order is correct.

PoisonAlien commented 1 week ago

Hello all!

Sorry for the delay. It took a while to figure out the issue. It turns out that just the colors were flipped. I have fixed it. Please reinstall from GitHub to get the fix.
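
For anyone who wants to check the direction of a gradient after reinstalling, here is a minimal sketch in base R. It assumes that "flipped" means the gradient ran in the reverse direction (high values light, low values dark), which is only one reading of the comment above; the values, the palette, and the helper `brightness()` are all made up for illustration.

```r
# Hypothetical helper: mean of the RGB channels (0 = black, 255 = white)
brightness <- function(cols) colMeans(grDevices::col2rgb(cols))

values <- c(5, 10, 20, 40)  # made-up values in ascending order
ramp <- grDevices::colorRampPalette(c("white", "darkblue"))(length(values))

expected <- ramp       # low value -> light, high value -> dark
flipped <- rev(ramp)   # low value -> dark, high value -> light

# With the expected mapping, brightness falls as the value rises,
# so the correlation is negative; with the flipped mapping it is positive.
expected_ok <- cor(values, brightness(expected)) < 0
flipped_detected <- cor(values, brightness(flipped)) > 0
print(c(expected_ok = expected_ok, flipped_detected = flipped_detected))
```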