PoisonAlien / maftools

Summarize, Analyze and Visualize MAF files from TCGA or in-house studies.
http://bioconductor.org/packages/release/bioc/html/maftools.html
MIT License

Error applying colors to continuous clinicalFeatures #1050

Open juferban opened 1 week ago

juferban commented 1 week ago

Describe the issue

Hello,

I am having an issue trying to add continuous clinical features to my oncoplot.

If I add more than one clinical feature with continuous values, the colors applied seem to get mixed up and no longer match the values they are supposed to represent.

More specifically, I have an oncoplot to which I wanted to add two clinical features that represent two different ways of measuring response. If I add only one of them, the color gradient is applied correctly, but if I add both, most samples show the correct colors while seemingly random samples show colors that do not match their values. In my test, I specified the sample order with the sampleOrder argument of oncoplot, and that order follows the first clinical feature, so its gradient should run from lowest to highest (which it does when only that first feature is added). As soon as I add the second clinical feature, some samples get a seemingly random color.

The command does not throw any error.

Thanks for a great package!

Command

oncoplot(maf = maf_object,
         removeNonMutated = FALSE,
         fill = TRUE,
         clinicalFeatures = c('Treatment_Group','Response_IRC','Treatment_Duration'),
         sampleOrder = sorted_samples,
         showTitle = TRUE,
         titleFontSize = 1.5,
         legendFontSize = 1,
         annotationFontSize = 1,
         SampleNamefontSize = 0.7,
         fontSize = 0.7,
         showTumorSampleBarcodes = TRUE,
         barcode_mar = 4,
         gene_mar = 6,
         legend_height = 4,
         anno_height = 1.5,
         annoBorderCol = "white",
         annotationColor = annotationColor)

Session info

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS: /mnt/disks/monoceros_nfs/software/R-4.3.2/lib/R/lib/libRblas.so
LAPACK: /mnt/disks/monoceros_nfs/software/R-4.3.2/lib/R/lib/libRlapack.so; LAPACK version 3.11.0

locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] grid stats graphics grDevices utils datasets methods
[8] base

other attached packages:
[1] circlize_0.4.16 maftools_2.18.0 RColorBrewer_1.1-3 statmod_1.5.0
[5] ggrepel_0.9.5 edgeR_4.0.16 limma_3.58.1 reshape2_1.4.4
[9] openxlsx_4.2.5.2 lubridate_1.9.3 forcats_1.0.0 stringr_1.5.1
[13] dplyr_1.1.4 purrr_1.0.2 readr_2.1.5 tidyr_1.3.1
[17] tibble_3.2.1 ggplot2_3.5.0 tidyverse_2.0.0 data.table_1.15.4
[21] optparse_1.7.5 monoceRos_1.0.5

loaded via a namespace (and not attached):
[1] gtable_0.3.4 shape_1.4.6.1 GlobalOptions_0.1.2
[4] lattice_0.22-6 tzdb_0.4.0 Cairo_1.6-2
[7] vctrs_0.6.5 tools_4.3.2 generics_0.1.3
[10] getopt_1.20.4 fansi_1.0.6 pkgconfig_2.0.3
[13] Matrix_1.6-5 uuid_1.2-0 lifecycle_1.0.4
[16] compiler_4.3.2 munsell_0.5.1 repr_1.1.7
[19] getPass_0.2-4 htmltools_0.5.8.1 pillar_1.9.0
[22] crayon_1.5.2 tidyselect_1.2.1 locfit_1.5-9.9
[25] zip_2.3.1 digest_0.6.35 stringi_1.8.3
[28] splines_4.3.2 fastmap_1.1.1 colorspace_2.1-0
[31] cli_3.6.2 magrittr_2.0.3 base64enc_0.1-3
[34] survival_3.5-8 utf8_1.2.4 IRdisplay_1.1
[37] withr_3.0.0 scales_1.3.0 IRkernel_1.3.2
[40] timechange_0.3.0 pbdZMQ_0.3-11 hms_1.1.3
[43] DNAcopy_1.76.0 evaluate_0.23 rlang_1.1.3
[46] Rcpp_1.0.13 glue_1.7.0 jsonlite_1.8.8
[49] R6_2.5.1 plyr_1.8.9

biosunsci commented 5 days ago

Hi @juferban, could you post some data that triggers the bug so that we can reproduce it easily?

juferban commented 4 days ago

Hi, thanks for your quick response. I will generate a couple of files and upload them soon so you can use them for testing.

Thanks a lot.

juferban commented 2 days ago

Hi @biosunsci

I am attaching example files that reproduce my problem. The code I used for testing is as follows:

## Load MAF files
maf_object = read.maf(maf = "mutations_filtered.maf", 
                      clinicalData = "sample_annot_for_maf.txt", isTCGA = FALSE)

## Make sure the continuous variables are stored as numeric
maf_object@clinical.data$Response = as.numeric(maf_object@clinical.data$Response)
maf_object@clinical.data$Volume_Change = as.numeric(maf_object@clinical.data$Volume_Change)
maf_object@clinical.data$Treatment_Duration = as.numeric(maf_object@clinical.data$Treatment_Duration)

# Sort the clinical data by multiple variables, as I want to make sure I use my predefined sample sorting
sorted_clinical_data <- maf_object@clinical.data[order(
  maf_object@clinical.data$Gender,
  maf_object@clinical.data$Treatment_Group,

  # Handle NAs: NA values are set to 1000 so they appear first
  dplyr::desc(ifelse(is.na(maf_object@clinical.data$Response ), 1000, as.numeric(maf_object@clinical.data$Response ))),

  # Handle NAs: NA values are set to 1000 so they appear first
  dplyr::desc(ifelse(is.na(maf_object@clinical.data$Volume_Change ), 1000, as.numeric(maf_object@clinical.data$Volume_Change ))),

  # Handle NAs: NA values are set to 1000; this key is ascending, so they sort after any duration below 1000
  ifelse(is.na(maf_object@clinical.data$Treatment_Duration), 1000, as.numeric(maf_object@clinical.data$Treatment_Duration))
), ]

maf_object@clinical.data <- sorted_clinical_data

# Extract the sorted sample names
sorted_samples <- sorted_clinical_data$Tumor_Sample_Barcode

## Create the oncoplot
oncoplot(maf = maf_object, 
         removeNonMutated = FALSE, 
         fill = TRUE, 
         clinicalFeatures = c('Gender','Treatment_Group','Response','Volume_Change','Treatment_Duration'),
         sampleOrder = sorted_samples,
         showTitle = TRUE,
         titleFontSize = 1.5,
         legendFontSize = 1,
         annotationFontSize = 1,
         SampleNamefontSize = 0.5,
         fontSize = 0.7,
         showTumorSampleBarcodes = TRUE,
         barcode_mar = 3,
         gene_mar = 5,
         legend_height = 4,
         anno_height = 1.5,
         annoBorderCol = "white",
         drawRowBar = TRUE,
         genesToIgnore = 'KRAS',
         numericAnnoCol = TRUE,
         showPct = TRUE,
         rightBarLims = c(0, 100),
         leftBarLims = c(0, 100)
)
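
As an aside on the sorting step above, the NA placement can also be sketched without a sentinel value such as 1000, by sorting on an explicit is.na() key first. This is only an illustration in base R: the toy table and the name sorted_samples_sketch are hypothetical, and it covers just two of the five sort keys.

```r
# Hypothetical values; only the ordering logic matters here.
clin <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S2", "S3", "S4", "S5"),
  Response             = c(30, NA, 80, 30, 10),
  Treatment_Duration   = c(5, 2, 9, NA, 7),
  stringsAsFactors     = FALSE
)

ord <- order(
  !is.na(clin$Response),           # FALSE sorts before TRUE, so NA rows come first
  -clin$Response,                  # then highest Response first
  is.na(clin$Treatment_Duration),  # for the tie-breaker, NA rows go last
  clin$Treatment_Duration          # then shortest duration first
)

sorted_samples_sketch <- clin$Tumor_Sample_Barcode[ord]
print(sorted_samples_sketch)
# "S2" "S3" "S1" "S4" "S5"
```

Compared with the sentinel approach, this does not depend on every real value being smaller than 1000.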

If I only use the clinical variables 'Gender', 'Treatment_Group' and 'Response' with Response being the only continuous variable, the coloring is correctly applied. As soon as I incorporate the other two continuous variables the coloring gets mixed up.
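
To make the expected behaviour concrete, here is a minimal base R sketch of what I would expect for each continuous feature: every feature gets its own gradient, and colors are looked up by sample barcode rather than by row position. It is not how maftools computes the colors internally; the helper value_to_color and the toy values are hypothetical.

```r
# Hypothetical toy clinical table; column names mirror the ones above, values are made up.
clin <- data.frame(
  Tumor_Sample_Barcode = c("S1", "S2", "S3", "S4"),
  Response             = c(-40, 10, NA, 55),
  Treatment_Duration   = c(12, 3, 30, 7),
  stringsAsFactors     = FALSE
)

# Map a numeric vector onto a gradient from `low` to `high`; NA values stay NA.
value_to_color <- function(x, low = "white", high = "red", n = 100) {
  pal  <- grDevices::colorRampPalette(c(low, high))(n)
  rng  <- range(x, na.rm = TRUE)
  span <- diff(rng)
  # Scale to [0, 1]; a constant column maps to the low end of the gradient.
  scaled <- if (span > 0) (x - rng[1]) / span else ifelse(is.na(x), NA_real_, 0)
  pal[as.integer(round(scaled * (n - 1))) + 1L]
}

# One independent gradient per feature, still keyed by barcode.
expected <- data.frame(
  Tumor_Sample_Barcode = clin$Tumor_Sample_Barcode,
  Response             = value_to_color(clin$Response, high = "red"),
  Treatment_Duration   = value_to_color(clin$Treatment_Duration, high = "blue"),
  stringsAsFactors     = FALSE
)

# Reorder to the plotting order by matching barcodes, not by row position.
sample_order <- c("S4", "S2", "S1", "S3")
expected_in_plot_order <- expected[match(sample_order, expected$Tumor_Sample_Barcode), ]
print(expected_in_plot_order)
```

Comparing such a table against the annotation bars makes it easier to see which samples get a color that does not belong to their value.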

Thanks a lot,

oncoplots_examples.zip

This is my session info:

R version 4.3.2 (2023-10-31)
Platform: x86_64-pc-linux-gnu (64-bit)
Running under: Ubuntu 22.04.4 LTS

Matrix products: default
BLAS: /mnt/disks/monoceros_nfs/software/R-4.3.2/lib/R/lib/libRblas.so
LAPACK: /mnt/disks/monoceros_nfs/software/R-4.3.2/lib/R/lib/libRlapack.so; LAPACK version 3.11.0

locale:
[1] LC_CTYPE=C.UTF-8 LC_NUMERIC=C LC_TIME=C.UTF-8
[4] LC_COLLATE=C.UTF-8 LC_MONETARY=C.UTF-8 LC_MESSAGES=C.UTF-8
[7] LC_PAPER=C.UTF-8 LC_NAME=C LC_ADDRESS=C
[10] LC_TELEPHONE=C LC_MEASUREMENT=C.UTF-8 LC_IDENTIFICATION=C

time zone: Etc/UTC
tzcode source: system (glibc)

attached base packages:
[1] grid stats4 stats graphics grDevices utils datasets
[8] methods base

other attached packages:
[1] pROC_1.18.5 trackViewer_1.38.2 GenomicRanges_1.54.1
[4] GenomeInfoDb_1.38.8 IRanges_2.36.0 S4Vectors_0.40.2
[7] BiocGenerics_0.48.1 maftools_2.18.0 pheatmap_1.0.12
[10] survminer_0.4.9 ggpubr_0.6.0 survival_3.5-8
[13] RColorBrewer_1.1-3 statmod_1.5.0 ggrepel_0.9.5
[16] edgeR_4.0.16 limma_3.58.1 reshape2_1.4.4
[19] openxlsx_4.2.5.2 lubridate_1.9.3 forcats_1.0.0
[22] stringr_1.5.1 dplyr_1.1.4 purrr_1.0.2
[25] readr_2.1.5 tidyr_1.3.1 tibble_3.2.1
[28] ggplot2_3.5.0 tidyverse_2.0.0 data.table_1.15.4
[31] optparse_1.7.5 monoceRos_1.0.5

loaded via a namespace (and not attached):
[1] splines_4.3.2 pbdZMQ_0.3-11
[3] BiocIO_1.12.0 bitops_1.0-7
[5] filelock_1.0.3 graph_1.80.0
[7] XML_3.99-0.16.1 rpart_4.1.23
[9] lifecycle_1.0.4 rstatix_0.7.2
[11] ensembldb_2.26.0 lattice_0.22-6
[13] backports_1.4.1 magrittr_2.0.3
[15] Hmisc_5.1-2 rmarkdown_2.26
[17] plotrix_3.8-4 yaml_2.3.8
[19] zip_2.3.1 Gviz_1.46.1
[21] DBI_1.2.2 abind_1.4-5
[23] zlibbioc_1.48.2 AnnotationFilter_1.26.0
[25] biovizBase_1.50.0 RCurl_1.98-1.14
[27] nnet_7.3-19 VariantAnnotation_1.48.1
[29] rappdirs_0.3.3 GenomeInfoDbData_1.2.11
[31] KMsurv_0.1-5 grImport_0.9-7
[33] codetools_0.2-20 getopt_1.20.4
[35] DelayedArray_0.28.0 xml2_1.3.6
[37] DNAcopy_1.76.0 tidyselect_1.2.1
[39] matrixStats_1.3.0 BiocFileCache_2.10.2
[41] base64enc_0.1-3 GenomicAlignments_1.38.2
[43] jsonlite_1.8.8 Formula_1.2-5
[45] tools_4.3.2 progress_1.2.3
[47] strawr_0.0.91 Rcpp_1.0.13
[49] glue_1.7.0 gridExtra_2.3
[51] SparseArray_1.2.4 xfun_0.43
[53] MatrixGenerics_1.14.0 IRdisplay_1.1
[55] withr_3.0.0 fastmap_1.1.1
[57] rhdf5filters_1.14.1 latticeExtra_0.6-30
[59] fansi_1.0.6 digest_0.6.35
[61] timechange_0.3.0 R6_2.5.1
[63] colorspace_2.1-0 Cairo_1.6-2
[65] jpeg_0.1-10 dichromat_2.0-0.1
[67] biomaRt_2.58.2 RSQLite_2.3.6
[69] utf8_1.2.4 generics_0.1.3
[71] rtracklayer_1.62.0 InteractionSet_1.30.0
[73] prettyunits_1.2.0 httr_1.4.7
[75] htmlwidgets_1.6.4 S4Arrays_1.2.1
[77] pkgconfig_2.0.3 gtable_0.3.4
[79] blob_1.2.4 XVector_0.42.0
[81] survMisc_0.5.6 htmltools_0.5.8.1
[83] carData_3.0-5 ProtGenerics_1.34.0
[85] scales_1.3.0 Biobase_2.62.0
[87] png_0.1-8 knitr_1.46
[89] km.ci_0.5-6 rstudioapi_0.16.0
[91] tzdb_0.4.0 rjson_0.2.21
[93] uuid_1.2-0 checkmate_2.3.1
[95] curl_5.2.1 rhdf5_2.46.1
[97] repr_1.1.7 cachem_1.0.8
[99] zoo_1.8-12 parallel_4.3.2
[101] foreign_0.8-86 AnnotationDbi_1.64.1
[103] restfulr_0.0.15 pillar_1.9.0
[105] vctrs_0.6.5 car_3.1-2
[107] dbplyr_2.5.0 xtable_1.8-4
[109] cluster_2.1.6 htmlTable_2.4.2
[111] Rgraphviz_2.46.0 evaluate_0.23
[113] GenomicFeatures_1.54.4 cli_3.6.2
[115] locfit_1.5-9.9 compiler_4.3.2
[117] Rsamtools_2.18.0 rlang_1.1.3
[119] crayon_1.5.2 ggsignif_0.6.4
[121] interp_1.1-6 getPass_0.2-4
[123] plyr_1.8.9 stringi_1.8.3
[125] deldir_2.0-4 BiocParallel_1.36.0
[127] munsell_0.5.1 Biostrings_2.70.3
[129] lazyeval_0.2.2 Matrix_1.6-5
[131] IRkernel_1.3.2 BSgenome_1.70.2
[133] hms_1.1.3 bit64_4.0.5
[135] Rhdf5lib_1.24.2 KEGGREST_1.42.0
[137] SummarizedExperiment_1.32.0 broom_1.0.5
[139] memoise_2.0.1 bit_4.0.5

PoisonAlien commented 1 day ago

Hi,

Thank you for the files. I have fixed the issue. You should now also be able to define your own color codes for each continuous variable.

Just pass the name of any sequential color palette from the RColorBrewer package and it should do the trick.

oncoplot(
  maf = maf_object,
  removeNonMutated = FALSE,
  fill = TRUE,
  clinicalFeatures = c('Treatment_Duration', 'Treatment_Group', 'Response', 'Volume_Change', 'Gender'),
  sortByAnnotation = TRUE,
  anno_height = 3,
  annotationColor = list(Gender = c("M" = "black", 'F' = "pink"),
    Treatment_Group = c("Treatment1" = "royalblue", "Treatment2" = "maroon"),
    Treatment_Duration = "Blues", Response = "Reds",Volume_Change = "Purples"),
  annoBorderCol = 'black')

If not provided, it will randomly select from the available palettes.
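
To see which names count as sequential palettes, something like the following should work, assuming RColorBrewer is installed (brewer.pal.info and brewer.pal are part of that package; the variable seq_pals is just an illustrative name):

```r
library(RColorBrewer)

# brewer.pal.info is a data.frame with one row per palette; the `category`
# column is "seq", "div" or "qual". Keep only the sequential ones.
seq_pals <- rownames(brewer.pal.info[brewer.pal.info$category == "seq", ])
print(seq_pals)

# Preview the hex codes of one palette, e.g. the 9-colour "Blues" palette.
print(brewer.pal(n = 9, name = "Blues"))
```

Any of the names printed by the first call (for example "Blues", "Reds" or "Purples") can then be used as the value for a continuous feature in annotationColor, as in the call above.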

Please let me know if this fixes the issue.