Open rjverheijden opened 2 years ago
Reprex:
plot_groups <- function(data, vrb = "all", label = FALSE, square = TRUE, diagonal = FALSE, rotate = FALSE, useNA = "no") {
if (!is.data.frame(data) & !is.matrix(data)) {
stop("Dataset should be a 'data.frame' or 'matrix'.")
}
if (vrb[1] == "all") {
vrb <- names(data)
}
groups <- data.frame(vrb=NA, lvl=NA)
for(i in vrb){
if("factor" %in% class(data[,i])){
tmp <- data.frame(vrb=rep(i,times=length(levels(data[,i]))),lvl=levels(data[,i]))
} else{
tmp <- data.frame(vrb=i,lvl=NA)
}
tmp <- rbind(tmp, cbind(vrb=i, lvl="NA"))
groups <- rbind(groups,tmp)
}
groups<-groups[-1,]
groups_compl <- data.frame(groups[rep(seq_len(nrow(groups)), each = nrow(groups)), ],
groups, compl=NA, x=NA, y=NA)
for(i in 1:nrow(groups_compl)){
if(groups_compl[i,"vrb"]==groups_compl[i,"vrb.1"]){
groups_compl[i,"compl"] <- NA
}else if(is.na(groups_compl[i,"lvl"]) & is.na(groups_compl[i,"lvl.1"])){
groups_compl[i,"compl"] <- nrow(data[!is.na(data[,groups_compl[i,"vrb"]]) & !is.na(data[,groups_compl[i,"vrb.1"]]),])
}else if (is.na(groups_compl[i,"lvl"])&groups_compl[i,"lvl.1"]=="NA"){
groups_compl[i,"compl"] <- nrow(data[!is.na(data[,groups_compl[i,"vrb"]]) & is.na(data[,groups_compl[i,"vrb.1"]]),])
}else if (groups_compl[i,"lvl"]=="NA" & is.na(groups_compl[i,"lvl.1"])){
groups_compl[i,"compl"] <- nrow(data[is.na(data[,groups_compl[i,"vrb"]]) & !is.na(data[,groups_compl[i,"vrb.1"]]),])
}else if(is.na(groups_compl[i,"lvl"])){
groups_compl[i,"compl"] <- nrow(data[!is.na(data[,groups_compl[i,"vrb"]]) & data[,groups_compl[i,"vrb.1"]]%in%groups_compl[i,"lvl.1"],])
}else if(is.na(groups_compl[i,"lvl.1"])){
groups_compl[i,"compl"]<- nrow(data[data[,groups_compl[i,"vrb"]]%in%groups_compl[i,"lvl"] & !is.na(data[,groups_compl[i,"vrb.1"]]),])
}else if(groups_compl[i,"lvl"]=="NA"){
groups_compl[i,"compl"]<- nrow(data[is.na(data[,groups_compl[i,"vrb"]]) & data[,groups_compl[i,"vrb.1"]]%in%groups_compl[i,"lvl.1"],])
}else if(groups_compl[i,"lvl.1"]=="NA"){
groups_compl[i,"compl"]<- nrow(data[data[,groups_compl[i,"vrb"]]%in%groups_compl[i,"lvl"] & is.na(data[,groups_compl[i,"vrb.1"]]) ,])
}else{
groups_compl[i,"compl"]<- nrow(data[data[,groups_compl[i,"vrb"]]%in%groups_compl[i,"lvl"] & data[,groups_compl[i,"vrb.1"]]%in%groups_compl[i,"lvl.1"],])
}
groups_compl[i,"y"]<-ifelse(is.na(groups_compl[i,"lvl.1"]), paste0(groups_compl[i,"vrb.1"]),paste0(groups_compl[i,"vrb.1"], " | ", groups_compl[i,"lvl.1"]))
groups_compl[i,"x"]<-ifelse(is.na(groups_compl[i,"lvl"]), paste0(groups_compl[i,"vrb"]),paste0(groups_compl[i,"vrb"], " | ", groups_compl[i,"lvl"]))
}
groups_compl[groups_compl$compl%in%0,"compl"]<-0.1
if(useNA=="both"){
groups_plot<-groups_compl
} else if(useNA=="only"){
groups_plot <- groups_compl[groups_compl$lvl%in%"NA",]
} else{
groups_plot <- groups_compl[(is.na(groups_compl$lvl)|groups_compl$lvl!="NA")&(is.na(groups_compl$lvl.1)|groups_compl$lvl.1!="NA"),]
}
gg <- ggplot2::ggplot(groups_plot, ggplot2::aes(x = .data$x, y = .data$y, label = .data$compl, fill = .data$compl)) +
ggplot2::geom_tile(color = "black") +
ggplot2::scale_x_discrete(limits = unique(groups_plot$x), position = "top") +
ggplot2::scale_y_discrete(limits = rev(unique(groups_plot$y))) +
# ggplot2::scale_fill_viridis_c("number\nof\ncases", option = "viridis", na.value = "white", direction = -1, trans = "log10")+
# ggplot2::scale_fill_viridis_b("number\nof\ncases", option = "viridis", na.value = "white", direction = -1, trans = "log10", breaks = c(10,100,1000))+
ggplot2::scale_fill_gradient2("number\nof\ncases", low = "orangered", mid = "lightyellow", high = "deepskyblue", midpoint = log10(10), na.value = "grey50", trans = "log10") +
ggplot2::labs(title = "Number of cases per category")
if (label) {
gg <- gg + ggplot2::geom_text(color = "black", show.legend = FALSE, na.rm = TRUE)
}
if (square) {
gg <- gg + ggplot2::coord_fixed(expand = FALSE)
} else {
gg <- gg + ggplot2::coord_cartesian(expand = FALSE)
}
if (rotate) {
gg <- gg + ggplot2::theme(axis.text.x = ggplot2::element_text(angle = 90))
}
return(gg)
}
plot_groups(mice::boys, rotate = T, useNA = "no")
Created on 2023-06-27 with reprex v2.0.2
Hi @rjverheijden, sorry for the late response. Thank you very much for your contribution. I'm not adding your function in the current form to the ggmice
package. I will be developing a function to visualize the missingness proportion per variable (#66), in the style/color scheme of mice
. If you create a pull request with this function, I can use parts of your code as inspiration to visualize categorical variables split by groups and automatically provide credit to you. Otherwise I'll just add you to the acknowledgements.
Hi @hanneoberman, great that you are developing this function! I have now created a pull request (I hope). Let me know if something went wrong and feel free to deny the pull request if something went wrong and just use the function as you wish. As long as other's can make use of it, it is fine to me.
Alternative: add optional argument to plot_corr etc. to use model.matrix
instead of variables.
Dear
ggmice
contributers, thanks a lot for the usefull package. Based on theplot_corr
, I made a function to visualise the number of cases per level compared to other variables (a sort of contigency table for all desired variables in heatmap format). Especially with missing data and handling these withmice
orsmcfcs
it might come in handy. I am curious what your thoughts are, there are for sure ways to make it more efficient. Please feel free to modify it and (if you like it) implement it in the package.plot_groups function
With this function, you can quickly check the number op subjects per level for categorical variables and in total for continuous data. It creates a heat map in which red means little subjects per group and blue numerous subjects per group. Note that groups with
0
were set to0.1
because of the log10 transfromation, so groups with value0.1
represent groups with0
subjects. Options foruseNA
are"no"
to exclude them from the plot,"both"
to include both variables and missings and"only"
to only include the missings.Example
This function was based on the
plot_cor()
function of packageggmice
v0.0.1 available on https://github.com/amices/ggmice which was licensed under the GNU General Public Licence v3.0.Hence this function is also licensed under the GNU GPLv3.0.