Closed ericmelse closed 1 year ago
thanks
Dear Ben,
I am exploring your violinplot
option rag
for split (binary) violin plots as well as for unary plots.
Although it works just fine there might be room for some improvement.
First, I create a split (binary) violin plot using this code:
* Setup - single outcome
sysuse nlsw88 , clear
gen ln_wage = ln(wage)
egen z_ln_wage= std(ln_wage)
* split violin
violinplot z_ln_wage, split(south, offset(.2)) box(type(l) stat(p20 p80)) ///
mean(type(m) msize(*1.5) ms(oh)) rag(mc(%10) off(.6)) ///
ytit("ln(wage/GNP deflator) (standardized)", s(*1.1) margin(l-2 r+2)) ///
xsca(r(0 5)) name(sth, replace)
Which results in: no issues with this plot but just to appreciate my next step.
Next, I use code to create a unary violin plot but together with a rag indicating the number of cases along the scale.
Note that I have added graphreg(margin(b+9))
to have the same layout as with the previous plot (i.e. for a lack of the legend).
* violin
violinplot z_ln_wage, col("40 165 204") fill box(type(l) stat(p20 p80)) ///
med(type(l)) mean(type(m) msize(*1.5) ms(oh)) rag(mc(%10) off(.6)) ///
ytit("ln(wage/GNP deflator) (standardized)", s(*1.1) margin(l-2 r+2)) ///
xsca(r(0 5)) vert name(s_left, replace) graphreg(margin(b+9))
Which results in:
Next is an alternative visualisation of the previous unary violin plot but as a 'half-violin' plot.
I am forced to create a utility variable for this (count
) that is coded 1 for all cases. Next I have to expand the data set by one case and code that to 0 in count as to be able to use it to split the violin plot. (Possibly I have not understood all options well enough to be able to do this without my trickery.)
Obviously there is no scale data (z_ln_wage
) in the single count case with the code 0.
Thus, violinplot is now 'tricked' as to visualize only one half of the violin (basically the kernel density).
The advantage is that now I can put the rag to the left of the violin, like:
* Setup - single outcome - half violinplot example
* Requires a little trick coding
gen count=1
set obs `=_N+1'
replace count=0 if count==.
violinplot z_ln_wage, sp(count) col("40 165 204") fill box(type(l) stat(p20 p80)) ///
med(type(l)) mean(type(m) msize(*1.5) ms(oh)) rag(mc(%10) off(-.1)) ///
ytit("ln(wage/GNP deflator) (standardized)", s(*1.1) margin(l-2 r+2)) ///
xsca(r(0 5)) name(s_half, replace)
Which results in:
When you agree with my approach to create such a half-violin plot, maybe you can add an option (e.g. half
) to reduce the effort of working with a facilitating binary variable by automating that in your ado file (should that now not be possible).
Maybe you wonder why I want to use such half-violin plot. I will explain that in my next post.
No need to trick split()
, as there is left
and right
to print only half of the violin (modified/added options on last line):
violinplot z_ln_wage, col("40 165 204") fill box(type(l) stat(p20 p80)) ///
med(type(l)) mean(type(m) msize(*1.5) ms(oh)) ///
ytit("ln(wage/GNP deflator) (standardized)", s(*1.1) margin(l-2 r+2)) ///
xsca(r(0 5)) vert name(s_half, replace) graphreg(margin(b+9)) ///
rag(mc(%10) off(.1)) right nowhisk
ben
Very nice Ben!
Dear Ben,
Now I provide code for an example of how to use violinplot
to create a combination of several plots of the same outcome variable as a right-violin plot of all cases together with four half-violin plots (in this example). The objective is to compare the unary distribution pattern with the binary distribution patterns. The main purpose of a regular violin plot is to compare differences in the distribution. However, using the option absolute
enables absolute scaling of each group of a binary (coded) variable. This makes it possible to create multiple violin plots of the same outcome variable and compare the distributions while being aware of group size differences between these distributions. My code below uses the help file data example to this purpose (sysuse nlsw88
). Note that I employ Stata's graph combine
command to create the end result graph of the violin plots. But be aware that the graph combine
command (in my experience) has its own rather strange way of dealing with the size and relative proportion between the various components that make up the graph (source graphs as well as the end result graph). It might be possible that I have not understood well enough how to go about coding the source graphs and what Stata's graph combine
command does to them to create the result graph. Nevertheless, I am very much 'result' driven and satisfied that my result graph meets the following criteria:
text
is coded for this purpose),note
with an interpretative explanation of graphical parts of the violin plots, strives to adhere to Edward Tufte's data-ink ratio concept and still be visually appealing,The example code:
* Setup
sysuse nlsw88 , clear
* Standardize outcome variable
gen ln_wage = ln(wage)
egen z_ln_wage= std(ln_wage)
* Apply prefix b_
foreach var in collgrad c_city south union {
ren `var' b_`var'
label values b_`var' lbl_`var'
*label var b_`var' "by `var'"
}
* Unary or right-violin plot
graph drop _all
local xtit "all cases"
violinplot z_ln_wage, fill box(type(l) stat(p20 p80)) ///
mean(type(m) msize(*1.7) ms(dh)) med(type(l)) ///
c(gs8) bc(gs15) medc(gs16) graphr(c(gs15)) vert ///
xsca(r(0 0)) ysca(noextend) ylab(-4(1)4, angle(none) glc(gs15)) ///
ytit("ln(wage/GNP deflator) (standardized)", s(*1.1) m(r+2)) ///
xtit("`xtit'", m(t+3 b+2.4)) rag(mc(%10) off(.1)) right nowhisk ///
saving(unary, replace)
* Binary or half-violin plots * Absolute scaling by groups (demographic)
local xtit "graduation"
violinplot z_ln_wage, split(b_collgrad, offset(.2)) box(type(l) stat(p20 p80)) ///
abs mean(type(m) msize(*1.7) ms(dh)) rag(mc(%10) off(.6)) ///
c("255 134 53" "1 150 239") bc(gs15) medc(gs16) graphr(c(gs15)) xsca(r(0 0)) ///
ytit("") ysca(noline) ylab(-4(1)4, nolab notick glc(gs15)) legend(off) ///
xtit("`xtit'", m(t+5.5 b+0)) text(-4.32 -.59 "not so", s(*.9) placement(e)) ///
text(-4.32 .59 "college", s(*.9) placement(w)) saving(binary1, replace)
local xtit "city"
violinplot z_ln_wage, split(b_c_city, offset(.2)) box(type(l) stat(p20 p80)) ///
abs mean(type(m) msize(*1.7) ms(dh)) rag(mc(%10) off(.6)) ///
c("teal*1" "teal*1.7") bc(gs15) medc(gs16) graphr(c(gs15)) xsca(r(0 0)) ///
ytit("") ysca(noline) ylab(-4(1)4, nolab notick glc(gs15)) legend(off) ///
xtit("`xtit'", m(t+5.5 b+0)) text(-4.32 -.59 "not so", s(*.9) placement(e)) ///
text(-4.32 .59 "lives there", s(*.9) placement(w)) saving(binary2, replace)
local xtit "south"
violinplot z_ln_wage, split(b_south, offset(.2)) box(type(l) stat(p20 p80)) ///
abs mean(type(m) msize(*1.7) ms(dh)) rag(mc(%10) off(.6)) ///
c("blue*.7" "blue*.9") bc(gs15) medc(gs16) graphr(c(gs15)) xsca(r(0 0)) ///
ytit("") ysca(noline) ylab(-4(1)4, nolab notick glc(gs15)) legend(off) ///
xtit("`xtit'", m(t+5.5 b+0)) text(-4.32 -.59 "not so", s(*.9) placement(e)) ///
text(-4.32 .59 "lives there", s(*.9) placement(w)) saving(binary3, replace)
local xtit "union"
violinplot z_ln_wage, split(b_union, offset(.2)) box(type(l) stat(p20 p80)) ///
abs mean(type(m) msize(*1.7) ms(dh)) rag(mc(%10) off(.6)) ///
c("maroon*.8" "maroon*1.5") bc(gs15) medc(gs16) graphr(c(gs15)) xsca(r(0 0)) ///
ytit("") ysca(noline) ylab(-4(1)4, nolab notick glc(gs15)) legend(off) ///
xtit("`xtit'", m(t+5.5 b+0)) text(-4.32 -.59 "not so", s(*.9) placement(e)) ///
text(-4.32 .59 "worker", s(*.9) placement(w)) saving(binary4, replace)
* Finally combine the above into the result graph
graph drop _all
graph combine "unary.gph" "binary1.gph" "binary2.gph" "binary3.gph" "binary4.gph" , col(5) ycommon ///
note("The mean is indicated by the {fontface Verdana:{&loz}} symbol in the plots. IQR lines are at the 20{sup:th}, 50{sup:th} and 80{sup:th} percentile. Binary groups are absolute. ", span s(*.8) m(l+1 t-0 b+2)) ///
graphr(c(gs15) ilc(none) m(l-2 r-2 t-2 b-4)) imargin(small)
Which results in:
One last note, about all of the above I have been corresponding earlier this year with Fernando Rios-Avila while using his package joy_plot
with which a very similar result can be achieved (but admittedly, that requires a little more code).
How beautiful indeed are the new options!