factorplot
is an R package that helps visualize pairwise comparisons.
It is particularly useful as a post-estimation technique following a
(G)LM, multinomial logistic regression or any other multiple comparison
procedure done with
multcomp. A more thorough
discussion of the method and what it produces are
here.
The basic idea is that often when variables with multiple categories are
used in regression models, we want to know not only the difference
between each category and the reference group, but we want to know the
differences in all possible pairs of values of that variable. There are
lots of methods out there for doing just that, but I developed the
factorplot
package to be simple to decode relative to its competitors.
Here are a few examples of where the factorplot
package could be
useful.
To show how the factorplot
function works for (G)LM coefficients, we
are going to use the Ornstein
data from the carData
package. Here
the outcome is interlocking firm memberships and the independent
variables are the log of assets, the nation in which the firm resides
and the sector in which the firm operates.
library(factorplot)
data(Ornstein, package="carData")
mod <- glm(interlocks ~ log(assets) + sector + nation,
data=Ornstein, family=poisson)
summary(mod)
#>
#> Call:
#> glm(formula = interlocks ~ log(assets) + sector + nation, family = poisson,
#> data = Ornstein)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -6.7111 -2.3159 -0.4595 1.2824 6.2849
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -0.83938 0.13664 -6.143 8.09e-10 ***
#> log(assets) 0.45145 0.01698 26.585 < 2e-16 ***
#> sectorBNK -0.16651 0.09575 -1.739 0.082036 .
#> sectorCON -0.48928 0.21320 -2.295 0.021736 *
#> sectorFIN -0.11161 0.07571 -1.474 0.140457
#> sectorHLD -0.01491 0.11924 -0.125 0.900508
#> sectorMAN 0.12187 0.07614 1.600 0.109489
#> sectorMER 0.06157 0.08670 0.710 0.477601
#> sectorMIN 0.24985 0.06888 3.627 0.000286 ***
#> sectorTRN 0.15181 0.07893 1.923 0.054453 .
#> sectorWOD 0.49825 0.07560 6.590 4.39e-11 ***
#> nationOTH -0.10699 0.07438 -1.438 0.150301
#> nationUK -0.38722 0.08951 -4.326 1.52e-05 ***
#> nationUS -0.77239 0.04963 -15.562 < 2e-16 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for poisson family taken to be 1)
#>
#> Null deviance: 3737.0 on 247 degrees of freedom
#> Residual deviance: 1547.1 on 234 degrees of freedom
#> AIC: 2473.1
#>
#> Number of Fisher Scoring iterations: 5
The factorplot
function initially produces no output, but calculates
all of the pairwise comparisons.
f <- factorplot(mod, factor.var="sector")
There are print, summary and plotting methods for the factorplot
object. The print method has an optional argument sig
, which if set to
TRUE
only prints significant differences.
print(f, digits=3, sig=TRUE)
#> Difference SE p.val
#> AGR - CON 0.489 0.213 0.023
#> CON - HLD -0.474 0.235 0.045
#> BNK - MAN -0.288 0.102 0.005
#> CON - MAN -0.611 0.215 0.005
#> FIN - MAN -0.233 0.082 0.005
#> BNK - MER -0.228 0.106 0.032
#> CON - MER -0.551 0.220 0.013
#> AGR - MIN -0.250 0.069 0.000
#> BNK - MIN -0.416 0.084 0.000
#> CON - MIN -0.739 0.210 0.001
#> FIN - MIN -0.361 0.067 0.000
#> HLD - MIN -0.265 0.118 0.026
#> MER - MIN -0.188 0.085 0.029
#> BNK - TRN -0.318 0.082 0.000
#> CON - TRN -0.641 0.217 0.004
#> FIN - TRN -0.263 0.070 0.000
#> AGR - WOD -0.498 0.076 0.000
#> BNK - WOD -0.665 0.095 0.000
#> CON - WOD -0.988 0.215 0.000
#> FIN - WOD -0.610 0.077 0.000
#> HLD - WOD -0.513 0.121 0.000
#> MAN - WOD -0.376 0.080 0.000
#> MER - WOD -0.437 0.090 0.000
#> MIN - WOD -0.248 0.072 0.001
#> TRN - WOD -0.346 0.081 0.000
The summary method identifies the number of significant positive, significant negative and insignificant differences with other stimuli.
summary(f)
#> sig+ sig- insig
#> AGR 1 2 6
#> BNK 0 5 4
#> CON 0 7 2
#> FIN 0 4 5
#> HLD 1 2 6
#> MAN 3 1 5
#> MER 2 2 5
#> MIN 6 1 2
#> TRN 3 1 5
#> WOD 9 0 0
The plotting method colors each box according to whether and how each
difference is significant. For example, the dark-gray box at the nexus
of the AGR
and CON
row and column, respectively, indicates that the
agriculture sector has significantly more interlocking board memberships
than does the construction sector. The light-gray boxes indicate that
the difference is significant in favor of the column rather than the
row.
plot(f)
The default means for adjusting for multiplicity is to do nothing, but
all of the options of p.adjust
are available.
f <- factorplot(mod, factor.var="sector", adjust.method="holm")
plot(f)
For those using the glht()
function from multcomp
, the factorplot
function works for these objects, too.
g <- glht(mod, linfct = mcp("sector" = "Tukey"))
s <- summary(g)
plot(factorplot(s))
One of the biggest problems with interpreting multinomial logit
coefficients is that the reference category problem applies to all
variables because it is a function not only of potentially some
independent variables, but also of the dependent variable. Here the
factorplot
function can help.
data("Chile", package="carData")
library(nnet)
mod <- multinom(vote ~ age + sex + education, data=Chile)
#> # weights: 24 (15 variable)
#> initial value 3494.848084
#> iter 10 value 3121.795602
#> iter 20 value 3077.986718
#> final value 3077.985852
#> converged
summary(mod)
#> Call:
#> multinom(formula = vote ~ age + sex + education, data = Chile)
#>
#> Coefficients:
#> (Intercept) age sexM educationPS educationS
#> N 1.0928913 0.00601010 0.58599491 0.3288071 -0.2398034
#> U 0.9496712 0.02041244 -0.24447512 -1.0294778 -0.6974096
#> Y 1.1850035 0.02082591 0.01426536 -0.4851798 -0.7827258
#>
#> Std. Errors:
#> (Intercept) age sexM educationPS educationS
#> N 0.3088481 0.006303204 0.1628414 0.2501653 0.1976728
#> U 0.3166930 0.006485748 0.1717826 0.2778410 0.2012233
#> Y 0.3059721 0.006264887 0.1640342 0.2536581 0.1954040
#>
#> Residual Deviance: 6155.972
#> AIC: 6185.972
In the above, the effect of age is the effect of sexM
is the effect on
the binary choice between the reference (Abstain) and each non-reference
level. We can see the effects for all pairs of levels with factorplot
.
f <- factorplot(mod, variable="sexM")
plot(f)
Notice that the factorplot shows that all of the differences between
non-reference categories are statistically significant. As shown above,
one way to specify the factorplot
for multinomial logit objects is to
give the name of a single regressor. You can also provide the name of a
term, for example education
, which has multiple regressors in the
model. Here, there is a single heading for Reference
. The reference
here stands in for not only the reference group of education
in the
non-reference dependent variable categories, but also for all education
categories in the dependent variable reference category. This makes
sense because all of these parameters are set to 0 for identification
purposes. Here is the plot for education:
f <- factorplot(mod, variable="education")
plot(f)
The default method for factorplot
will also take as the argument to
obj
a vector of estimates and a variance-covariance matrix of the
estimates as the var
argument. Using this method, you could plot
factorplots for any estimates you might want.