friendly / VCDR

Visualizing categorical data with R
0 stars 0 forks source link

deviation rootogram: color fill? #33

Closed friendly closed 9 years ago

friendly commented 9 years ago

Phil raised an interesting question re: deviation rootogram in Fig 3.15:

\TODO{PC: Couldn't you go one stem further, and plot a colour aesthetic for the standardized residuals for each hanging block? It's great that the (O - E) effect is captured for detecting, systematic effects, but one would probably be interested in the $\sqrt{(O-E)^2 / E}$ effect too in visual inspections. Would really only make sense for the devation rootogram plot}

Or is this gilding the lilly?

philchalmers commented 9 years ago

Perhaps an inspiring example will help? Fit the Federalist data with a poisson distribution, which has an overall bad fit in some residuals.

library(vcd)
data(Federalist)
Fed_fit1 <- goodfit(Federalist, type = "poisson")

R <- Fed_fit1$observed - Fed_fit1$fitted
z <- R / sqrt(Fed_fit1$fitted) 
sR <- -1 * sign(R) * sqrt(abs(R))
plt <- data.frame(count=Fed_fit1$count, sR, z, rootfit=sqrt(Fed_fit1$fitted))

library(ggplot2)
# roughly equivalent to vcd rootogram
ggplot(plt, aes(count, sR)) + 
    geom_bar(stat='identity', position='identity') + 
    geom_line(aes(y=rootfit), colour='blue') + 
    geom_point(aes(y=rootfit), colour='blue', size=4) +
    xlab('Number of Occurrences') + 
    ylab('sqrt(Frequency)')

# versus this, which captures residual info. Notice that the highest residual 
# is the last count, despite its sqrt(O-E) size
ggplot(plt, aes(count, sR, fill = abs(z))) +
    geom_bar(stat='identity', position='identity') + 
    geom_line(aes(y=rootfit), colour='blue') + 
    geom_point(aes(y=rootfit), colour='blue', size=4) +
    xlab('Number of Occurrences') + 
    ylab('sqrt(Frequency)') + 
    scale_fill_gradient(low='white', high='red')

rplot rplot2

friendly commented 9 years ago

That is actually very nice, because it shows clearly that the contribution to lack of fit is greatest for the largest count, as you said in your comment.

davidjohannesmeyer commented 9 years ago

Here goes a version with rootogram():

library(vcd)
data(Federalist)
fed <- goodfit(Federalist, type = "poisson")
res <- with(Fed_fit1, (observed - fitted) / sqrt(fitted))

fill <- shading_hcl(observed = fed$observed, expected = fed$fitted, df = fed$df)(res)$fill
rootogram(fed, type = "deviation", rect_gp = gpar(fill = fill), pop = FALSE)
legend_resbased(x = unit(0.8, "npc"), y = 0.4, height = 0.4)(res,shading = shading_hcl(observed = fed$observed, expected = fed$fitted, df = fed$df),"")

It is trivial to add a shade= argument which does this if TRUE and rect_gp is NULL. Proper legend handling needs some more edits, though.

friendly commented 9 years ago

This also is very nice -- In fact, I think I prefer it with a bipolar color scale. The legend is not too bad, but no one but you would know how to do that! I would like to use this in the book, because it enhances the use of rootograms, but I wouldn't want to have to explain this code. Adding the shade= argument would probably simplify this.

Here is a slightly cleaned up version:

library(vcd)
data(Federalist)
Fed_fit <- goodfit(Federalist, type = "poisson")
res <- with(Fed_fit, (observed - fitted) / sqrt(fitted))

fill <- shading_hcl(observed = fed$observed, expected = fed$fitted, df = fed$df)(res)$fill
rootogram(fed, type = "deviation", rect_gp = gpar(fill = fill), lines_gp = gpar(col = "red", lwd=2), pop = FALSE)
shading <- shading_hcl(observed = fed$observed, expected = fed$fitted, df = fed$df)
legend_resbased(x = unit(0.8, "npc"), y = 0.4, height = 0.4)(res, shading = shading, "")
davidjohannesmeyer commented 9 years ago

OK, with the current version of vcd, try:

rootogram(fed, shade = TRUE) rootogram(fed, type = "deviation", shade = TRUE) rootogram(fed, type = "deviation", shade = TRUE, rect_gp = shading_Friendly)

(I yet have to document all those new arguments).

On 2015-04-03 17:32, Michael Friendly wrote:

This also is very nice -- In fact, I think I prefer it with a bipolar color scale. The legend is not too bad, but no one but you would know how to do that! I would like to use this in the book, because it enhances the use of rootograms, but I wouldn't want to have to explain this code. Adding the shade= argument would probably simplify this.

Here is a slightly cleaned up version:

|library(vcd) data(Federalist) Fed_fit <- goodfit(Federalist, type = "poisson") res <- with(Fed_fit, (observed - fitted) / sqrt(fitted))

fill <- shading_hcl(observed = fed$observed, expected = fed$fitted, df = fed$df)(res)$fill rootogram(fed, type = "deviation", rect_gp = gpar(fill = fill), lines_gp = gpar(col = "red", lwd=2), pop = FALSE) shading <- shading_hcl(observed = fed$observed, expected = fed$fitted, df = fed$df) legend_resbased(x = unit(0.8, "npc"), y = 0.4, height = 0.4)(res, shading = shading, "") |> rootogram(fed, shade = T) rootogram(fed, shade = T, rect_gp = shading_Friendly)

— Reply to this email directly or view it on GitHub https://github.com/friendly/VCDR/issues/33#issuecomment-89324956.

FH-Prof. Priv.-Doz. Dr. David Meyer Institut für Wirtschaftsinformatik

Fachhochschule Technikum Wien Höchstädtplatz 5, 1200 Wien T: +43 1 333 40 77-394 F: +43 1 333 40 77-99 394 E: david.meyer@technikum-wien.at I: www.technikum-wien.at

friendly commented 9 years ago

Beautiful!

friendly commented 9 years ago

One small suggestion: perhaps it would look better for many of these graphs to make the default line wider: lines_gp = gpar(col = "red", lwd=2)

I'll now replace Figs 3.15-3.16 with shaded versions. I'll look at some of the others in Ch 3 to see if this makes sense, e.g., 3.22

philchalmers commented 9 years ago

I really like those now, the shaded rootograms definitely show the difference in fit between the poison and negative binomial distributions with the Federalist data. Thanks for adding in that feature!

@friendly Figure 3.17 also might be interesting with the shading, since the graded pink -> red shift (or if possible, split the legend into 7 different shadings rather than 5) could show where the model is trending to fail the most. Perhaps it's not necessary, bc the residual trend is so obvious here, but might be worth inspecting to give even more of a 'pop' effect to the reader.

friendly commented 9 years ago

Revised Ch03, with the new vcd::rootogram and Phil's suggestions. See the latest chapter03.pdf.

sciencegraph commented 9 years ago

I find it somehow disquieting when y axis data exceed tick marks. It may give the sensation that data is out of range.

davidjohannesmeyer commented 9 years ago

rootogram is now fully documented in vcd rev. 804. I added the Federalist example to the help page.

(BTW: I bumped the version to 1.4-0)