duttashi / learnr

Exploratory, Inferential and Predictive data analysis. Feel free to show your :heart: by giving a star :star:
MIT License

Understanding Histograms and Density Plots #11

Closed: duttashi closed this issue 5 years ago.

duttashi commented 6 years ago

A collection of self-curated notes for understanding data visualization techniques.

duttashi commented 6 years ago

Features to Look For

Some things to keep an eye out for when looking at data on a numeric variable:

Histograms

Histogram Basics

Histograms in R

There are many ways to plot histograms in R:
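A minimal sketch of two of those ways, using only base graphics and lattice (both ship with R). The example variable is an assumption for illustration: the duration column of the geyser data from the MASS package, the same data used in the density section.

```r
library(MASS)     # assumed example data: geyser (columns waiting, duration)
library(lattice)  # histogram() for the lattice version

# Base graphics: hist() computes the bins and draws the plot in one call.
# breaks = 20 is only a suggestion; R picks "pretty" break points near it.
h <- hist(geyser$duration, breaks = 20,
          main = "Eruption durations", xlab = "duration (minutes)")

# Lattice: the variable goes in a one-sided formula; nint suggests
# the number of bins.
p <- histogram(~ duration, data = geyser, nint = 20)
print(p)  # lattice plots are drawn only when printed
```

In ggplot2, geom_histogram() plays the same role; it is left out here only to keep the sketch to packages bundled with R.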

Superimposing a Density

A histogram can be used to compare the data distribution with a theoretical model, such as a normal distribution. This requires a density scale on the vertical axis, so that the bar areas sum to one and are directly comparable to the model's density curve.
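A base-graphics sketch of that comparison, again assuming geyser$duration from MASS as the example variable. freq = FALSE puts the histogram on the density scale, and curve() overlays a normal density whose mean and standard deviation are estimated from the data.

```r
library(MASS)  # assumed example data: geyser

x <- geyser$duration
mu <- mean(x)
s <- sd(x)

# freq = FALSE draws bar heights as densities (count / (n * bin width)),
# so the bar areas sum to 1 and match the scale of a probability density.
h <- hist(x, freq = FALSE, breaks = 20,
          main = "Durations vs. a fitted normal", xlab = "duration (minutes)")

# Overlay a normal density with the sample mean and SD.
# Inside curve(), x refers to the plotting grid, not the data,
# which is why mu and s are computed beforehand.
curve(dnorm(x, mean = mu, sd = s), add = TRUE, lwd = 2)

# A kernel density estimate, dashed, for comparison with the model.
lines(density(geyser$duration), lty = 2)
```

Eruption durations are bimodal, so a single normal curve is expected to fit poorly here; exposing that kind of mismatch is what the overlay is for.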

Density Plots

Density Plot Basics

Using base graphics, a density plot of the duration variable in the geyser data (from the MASS package) with the default bandwidth:

library(MASS)
plot(density(geyser$duration))
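The bandwidth controls how smooth the estimate is. A sketch comparing the default (bw = "nrd0", Silverman's rule of thumb) with half and double that value through the adjust argument of density(); the specific multipliers 0.5 and 2 are arbitrary choices for illustration.

```r
library(MASS)  # geyser data

x <- geyser$duration

d1     <- density(x)                # default: bw = "nrd0" (Silverman's rule of thumb)
d_half <- density(x, adjust = 0.5)  # half the default bandwidth: rougher
d_dbl  <- density(x, adjust = 2)    # double the default bandwidth: smoother

# Shared limits so all three curves fit in one frame.
xlim <- range(d1$x, d_half$x, d_dbl$x)
ylim <- range(0, d1$y, d_half$y, d_dbl$y)
plot(d1, xlim = xlim, ylim = ylim,
     main = "Effect of bandwidth", xlab = "duration (minutes)")
lines(d_half, lty = 3)
lines(d_dbl, lty = 2)
legend("topright", legend = c("default", "adjust = 0.5", "adjust = 2"),
       lty = c(1, 3, 2))
```

adjust multiplies whatever bandwidth the bw rule selects, so it is a convenient way to explore smoothness without choosing an absolute value.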

Grouping and Faceting

Both ggplot and lattice make it easy to show multiple densities for different subgroups in a single plot. lattice uses the groups argument:

library(lattice)
densityplot(~ yield, groups = site, data = barley, auto.key = TRUE)
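For comparison, a base-graphics sketch of a grouped display with one density curve per site, which makes explicit what lattice's groups argument automates: split the data by group, estimate one density per group, and overlay them on shared axes. lattice is loaded only for the barley data.

```r
library(lattice)  # only needed here for the barley data

# One density estimate per site: split() returns a list named by site.
dens <- lapply(split(barley$yield, barley$site), density)

# Axis limits that fit every curve.
xlim <- range(unlist(lapply(dens, function(d) d$x)))
ylim <- range(0, unlist(lapply(dens, function(d) d$y)))

# Empty plot frame, then one line per group.
plot(NULL, xlim = xlim, ylim = ylim, xlab = "yield", ylab = "Density",
     main = "Barley yield by site")
cols <- seq_along(dens)  # default palette colors 1..6
for (i in seq_along(dens)) lines(dens[[i]], col = cols[i])
legend("topright", legend = names(dens), col = cols, lty = 1)
```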

In ggplot you can map the grouping variable to an aesthetic, such as color:

library(ggplot2)
ggplot(barley) + geom_density(aes(x = yield, color = site))

Using fill together with alpha (transparency) can also be useful, since overlapping areas stay visible:

ggplot(barley) + geom_density(aes(x = yield, fill = site), alpha = 0.2)

Often a more effective approach is to use small multiples: collections of charts designed to facilitate comparisons. For this we can use the lattice package, which calls them lattice plots or trellis plots. The conditioning variable is specified with the | operator in a formula:

densityplot(~ yield | site, data = barley)

Comparison is facilitated by the common axes shared across panels. These ideas can be combined, with one panel per site and one curve per year:

densityplot(~ yield | site, groups = year, data = barley, auto.key = TRUE)
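A sketch of the combined lattice display, storing the trellis object so its structure can be inspected. The layout = c(3, 2) argument (columns, rows) and auto.key = list(columns = 2) are additions for illustration, not part of the original example; they arrange the six site panels in a 3 by 2 grid and put the year key on one row.

```r
library(lattice)

# One panel per site (conditioning with |), one curve per year (groups).
# Panels share x and y axes by default, which is what eases comparison.
p <- densityplot(~ yield | site, groups = year, data = barley,
                 auto.key = list(columns = 2), layout = c(3, 2))
print(p)  # trellis objects are drawn only when printed

# The stored object records the levels of the conditioning variable,
# one per panel.
p$condlevels[[1]]
```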

ggplot, by contrast, uses the notion of faceting:

ggplot(barley) + geom_density(aes(x = yield)) + facet_wrap(~ site)

Again, this can be combined with the color aesthetic:

ggplot(barley) + geom_density(aes(x = yield, color = year)) + facet_wrap(~ site)
