Open corybrunson opened 5 years ago
First of all, thank you for this fantastic package (as well as for ggalluvial!). A dropped perpendicular feature would be a great addition: combined with the calibrated axes the perpendiculars really help explain the meaning of a biplot. I've been creating them with calibrate package in base R and have taken a stab at using calibrate calculations in ggplot2 but of course it would be nice to have a smoother interface.
library(calibrate)
#> Loading required package: MASS
df <- data.frame(x = c(5, 8, 9), y = c(4, 13, 6))
pca <- prcomp(df)
scores <- pca$x
loadings <- pca$rotation
axisrange <- max(df[,"x"]) - min(df[,"x"])
ticklab <- round(seq(min(df[,"x"])-.5*axisrange,
max(df[,"x"])+.5*axisrange,
by = axisrange/2))
plot(scores, pch = 16, asp = 1, ylim = c(-5, 5))
text(x = scores[,1]-.5, y = scores[,2], labels = 1:nrow(scores))
c <- calibrate(g = loadings[,"PC1"],
y = df[,"x"] - mean(df[,"x"]),
tm = ticklab - mean(df[,"x"]),
Fr = scores,
tmlab = ticklab,
tl = .2,
lm = TRUE,
axislab="x",
where = 1,
labpos = 4,
dp = TRUE)
#> ---------- Calibration Results for x ---------------------
#> Length of 1 unit of the original variable = 1
#> Angle = 76.3 degrees
#> Optimal calibration factor = 1
#> Used calibration factor = 1
#> Goodness-of-fit = 1
#> Goodness-of-scale = 1
#> ------------------------------------------------------------
arrows(0, 0, loadings[,1], loadings[,2], length = .1, col = "red",
lwd = 1.5)
Created on 2021-08-25 by the reprex package (v2.0.1)
library(calibrate)
#> Loading required package: MASS
library(ggplot2)
df <- data.frame(x = c(5, 8, 9), y = c(4, 13, 6))
pca <- prcomp(df)
scores <- pca$x
loadings <- pca$rotation
axisrange <- max(df[,"x"]) - min(df[,"x"])
ticklab <- round(seq(min(df[,"x"])-.5*axisrange,
max(df[,"x"])+.5*axisrange,
by = axisrange/2))
c <- calibrate(g = loadings[,"PC1"],
y = df[,"x"] - mean(df[,"x"]),
tm = ticklab - mean(df[,"x"]),
Fr = scores,
tmlab = ticklab,
tl = .3,
graphics = FALSE)
#> ---------- Calibration Results for ----------------------
#> Length of 1 unit of the original variable = 1
#> Angle = 76.3 degrees
#> Optimal calibration factor = 1
#> Used calibration factor = 1
#> Goodness-of-fit = 1
#> Goodness-of-scale = 1
#> ------------------------------------------------------------
dfpoints <- data.frame(scores)
dfpoints$xsdrop <- c$Fpr[,1]
dfpoints$ysdrop <- c$Fpr[,2]
dfaxis <- data.frame(x = c$M[1, 1], y = c$M[1, 2],
xend = c$M[nrow(c$M),1], yend = c$M[nrow(c$M), 2])
dfticks <- data.frame(c$M, c$Mn, ticklab)
colnames(dfticks) <- c("x", "y", "xend", "yend", "label")
dfarrows <- data.frame(xend = loadings[,1], yend = loadings[,2])
ggplot(dfpoints, aes(x = PC1, y = PC2)) +
geom_point() +
geom_segment(data = dfticks, aes(x = x, y = y, xend = xend, yend = yend), col = "blue") +
geom_text(data = dfticks, aes(x = xend, y = yend, label = label), nudge_x = .3) +
geom_segment(data = dfaxis, aes(x = x, y = y, xend = xend, yend = yend), col = "blue") +
geom_segment(data = dfpoints, aes(x = PC1, y = PC2, xend = xsdrop, yend = ysdrop),
lty = "dashed", col = "cornflowerblue") +
geom_segment(data = dfarrows, aes(x = 0, y = 0, xend = xend, yend = yend),
arrow = arrow(length = unit(.03, "npc")), color = "red", lwd = 1.5) + coord_equal()
Created on 2021-08-25 by the reprex package (v2.0.1)
The closest I can get with ordr:
library(ggplot2)
library(ggbiplot)
#> Loading required package: plyr
#> Loading required package: scales
#> Loading required package: grid
library(ordr)
#>
#> Attaching package: 'ordr'
#> The following object is masked from 'package:ggbiplot':
#>
#> ggbiplot
df <- data.frame(x = c(5, 8, 9), y = c(4, 13, 6))
pca_ordr <- ordinate(df, cols = 1:2, model = ~ prcomp(., scale. = TRUE))
ggbiplot(pca_ordr) +
xlim(c(-2, 2)) +
ylim(c(-2, 2)) +
geom_rows_point() +
geom_rows_text(aes(label = 1:5), nudge_x = .2) +
geom_cols_vector(color = "red", lwd = 1.5) +
geom_cols_axis() +
geom_cols_axis_ticks(aes(center = .center, scale = .scale)) +
geom_cols_axis_text(aes(center = .center, scale = .scale)) +
geom_cols_axis_label(aes(label = .name))
#> Warning: Ignoring unknown aesthetics: center, scale
#> Warning: Ignoring unknown aesthetics: center, scale
#> Warning: Ignoring unknown aesthetics: label
Created on 2021-08-25 by the reprex package (v2.0.1)
I should also note that it would be helpful to be able to plot one axis at a time since the graph gets messy with multiple axes, though that might be challenge in the ggplot2 framework.
@jtr13 thank you for these details!
I'm going to reassess what features should go into a first release, and you've got me thinking that the projection geom is one that should be put back on that list. (Meanwhile, the predict = TRUE
option for ggbiplot()
is perhaps too ambitious, and i'm planning to remove it for the first release.) I'll come back to work on it ASAP.
Ditto the ability to specify what row or column elements should be plotted by each layer, as you mention at the end. An experimental solution is the subset
parameter, which is implemented in GeomIsolines$setup_data
and recycled by a few other layers, not including GeomAxis
. I'll add an issue to expand this to all plot layers.
Correction and clarification: Thanks to Gower &al's books, i should have remembered that projection onto an axis is a prediction step that calls for a prediction biplot, whereas vector addition along the axes is an interpolation step that makes use of the (currently implemented) interpolation biplot. (Interpolation axis unit vectors are located analogously to case markers, whereas prediction axes must be rescaled in the linear setting and completely redrawn in the non-linear setting.) This issue should really be about rendering both sets of steps, each being appropriate for one type of biplot. Vector addition should come first, for the first CRAN release.
In the conventional use of PCA with rows of cases and columns of variables, the vector sum geom could look like geom_cols_addition(rows = new_data)
. The layer would inherit the column (standard) coordinates from ggbiplot()
, and the rows
(or perhaps new_data
) parameter would accept rows of data with the same variables as the original table fed to, e.g., princomp()
, or a list of variable values, e.g. list(mpg = 28, cyl = 4L, ...)
. If some values are missing or NA
but the original PCA is centered, then the interpolated points would correspond to mean imputations.
It shouldn't be much extra work to include an option type = c("centroid", "sequence")
to control whether the vectors along the axes are added in sequence or their centroid is scaled by their number.
Experimental work is ongoing in the vecsum
branch.
Update: Based on further reading, and on the counterintuitiveness and complexity of the experimental implementation, i think both of these annotations should work as follows:
Stat*$compute_*()
) for predictive layers, interpolation vector sums, prediction projections, and other geometric elements derived from matrix algebra..O()
?) for the underlying ordination model, analogous to .G()
in tidygraph, by which to access these methods from within a ggproto build.StatRows*
and StatCols*
that use this pronoun to calculate layer data.The reason for a change in approach is that the 'tbl_ord' class is designed for display (print, summary, annotation, plot), and ggbiplot()
itself can only see the fortified tibble, whereas the underlying ordination classes, which are designed for modeling (interpolate, predict), are the proper setting for these tasks. For example, predictive MDS biplots use nonlinear axes while predictive LDA biplots use Voronoi cells, and the S3 methods can be tailored to these needs.
I think this deserves to wait until feedback is collected from a first CRAN release.
A compensatory benefit will be the new generics and methods themselves, which might be used in base R biplots or for other purposes. To my knowledge, there are no standalone implementations of them; they are only built inaccessibly into visualization tools like those of Gower &al.
A geom layer
geom_u_projection(from = i, to = j)
should render a (by default dashed) line from the ordinates of thei
th row of $U$ to the linear subspace containing those of thej
th row of $V$, maybe with an optional cute right angle symbol. Bothfrom
andto
could contain multiple indices. The projection should adopt the axes (primary or secondary) used by $U$.