corybrunson / ordr

manage ordinations and render biplots in a tidyverse workflow
https://corybrunson.github.io/ordr/
GNU General Public License v3.0

Reproducing defaults from original ggbiplot extension #53

Closed sraul1 closed 1 year ago

sraul1 commented 1 year ago

Hey All,

I have been trying to re-create an old figure that used the original ggbiplot extension to no avail.

The original figure used the default scaling parameters (scale = 1; obs.scale = 1 - scale; var.scale = scale), but I have not been able to translate those into how rows and columns are scaled in ordr. It seemed like it would have an easy solution; I'm just not as well-versed in how ordr structures things.

Just thought I would query you all here before making a reprex.

corybrunson commented 1 year ago

Hi @sraul1, thanks for raising the issue. You're right that exactly reproducing a {ggbiplot} plot can be difficult or impossible in {ordr}; i've illustrated using the example from the README below.

Part of the reason is that {ggbiplot} does some things under the hood that i either don't understand or think should be controlled by the user. Specifically, the circle radius and the variable coordinates are scaled independently of each other, which prevents their useful co-interpretation as projected axes whose lengths approximate fidelity in the plot and whose angles approximate correlations. As a result, most of the arrows almost reach the circumference, which gives the impression that the vast majority of the variance in the data is captured in these two dimensions. (This is belied by the percentages in the axis labels.) Moreover, the scaling of the variable coordinates is not combined with secondary axes, and as a result their inner product relationship with the observation coordinates is partially lost.

These choices may be well-justified and even common practice among statisticians in this field; i wouldn't necessarily know. The transformations i built into {ordr} are ones that i knew could apply meaningfully to any SVD-based method. I'd be very glad for a resource on how best to use the chi-squared scaling techniques of {ggbiplot} with other transformations, for example, but i haven't found one yet. In the meantime, i'd be glad to see what plot you're trying to reproduce and offer any suggestions.

library(ggbiplot)
#> Loading required package: ggplot2
#> Warning: package 'ggplot2' was built under R version 4.1.2
#> Loading required package: plyr
#> Warning: package 'plyr' was built under R version 4.1.2
#> Loading required package: scales
#> Warning: package 'scales' was built under R version 4.1.2
#> Loading required package: grid
data(wine)
wine.pca <- prcomp(wine, scale. = TRUE)
ggbiplot(wine.pca, obs.scale = 1, var.scale = 1,
         groups = wine.class, ellipse = TRUE, circle = TRUE) +
  scale_color_discrete(name = '') +
  theme(legend.direction = 'horizontal', legend.position = 'top')


library(ordr)
#> 
#> Attaching package: 'ordr'
#> The following object is masked from 'package:ggbiplot':
#> 
#>     ggbiplot
wine.pca %>%
  as_tbl_ord() %>%
  augment_ord() %>%
  mutate_rows(class = wine.class) %>%
  ggbiplot(aes(color = class, label = name),
           # note arbitrary choice of scale factor for column elements
           sec.axes = "cols", scale.factor = 5) +
  geom_unit_circle(color = muted("white"), alpha = 1/3, scale.factor = 5) +
  geom_cols_vector(color = "darkred") +
  geom_cols_text_radiate(size = 3, color = "darkred") +
  geom_rows_point() +
  # note that the confidence level is used differently
  stat_rows_ellipse(level = .68) +
  scale_color_discrete(name = "") +
  theme(legend.direction = "horizontal", legend.position = "top")

Created on 2023-03-14 with reprex v2.0.2
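
The point above about the inner product relationship can be checked numerically. The following is a minimal base-R sketch, not the code of either package: it assumes the built-in USArrests data instead of the wine data, and the helper names (biplot_coords, recovery_error) and the 0.5 rescaling factor are hypothetical. It raises the singular values to an exponent a for the rows and b for the columns, then measures how well the row-by-column inner products recover the standardized data. In {ordr}, the analogous control is, to my knowledge, confer_inertia(); please check the package documentation for its exact arguments.

```r
# Sketch: how the exponents on the singular values affect a PCA biplot.
# Uses only base R and the built-in USArrests data (an assumption; the
# reprex above uses the wine data from {ggbiplot}).
pca <- prcomp(USArrests, scale. = TRUE)
n <- nrow(pca$x)

# Recover the SVD factors of the centered and scaled data, X = U D V'.
d <- pca$sdev * sqrt(n - 1)   # singular values
u <- sweep(pca$x, 2, d, "/")  # left singular vectors
v <- pca$rotation             # right singular vectors
x <- scale(USArrests)         # the matrix that prcomp() decomposed

# Row and column coordinates with inertia exponents a and b (hypothetical helper).
biplot_coords <- function(a, b) {
  list(rows = sweep(u, 2, d^a, "*"), cols = sweep(v, 2, d^b, "*"))
}

# Largest absolute error when inner products are used to recover x.
recovery_error <- function(coords) {
  max(abs(coords$rows %*% t(coords$cols) - x))
}

err_default <- recovery_error(biplot_coords(a = 0, b = 1))  # exponents sum to 1
err_both    <- recovery_error(biplot_coords(a = 1, b = 1))  # exponents sum to 2

# Rescaling only the column coordinates by an arbitrary factor
# (0.5 is a made-up value) also breaks the inner-product relationship.
rescaled <- biplot_coords(a = 0, b = 1)
rescaled$cols <- rescaled$cols * 0.5
err_rescaled <- recovery_error(rescaled)

print(c(default = err_default, both = err_both, rescaled = err_rescaled))
```

Only the first error should be near zero. When the exponents sum to 1 and no further rescaling is applied, the inner products reproduce the data; setting both exponents to 1, or rescaling the arrows on their own, gives coordinates that can still be plotted but no longer have that interpretation.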

sraul1 commented 1 year ago

Sorry for closing this - not sure what exactly I clicked.

This validates much of what I was experiencing, especially with regard to the scaling. My main issue was that I had been unable to create the same scale using any package besides that original {ggbiplot} function and was unable to discern why that was. The overall plot looks much the same, albeit with some outliers that show up when I use {ordr} compared to the original.

The re-creation isn't an issue; I'm fine with the plots using ordr and just wanted to find some kind of explanation for the difference. My background is as a hydrologist/environmental scientist, not a statistician, so this is not necessarily my expertise.

I'll put together a reprex shortly; I need to clean some stuff up first.

corybrunson commented 1 year ago

OK! I'll reopen it now and watch out for the reprex.

corybrunson commented 1 year ago

Hi @sraul1, i'm closing this issue to help focus on open needs, but please do reopen it if and when you have some example code or have any follow-up issues!