Open corybrunson opened 2 years ago
I realize now that the arguments `xnames` and `znames` partially resolve this issue. I think it would be appropriate for them to default to `colnames(x)` and `colnames(z)`, respectively, and this would be part of the proposed PR. I apologize for overlooking that!
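For illustration, here is a minimal sketch of what such defaults could look like. The function `cca_names_sketch()` is hypothetical and is not part of PMA; it only shows the argument pattern and does not run any CCA. It assumes the inputs carry column names.

```r
# Hypothetical wrapper (not PMA code): shows name defaults only.
cca_names_sketch <- function(x, z, xnames = colnames(x), znames = colnames(z)) {
  # Default arguments in R are evaluated lazily, so force them before x and z
  # are modified; the names are then read from the inputs as supplied.
  force(xnames)
  force(znames)
  x <- scale(x)
  z <- scale(z)
  list(xnames = xnames, znames = znames)
}

x <- LifeCycleSavings[, c("pop15", "pop75")]
z <- LifeCycleSavings[, c("sr", "dpi", "ddpi")]

res <- cca_names_sketch(x, z)
res$xnames
res$znames

# Explicitly supplied names still take precedence over the defaults.
custom <- cca_names_sketch(x, z, xnames = c("young", "old"))
custom$xnames
```

Because the defaults use `colnames()`, the same call works whether `x` and `z` are data frames or matrices.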
I pushed a commit with the defaults for `xnames` and `znames`.
@bnaras very cool, thank you!
It looks like the names may not be preserved through the process. If `x` and `z` are matrices, then `names()` doesn't get their column names; and, when they are data frames, the `scale()` calls (inside `CCA()`) convert them to matrices before `names()` is called. These problems should be solved by replacing `names()` with `colnames()`, which works both on data frames and on matrices.
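The difference can be checked in base R without PMA. This is a small sketch using the built-in `LifeCycleSavings` data; the variable names `df`, `m`, and `scaled` are only for illustration.

```r
df <- LifeCycleSavings[, c("pop15", "pop75")]
m <- as.matrix(df)

# names() returns column names for a data frame but NULL for a matrix.
names(df)
names(m)

# colnames() returns the column names in both cases.
colnames(df)
colnames(m)

# scale() returns a matrix even when given a data frame, so calling names()
# on its result loses the column names, while colnames() keeps them.
scaled <- scale(df)
is.matrix(scaled)
names(scaled)
colnames(scaled)
```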
library(PMA)
sessioninfo::session_info(pkgs = "PMA")
#> ─ Session info ───────────────────────────────────────────────────────────────
#> setting value
#> version R version 3.6.0 (2019-04-26)
#> os macOS 10.15.7
#> system x86_64, darwin15.6.0
#> ui X11
#> language (EN)
#> collate en_US.UTF-8
#> ctype en_US.UTF-8
#> tz America/New_York
#> date 2022-02-06
#> pandoc 2.16.2 @ /usr/local/bin/ (via rmarkdown)
#>
#> ─ Packages ───────────────────────────────────────────────────────────────────
#> package * version date (UTC) lib source
#> PMA * 1.2-2 2022-02-06 [1] Github (bnaras/PMA@8e3fd29)
#>
#> [1] /Users/jason.brunson/Library/R/3.6/library
#> [2] /Library/Frameworks/R.framework/Versions/3.6/Resources/library
#>
#> ──────────────────────────────────────────────────────────────────────────────
# names of data frame inputs
names(LifeCycleSavings)
#> [1] "sr" "pop15" "pop75" "dpi" "ddpi"
# CCA of life cycle savings data
savings_cca <- CCA(
  as.matrix(LifeCycleSavings[, c(2L, 3L)]),
  as.matrix(LifeCycleSavings[, c(1L, 4L, 5L)]),
  K = 2L, penaltyx = .7, penaltyz = .7
)
#> 12
#> 12
# missing names
savings_cca$u
#> [,1] [,2]
#> [1,] -1 1
#> [2,] 0 0
savings_cca$xnames
#> NULL
Created on 2022-02-06 by the reprex package (v2.0.1)
Gah, that was my bad. Pushed a commit.
The matrices `u` and `v` and the vectors `d` and `cors` in the output of `PMA::CCA()`, for example, are unnamed. Maybe this is intentional for compatibility with certain routines. But it would be helpful for other purposes to have row and column names from the input data matrices `x` and `z` incorporated into the output, and for the dimensions to be canonically named. Preserving names from the input data in the output would, for example, make it easier to read the output and to create other named objects from it.

I would suggest the following for the `CCA()` output, for example:

- Assign the column names of `x` (respectively, `z`) to the row names of `u` (`v`).
- Assign `sCD1` through `sCD<K>` (for "sparse canonical dimension") to the column names of both `u` and `v`.
- Assign the `sCD*` names to `d` and `cors`.

If this is of interest, then I would be glad to submit a PR with suggested assignments for the outputs of `SPC()`, `CCA()`, and `MultiCCA()`. Thank you!

Created on 2022-02-04 by the reprex package (v2.0.1)
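As a rough sketch of the naming scheme suggested for the `CCA()` output, the helper below applies those names to a result after the fact. `name_cca_output()` is hypothetical and not part of PMA. To run without PMA installed, it is applied to a mock list whose `u` copies the values printed in this thread; the `v`, `d`, and `cors` values are made-up placeholders.

```r
# Hypothetical helper (not PMA code): names a CCA-like result list.
name_cca_output <- function(res, x, z) {
  K <- ncol(res$u)
  dims <- paste0("sCD", seq_len(K))  # sCD1, ..., sCD<K>
  # Rows of u and v correspond to the columns (variables) of x and z.
  dimnames(res$u) <- list(colnames(x), dims)
  dimnames(res$v) <- list(colnames(z), dims)
  names(res$d) <- dims
  names(res$cors) <- dims
  res
}

x <- as.matrix(LifeCycleSavings[, c("pop15", "pop75")])
z <- as.matrix(LifeCycleSavings[, c("sr", "dpi", "ddpi")])

# Mock result: u as printed in the thread; v, d, cors are placeholders.
mock <- list(
  u = matrix(c(-1, 0, 1, 0), nrow = 2),
  v = matrix(0, nrow = 3, ncol = 2),
  d = c(1, 0.5),
  cors = c(0.8, 0.3)
)

named <- name_cca_output(mock, x, z)
named$u
named$d
```

With names in place, loadings can be looked up by variable and dimension, for example `named$u["pop15", "sCD1"]`.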