DataSlingers / MoMA

MoMA: Modern Multivariate Analysis in R
https://DataSlingers.github.io/MoMA
GNU General Public License v2.0

Clean up existing entry points #34

Open Banana1530 opened 5 years ago

Banana1530 commented 5 years ago

Currently we have three R wrappers. They differ in functionality, in the abstraction level of the arguments they take, and in where they are used in the test suite. Eventually their functionality will be a subset of what the SFPCA wrappers provide, so they should be removed.

1. sfpca

https://github.com/michaelweylandt/MoMA/blob/7c8fd20fbd18d9cbfe21837bacd8ad401853efa6/R/sfpca.R#L1

It is simply an R interface to the C++ function cpp_sfpca (https://github.com/michaelweylandt/MoMA/blob/7c8fd20fbd18d9cbfe21837bacd8ad401853efa6/src/moma_R_function.cpp#L6), which repeatedly calls MoMA::solve and MoMA::deflate. All parameters must be specified explicitly.

What it does: It solves the penalized SVD for fixed alpha_u/v and lambda_u/v. It also finds a rank-k SVD by repeatedly deflating the matrix and rerunning the algorithm. Note that we don't have tests for the latter functionality yet.
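The solve-then-deflate loop described above can be sketched in base R. This is only an illustration, not the MoMA implementation: `rank_one` and `deflate_svd` are hypothetical names, the rank-one step is a plain unpenalized power iteration standing in for MoMA::solve, and the deflation is assumed to subtract d * u v^T from the matrix, which may differ from what MoMA::deflate actually does.

```r
# Hypothetical stand-in for MoMA::solve: unpenalized power iteration
# for the leading singular triplet (u, v, d) of X.
rank_one <- function(X, max_iter = 1000, eps = 1e-12) {
  v <- rep(1, ncol(X)) / sqrt(ncol(X))
  for (i in seq_len(max_iter)) {
    u <- drop(X %*% v)
    u <- u / sqrt(sum(u^2))
    v_new <- drop(crossprod(X, u))
    v_new <- v_new / sqrt(sum(v_new^2))
    converged <- sqrt(sum((v_new - v)^2)) < eps
    v <- v_new
    if (converged) break
  }
  list(u = u, v = v, d = sum(u * drop(X %*% v)))
}

# Assumed deflation scheme: subtract the fitted rank-one term, then re-solve.
deflate_svd <- function(X, k) {
  fits <- vector("list", k)
  for (j in seq_len(k)) {
    fits[[j]] <- rank_one(X)
    X <- X - fits[[j]]$d * tcrossprod(fits[[j]]$u, fits[[j]]$v)
  }
  fits
}

# Build a 20 x 6 matrix with known singular values 5, 3, 1.
set.seed(1)
U <- qr.Q(qr(matrix(rnorm(20 * 3), 20, 3)))
V <- qr.Q(qr(matrix(rnorm(6 * 3), 6, 3)))
X <- U %*% diag(c(5, 3, 1)) %*% t(V)

fits <- deflate_svd(X, k = 3)
d_hat <- vapply(fits, function(f) f$d, numeric(1))
```

A check of this kind (recovered `d_hat` against the known singular values) is one possible shape for the missing rank-k test.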

Where it is used in the test suite: It is used to test the correctness of the proximal gradient (PG) algorithm. To do this, we inspect special cases where closed-form solutions exist and check the results of our algorithm against them. See https://github.com/michaelweylandt/MoMA/blob/7c8fd20fbd18d9cbfe21837bacd8ad401853efa6/tests/testthat/test_sfpca.R#L1.
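As a rough illustration of this kind of check, the sketch below uses the simplest special case: with all penalties switched off, the solution should match the leading singular triplet from base R's `svd()`. The solver here (`power_rank_one`) is a hypothetical stand-in, not the package's sfpca, and the actual cases covered in test_sfpca.R may differ.

```r
# Hypothetical stand-in for the solver under test (no penalties applied).
power_rank_one <- function(X, n_iter = 500) {
  v <- rep(1, ncol(X)) / sqrt(ncol(X))
  for (i in seq_len(n_iter)) {
    u <- drop(X %*% v)
    u <- u / sqrt(sum(u^2))
    v <- drop(crossprod(X, u))
    v <- v / sqrt(sum(v^2))
  }
  list(u = u, v = v, d = sum(u * drop(X %*% v)))
}

# Singular vectors are only defined up to sign, so align before comparing.
align_sign <- function(x, ref) if (sum(x * ref) < 0) -x else x

# Test matrix with known singular values 4, 2, 1.
set.seed(1)
U <- qr.Q(qr(matrix(rnorm(30 * 3), 30, 3)))
V <- qr.Q(qr(matrix(rnorm(4 * 3), 4, 3)))
X <- U %*% diag(c(4, 2, 1)) %*% t(V)

fit <- power_rank_one(X)
ref <- svd(X, nu = 1, nv = 1)  # closed-form reference

err_d <- abs(fit$d - ref$d[1])
err_u <- max(abs(align_sign(fit$u, ref$u[, 1]) - ref$u[, 1]))
err_v <- max(abs(align_sign(fit$v, ref$v[, 1]) - ref$v[, 1]))
stopifnot(err_d < 1e-8, err_u < 1e-8, err_v < 1e-8)
```

The sign alignment matters because two correct solvers can return (u, v) and (-u, -v) for the same matrix.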

2. moma_svd

https://github.com/michaelweylandt/MoMA/blob/7c8fd20fbd18d9cbfe21837bacd8ad401853efa6/R/moma_svd.R#L61

What it does: It supports the following three use cases. Note that it works together with prox argument wrappers such as lasso() and scad(), and with the PG loop settings wrapper (not merged yet). Essentially, what it does is a proper subset of MoMA::select_nestedBIC described in section 3.

1) Find rank-k penalized SVD with fixed alpha_u/v and lambda_u/v by calling cpp_sfpca described above;

2) Run nested-BIC search on 2-D grids, whose axes can be any combination of two parameters, by calling cpp_sfpca_nestedBIC. cpp_sfpca_nestedBIC performs some sanity checks and then calls MoMA::select_nestedBIC; https://github.com/michaelweylandt/MoMA/blob/7c8fd20fbd18d9cbfe21837bacd8ad401853efa6/src/moma_R_function.cpp#L179

3) Run grid search on 2-D grids by calling cpp_sfpca_grid, which uses MoMA::reset and MoMA::solve; https://github.com/michaelweylandt/MoMA/blob/7c8fd20fbd18d9cbfe21837bacd8ad401853efa6/src/moma_R_function.cpp#L80
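The plain grid search in use case 3 can be pictured with the base R sketch below. Everything in it is hypothetical: `sparse_rank_one` is a simple alternating soft-thresholding heuristic standing in for the reset-and-solve step, and `grid_search` is not cpp_sfpca_grid. It only shows the shape of the loop: one independent solve per (lambda_u, lambda_v) pair.

```r
# Soft-thresholding (the lasso prox) and a zero-safe normalization.
soft <- function(x, lambda) sign(x) * pmax(abs(x) - lambda, 0)
unit <- function(x) {
  n <- sqrt(sum(x^2))
  if (n > 0) x / n else x
}

# Hypothetical rank-one solver with lasso penalties on u and v.
sparse_rank_one <- function(X, lambda_u, lambda_v, n_iter = 200) {
  v <- svd(X, nu = 0, nv = 1)$v[, 1]  # start from the unpenalized solution
  for (i in seq_len(n_iter)) {
    u <- unit(soft(drop(X %*% v), lambda_u))
    v <- unit(soft(drop(crossprod(X, u)), lambda_v))
  }
  list(u = u, v = v, d = sum(u * drop(X %*% v)))
}

# Solve once per grid point and collect summaries in a data frame.
grid_search <- function(X, lambda_u, lambda_v) {
  grid <- expand.grid(lambda_u = lambda_u, lambda_v = lambda_v)
  fits <- Map(function(lu, lv) sparse_rank_one(X, lu, lv),
              grid$lambda_u, grid$lambda_v)
  grid$d <- vapply(fits, function(f) f$d, numeric(1))
  grid$nnz_v <- vapply(fits, function(f) sum(f$v != 0), integer(1))
  grid
}

set.seed(1)
X <- matrix(rnorm(15 * 5), 15, 5)
res <- grid_search(X, lambda_u = c(0, 0.5, 1), lambda_v = c(0, 1))
```

Here `res` has one row per grid point; the row with both penalties at zero reproduces the leading singular value of `X`, which mirrors the test that cpp_sfpca_grid and cpp_sfpca agree.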

Where it is used in the test suite: It is used to test that prox arguments are correctly passed to the C++ side (see test_arguments.R https://github.com/michaelweylandt/MoMA/blob/7c8fd20fbd18d9cbfe21837bacd8ad401853efa6/tests/testthat/test_arguments.R#L1 ). We also test that cpp_sfpca_grid and cpp_sfpca give identical results (see test_grid.R https://github.com/michaelweylandt/MoMA/blob/7c8fd20fbd18d9cbfe21837bacd8ad401853efa6/tests/testthat/test_grid.R#L1).

3. MoMA::grid_BIC_mix

This will become the core of SFPCA wrappers (in progress). It supports finding the first k pairs of singular vectors, and the combination of nested-BIC search and grid search.

Where it is used in the test suite: We test that it returns correctly sized lists. See https://github.com/michaelweylandt/MoMA/blob/7c8fd20fbd18d9cbfe21837bacd8ad401853efa6/tests/testthat/test_BIC_gird_mixed.R#L1

michaelweylandt commented 5 years ago

This is a great issue - thanks for opening!

Do you think it'd be possible to consolidate down to a single C++ entry point or is that too tricky to design?

As you mentioned in a recent email, it's tricky to have grid parameters and multi-rank solutions. (It's ok if it's a "grid" of a single parameter, but it's hard after that...) That might just be a thing we disallow until we can (someday) find a better fix.

Banana1530 commented 5 years ago

An update on current APIs' location.

Docs: https://docs.google.com/document/d/1TM-VW6nR_8CjlLjJ3SrFF6nJVXjeMFD2jDfKOktgPJM/edit?usp=sharing

Banana1530 commented 5 years ago
| Argument specification | Function | Use case |
| --- | --- | --- |
| Specify all arguments explicitly | sfpca (sfpca.R) | sfpca(X, alpha_u = 1, alpha_v = 2, Omega_u = sec_diff_mat(n), Omega_v = diag(p), EPS = 1e-9, MAX_ITER = 1e+5) |
| Parameter values and penalty types are separated; algorithm precision settings are wrapped in moma_pg_settings() | moma_svd, SFPCA$initialize ... (moma_expose.R) | SFPCA$initialize(X, u_sparsity = lasso(), lambda_u = c(1,10), selection_str = "ggbg" ) |
| Parameter values and selection method are absorbed in penalty types | moma_sfpca, moma_twpca ... (moma_sf*_wrapper.R) | moma_spca(X, u_sparse = moma_lasso(lambda = seq(0, 2, 0.2), select_scheme = "b") ) |

The above table summarizes ways to specify penalty arguments.
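To make the third style concrete, here is a toy constructor in base R showing how a penalty object can carry its own parameter values and selection scheme. `toy_lasso` and the class name `toy_penalty` are hypothetical and are not part of MoMA. Reading "g" as grid search and "b" as nested BIC is an assumption based on the selection strings in the table.

```r
# Hypothetical penalty constructor: values and selection scheme travel
# together with the penalty type, so the fitting function needs only
# one argument per penalty.
toy_lasso <- function(lambda = 0, select_scheme = c("g", "b")) {
  select_scheme <- match.arg(select_scheme)  # assumed: "g" = grid, "b" = BIC
  stopifnot(is.numeric(lambda), all(lambda >= 0))
  structure(
    list(penalty = "lasso", lambda = lambda, select_scheme = select_scheme),
    class = "toy_penalty"
  )
}

pen <- toy_lasso(lambda = seq(0, 2, 0.2), select_scheme = "b")
str(unclass(pen))
```

One advantage of this style is that validation (non-negative lambda, a known selection scheme) happens once, at construction, instead of inside every wrapper.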