dewittpe / cpr

Control Polygon Reduction: Methods for quick and efficient placement of internal knots for B-splines and tensor products of B-splines.
http://www.peteredewitt.com/cpr/
2 stars 0 forks source link

Check for rank deficient design matrices! #24

Closed dewittpe closed 7 years ago

dewittpe commented 7 years ago

In cp and cn check that the design matrices are full rank. Using lme4::lmer gives warnings, stats::lm just returns a coef vector with NAs.

dewittpe commented 7 years ago

Benchmark the rank method in the Matrix package against the rank function in armadillo.

dewittpe commented 7 years ago

My initial idea was to look for the rank of the design matrix before trying to fit a regression model. However, that could result in a notable increase in computation time and/or a need to make the check optional.

Instead, I've compared the number of columns of the model.matrix(fit) and the number of coefficients(fit) or any(is.na(coefficients(fit))) to determine if the model matrix was rank deficient.

A few testing examples:

library(lme4)
devtools::load_all()

# Test that warnings and/or errors will be given when a design matrix is rank
# deficient when using lme4::lmer for the regression method
init <- cp(log10(pdg) ~ bsplines(day, df = 594) + ethnicity + age + (1|id),
           data = spdg,
           method = lmer)

# same test for regression method stats::lm
init <- cp(log10(pdg) ~ bsplines(day, df = 594) + ethnicity + age, data = spdg)

# a low degree of freedom rank deficient design matrix:
init <- cp(log10(pdg) ~ bsplines(day, iknots = c(0, 0.0001, 0.0002, 0.0003, 0.0004, 0.0005)) + ethnicity + age, data = spdg)

# testing for warnings and errors for cn
k <- list(c(0, 0.0001, 0.0002, 0.0003, 0.0004, 0.0005),
          trimmed_quantile(spdg$age, prod = 1:10/11),
          trimmed_quantile(spdg$ttm, prod = 1:10/11))
init <-
  cn(log10(pdg) ~ btensor(list(day, age, ttm), iknots = k) + ethnicity, 
     data = spdg)

init <-
  cn(log10(pdg) ~ btensor(list(day, age, ttm), iknots = k) + ethnicity + (1|id), 
     data = spdg,
     method = lmer)

# The following cn should fit without error or warning
init <-
  cn(log10(pdg) ~ btensor(list(day, age, ttm), df = list(10, 10, 10)) + ethnicity, 
     data = spdg)
init <-
  cn(log10(pdg) ~ btensor(list(day, age, ttm), df = list(10, 10, 10)) + ethnicity + (1|id), 
     data = spdg, 
     method = lmer)
dewittpe commented 7 years ago

This needs more work.

Check for rank when calling the generate_cp_formula_data

added option to control when this is done.

dewittpe commented 7 years ago

The check for rank is done within the cp.formula and cn.formula calls. The effects of resolving #25 may require the check being in cp.formula and cn.formula anyways.