struggling to get basis penalty matrix

alejandro-ariza commented 4 years ago

Hi there,

It seems compute_penalty_matrix is the equivalent of fda::eval.penalty in R?

Similarly to the R package, I'm passing a basis type object (bspline) in compute_penalty_matrix, but there are 2 new arguments unknown for me: regularization_parameter and regularization, so I'm struggling to use it. I would really appreciate if someone could provide more details in the docstring to use this function or provide an usage example, please.

Apologies in advance if there is something obvious I'm missing!

Thanks.

vnmabus commented 4 years ago

Well, compute_penalty_matrix its more like an auxiliary function, that can combine penalty matrices for several covariates in one penalty matrix, useful for regression. I am not even sure if it should be private.

To obtain the penalty matrix for one basis, first you need to know which kind of penalization you want. For now, we have only Tikhonov Regularization and L2 Regularization (which is an special case of the former). The penalty_matrix method of the regularization object is the method that you have to call.

In Tikhonov regularization you have to choose a linear operator which will be penalized. The fda library only have linear differential operators, so I assume that is what you want.

As an example, suppose that you want to compute the penalty matrix corresponding to the second derivative operator of a Fourier basis. You can then do:

from skfda.misc.regularization import TikhonovRegularization
from skfda.misc.operators import LinearDifferentialOperator
from skfda.representation.basis import Fourier

basis = Fourier(n_basis=3)
regularization = TikhonovRegularization(LinearDifferentialOperator(2))
regularization.penalty_matrix(basis)

Sorry if all of this is a bit more complicated than in fda, but it was done to generalize to arbitrary linear operators, and to allow penalization for other objects, like numpy arrays and discretized functions.

alejandro-ariza commented 4 years ago

Thanks @vnmabus!

I was definitely looking at the wrong method, I guess I'm struggling with some terminology differences between fda in R and scikit-fda, so I got lost. I appreciate your intuition about what I wanted to achieve by looking at fda::eval.penalty. Sorry, that's was my best chance to explain it, since I'm not familiar with the maths behind functional data analysis.

So, I have compared my routine in R and the one I'm translating into Python, and I can confirm that these 2 operations return exactly the same penalty matrix:

# R
penalty = fda::eval.penalty(basis)

# Python
regularization = TikhonovRegularization(LinearDifferentialOperator(0))
penalty = regularization.penalty_matrix(basis)

The default linear differential operator in fda::eval.penalty is cero, this is why is not explicitly added.

Thanks again, this allows me to continue translating my routine from R to Python, I was stuck.

vnmabus commented 4 years ago

One last point, LinearDifferentialOperator(0) is the same as the identity operator. And the Tikhonov regularization with the identity operator is the same as the L2 regularization. Although probably you still want to use the linear differential operator for now, because it is faster right now (I have yet to optimize the identity operator).

alejandro-ariza commented 4 years ago

Good to know, thanks!

avivajpeyi commented 1 year ago

Hey @vnmabus , Im trying to code up something similar, but instead of fda::eval.penalty i'll be using fda::bsplinepen.

library(fda)
knots = seq(from = 0, to = 1, length = k - degree + 1);
nknots   = length(knots);
basisobj = fda::create.bspline.basis(
    c(0, knots[nknots-1]), norder = degree + 1,
    nbasis = k - 1, breaks = knots[-nknots]);
P = fda::bsplinepen(basisobj, Lfdobj = diffMatrixOrder)

TBH im not that familiar with either the R fda package, or this Python fda package.

Ill try the same suggestion you provided above, but was wondering if you have any suggestions before I proceed trying to code up this snippet in python using your TikhonovRegularization.penalty_matrix!

Thanks :)

vnmabus commented 1 year ago

Apparently fda::eval.penalty is just a wrapper that calls a more specific function (such as fda::bsplinepen) depending on the basis type. Thus, the suggested Python snippet should still work for you.

Note that in recent versions of the package the names Fourier and TikhonovRegularization are deprecated (but still work) and the names FourierBasis (or, in your case, BSplineBasis) and L2Regularization should be used instead in new code.

avivajpeyi commented 1 year ago

Thanks for the pointers! I think I'm close, but maybe I have misunderstood how to set up the L2Regularization object...

The penalty matrix I get using the R vs Python FDA packages for a set of equally spaced knots: penalty_matrix

I think the crux of the difference may lie in the LinearDifferentialOperator(2) vs Lfdobj = 2:

operator = skfda.misc.operators.LinearDifferentialOperator(2)
regularization = skfda.misc.regularization.L2Regularization(operator)
p = regularization.penalty_matrix(basis)

VS

P = fda::bsplinepen(basisobj, Lfdobj = 2)

Code snippet to reproduce plot

NOTE: this does require `rpy2` to be installed ```python import rpy2.robjects as robjects from rpy2.robjects.packages import importr import numpy as np import matplotlib.pyplot as plt import skfda from matplotlib.colors import TwoSlopeNorm def get_r_penalty_matrix(k, degree, diffMatrixOrder): """ library(fda) knots = seq(from = 0, to = 1, length = k - degree + 1); nknots = length(knots); basisobj = fda::create.bspline.basis( c(0, knots[nknots-1]), norder = degree + 1, nbasis = k - 1, breaks = knots[-nknots]); P = fda::bsplinepen(basisobj, Lfdobj = diffMatrixOrder) """ importr('fda') robjects.r(f'k <- {k}') robjects.r(f'degree <- {degree}') robjects.r(f'diffMatrixOrder <- {diffMatrixOrder}') robjects.r('knots <- seq(from = 0, to = 1, length = k - degree + 1)') robjects.r('nknots <- length(knots)') robjects.r( 'basisobj <- fda::create.bspline.basis(c(0, knots[nknots-1]), norder = degree + 1, nbasis = k - 1, breaks = knots[-nknots])') robjects.r("plot(basisobj, xlab='t', ylab='Bspline basis functions B(t)', las=1, lwd=2)") robjects.r("savePlot('r_basisobj.png', type='png')") robjects.r('P <- fda::bsplinepen(basisobj, Lfdobj = diffMatrixOrder)') p = robjects.r['P'] return p def get_py_penalty_matrix(k, degree, diffMatrixOrder): knots = np.linspace(0, 1, k - degree + 1) basis = skfda.representation.basis.BSplineBasis(knots=knots, order=degree) basis.plot() plt.savefig('python_basisobj.png') operator = skfda.misc.operators.LinearDifferentialOperator(diffMatrixOrder) regularization = skfda.misc.regularization.L2Regularization(operator) p = regularization.penalty_matrix(basis) return p if __name__ == '__main__': k = 30 degree = 3 diffMatrixOrder = 2 r_p = get_r_penalty_matrix(k, degree, diffMatrixOrder) py_p = get_py_penalty_matrix(k, degree, diffMatrixOrder) fig, ax = plt.subplots(1, 2) for i, (label, matrix) in enumerate(zip(('r', 'python'), (r_p, py_p))): matrix = matrix / np.max(matrix) ax[i].set_title(label) norm = TwoSlopeNorm(vmin=matrix.min(), vcenter=0, vmax=matrix.max()) im = ax[i].pcolor(matrix, ec='tab:gray', lw=0.005, cmap='bwr', norm=norm, antialiased=False) fig.colorbar(im, ax=ax[i], orientation='horizontal') plt.tight_layout() plt.savefig('penalty_matrix.png') ```

Do you have any suggestions on what I can change in my py code to make it match the r penalty matrix? Thanks again :D

vnmabus commented 1 year ago

I think the problem is in order=degree. It should be order=degree+1, just as in the R case. Changing that both plots are similar in my computer.

avivajpeyi commented 1 year ago

Thanks! For future reference, I had to make one additional change to the py code to get the same number of basis elements (num knots from k - degree+1 --> k - degree). Otherwise, the size of the penalty matrix won't match up.

    knots = np.linspace(0, 1, k - degree )
    basis = skfda.representation.basis.BSplineBasis(knots=knots, order=degree+1)

penalty_matrix

Updated code

```python import rpy2.robjects as robjects from rpy2.robjects.packages import importr import numpy as np import matplotlib.pyplot as plt import skfda from matplotlib.colors import TwoSlopeNorm def get_r_penalty_matrix(k, degree, diffMatrixOrder): """ library(fda) knots = seq(from = 0, to = 1, length = k - degree + 1); nknots = length(knots); basisobj = fda::create.bspline.basis( c(0, knots[nknots-1]), norder = degree + 1, nbasis = k - 1, breaks = knots[-nknots]); P = fda::bsplinepen(basisobj, Lfdobj = diffMatrixOrder) """ importr('fda') robjects.r(f'k <- {k}') robjects.r(f'degree <- {degree}') robjects.r(f'diffMatrixOrder <- {diffMatrixOrder}') robjects.r('knots <- seq(from = 0, to = 1, length = k - degree + 1)') robjects.r('nknots <- length(knots)') robjects.r( 'basisobj <- fda::create.bspline.basis(c(0, knots[nknots-1]), norder = degree + 1, nbasis = k - 1, breaks = knots[-nknots])') robjects.r("plot(basisobj, knots=FALSE, las=1, lwd=2, main=sprintf('Num basis = %d' ,basisobj$nbasis))") robjects.r("savePlot('r_basisobj.png', type='png')") robjects.r('P <- fda::bsplinepen(basisobj, Lfdobj = 1)') p = robjects.r['P'] return p def get_py_penalty_matrix(k, degree, diffMatrixOrder): knots = np.linspace(0, 1, k - degree ) basis = skfda.representation.basis.BSplineBasis(knots=knots, order=degree+1) basis.plot() plt.gcf().suptitle(f'Num basis = {len(basis)}') plt.savefig('python_basisobj.png') operator = skfda.misc.operators.LinearDifferentialOperator(diffMatrixOrder) regularization = skfda.misc.regularization.L2Regularization(operator) p = regularization.penalty_matrix(basis) return p if __name__ == '__main__': k = 10 degree = 3 diffMatrixOrder = 1 r_p = get_r_penalty_matrix(k, degree, diffMatrixOrder) py_p = get_py_penalty_matrix(k, degree, diffMatrixOrder) fig, ax = plt.subplots(1, 2) for i, (label, matrix) in enumerate(zip(('r', 'python'), (r_p, py_p))): matrix = matrix / np.max(matrix) ax[i].set_title(label) vals = sorted([matrix.min(), 0, matrix.max()]) # if vals are unique, then vcenter is 0 if len(set(vals)) == 3: norm = TwoSlopeNorm(vmin=vals[0], vcenter=vals[1], vmax=vals[2]) else: norm = TwoSlopeNorm(vmin=vals[0]-1, vcenter=0, vmax=vals[2]) im = ax[i].pcolor(matrix, ec='tab:gray', lw=0.005, cmap='bwr', norm=norm, antialiased=False) fig.colorbar(im, ax=ax[i], orientation='horizontal') plt.tight_layout() plt.savefig('penalty_matrix.png') ```

GAA-UAM / scikit-fda

struggling to get basis penalty matrix #210