DTUComputeStatisticsAndDataAnalysis / MBPLS

(Multiblock) Partial Least Squares Regression for Python
https://mbpls.readthedocs.io
BSD 3-Clause "New" or "Revised" License

Variable importance in projection #11

Open cwieder opened 1 year ago

cwieder commented 1 year ago

Is your feature request related to a problem? Please describe.
Variable importance in projection (VIP) is a useful metric for understanding feature importance in PLS models. I use the mbpls package a lot in my research, and it would be great to have a multiblock VIP attribute.

Describe the solution you'd like
Implementation of VIP as an attribute of the mbpls class, so that VIP scores can be easily accessed after model fitting. A definition of VIP can be found in Mehmood et al. 2012 (https://doi.org/10.1016/j.chemolab.2012.07.010). That definition is for standard (single-block) PLS rather than multiblock PLS. However, VIP should be extensible to MB-PLS by using the superscores in place of the single-block scores.
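For reference, the single-block definition can be sketched as below. The notation is my own and may differ from the paper's: p is the number of variables, A the number of components, and w_a, t_a and q_a are the weight vector, score vector and Y loading of component a. For MB-PLS, the assumption is that w_a is the stacked weight vector over all blocks and t_a is the superscore.

```latex
\mathrm{VIP}_j = \sqrt{\frac{p \sum_{a=1}^{A} SS_a \left( w_{ja} / \lVert w_a \rVert \right)^2}{\sum_{a=1}^{A} SS_a}},
\qquad
SS_a = \lVert q_a \rVert^2 \, t_a^\top t_a
```

Because each normalised weight vector has unit length, the squared VIP scores sum to p, so a variable of average importance has a VIP of about 1.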

Describe alternatives you've considered
Below is the Python code for an MB-PLS VIP function that I implemented myself. It calculates VIP from attributes of the mbpls class (weights, scores, etc.). It would be great if this could be implemented in the main package.


```python
import numpy as np

def VIP_multiBlock(x_weights, x_superscores, x_loadings, y_loadings):
    """Compute VIP scores for a fitted MB-PLS model.

    x_weights and x_loadings are lists with one (variables x components)
    array per block; x_superscores is (samples x components) and
    y_loadings is (responses x components).
    """
    # stack the weights from all blocks
    weights = np.vstack(x_weights)
    # normalise each component's weight vector to unit length
    weights_norm = weights / np.sqrt(np.sum(weights**2, axis=0))
    # per component: sum of squares of superscores times sum of squares of y loadings
    sumsquares = np.sum(x_superscores**2, axis=0) * np.sum(y_loadings**2, axis=0)
    # p = total number of variables, from the stacked loadings of all blocks
    p = np.vstack(x_loadings).shape[0]

    # VIP is a weighted sum of squares of the normalised PLS weights
    vip_scores = np.sqrt(p * np.sum(sumsquares * (weights_norm**2), axis=1) / np.sum(sumsquares))
    return vip_scores
```
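A quick sanity check of the function, as a minimal sketch: the arrays below are random stand-ins with hypothetical shapes (two blocks of 4 and 6 variables, 3 components), not output from a fitted model. With a fitted mbpls model, I assume the inputs would come from attributes such as `W_`, `Ts_`, `P_` and `V_`, but those names should be checked against the mbpls documentation. The function is repeated in compact form so the snippet runs on its own.

```python
import numpy as np

def VIP_multiBlock(x_weights, x_superscores, x_loadings, y_loadings):
    # same calculation as the function above, repeated for self-containment
    weights = np.vstack(x_weights)
    weights_norm = weights / np.sqrt(np.sum(weights**2, axis=0))
    sumsquares = np.sum(x_superscores**2, axis=0) * np.sum(y_loadings**2, axis=0)
    p = np.vstack(x_loadings).shape[0]
    return np.sqrt(p * np.sum(sumsquares * (weights_norm**2), axis=1) / np.sum(sumsquares))

# Hypothetical dimensions, chosen only for illustration
n_samples, n_components, n_responses = 20, 3, 2
block_sizes = [4, 6]

rng = np.random.default_rng(0)
# Random stand-ins for per-block weights and loadings (variables x components)
x_weights = [rng.normal(size=(m, n_components)) for m in block_sizes]
x_loadings = [rng.normal(size=(m, n_components)) for m in block_sizes]
# Random stand-ins for superscores (samples x components) and Y loadings (responses x components)
x_superscores = rng.normal(size=(n_samples, n_components))
y_loadings = rng.normal(size=(n_responses, n_components))

vip = VIP_multiBlock(x_weights, x_superscores, x_loadings, y_loadings)

# One VIP score per variable, in block order
p = sum(block_sizes)
assert vip.shape == (p,)
# The squared VIP scores sum to the number of variables
assert np.isclose(np.sum(vip**2), p)

# Split the scores back into blocks for reporting
vip_per_block = np.split(vip, np.cumsum(block_sizes)[:-1])
```

The scores come back in the order the blocks were stacked, so splitting at the cumulative block sizes recovers one VIP array per block.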