The ikpls software package provides fast and efficient tools for PLS (Partial Least Squares) modeling. It is designed to help researchers and practitioners handle PLS modeling faster than previously possible, particularly on large datasets.
If you use the ikpls software package for your work, please cite this Journal of Open Source Software article. If you use the fast cross-validation algorithm implemented in ikpls.fast_cross_validation.numpy_ikpls, please also cite this arXiv preprint.
Dive into cutting-edge Python implementations of the IKPLS (Improved Kernel Partial Least Squares) Algorithms #1 and #2 [1] for CPUs, GPUs, and TPUs. IKPLS is both fast [2] and numerically stable [3], making it well suited for PLS modeling.
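To give a feel for what the package implements, here is a minimal, hedged NumPy sketch of IKPLS Algorithm #1 following Dayal and MacGregor [1]. It is illustrative only; the actual package implementation is more complete and optimized, and all variable names below are our own.

```python
import numpy as np

def ikpls_alg_1(X, Y, A):
    """Minimal sketch of IKPLS Algorithm #1 (Dayal & MacGregor).

    Returns the regression coefficient tensor B of shape (A, K, M),
    where B[a] uses the first a + 1 components.
    """
    N, K = X.shape
    M = Y.shape[1]
    W = np.zeros((K, A))  # X weights.
    P = np.zeros((K, A))  # X loadings.
    Q = np.zeros((M, A))  # Y loadings.
    R = np.zeros((K, A))  # X rotations (T = X @ R).
    B = np.zeros((A, K, M))
    XTY = X.T @ Y  # The only cross-product matrix that gets deflated.
    for a in range(A):
        if M == 1:
            w = XTY / np.linalg.norm(XTY)
        else:
            # Dominant eigenvector of the small (M, M) matrix XTY^T XTY.
            _, eigvecs = np.linalg.eigh(XTY.T @ XTY)
            w = XTY @ eigvecs[:, -1:]
            w /= np.linalg.norm(w)
        r = w.copy()
        for j in range(a):  # Orthogonalize against previous loadings.
            r -= (P[:, j:j + 1].T @ w) * R[:, j:j + 1]
        t = X @ r  # Algorithm #2 instead works with the kernel X^T X here.
        tt = (t.T @ t).item()
        p = (X.T @ t) / tt
        q = (XTY.T @ r) / tt
        XTY -= (p @ q.T) * tt  # Deflate the cross-product matrix.
        W[:, a], P[:, a] = w[:, 0], p[:, 0]
        Q[:, a], R[:, a] = q[:, 0], r[:, 0]
        B[a] = R[:, :a + 1] @ Q[:, :a + 1].T
    return B
```

As a sanity check, when A equals the column rank of X, the coefficients with all components coincide with the ordinary least-squares solution.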
The documentation is available at https://ikpls.readthedocs.io/en/latest/; examples can be found at https://github.com/Sm00thix/IKPLS/tree/main/examples.
In addition to the standalone IKPLS implementations, this package contains an implementation of IKPLS combined with the novel, fast cross-validation algorithm by Engstrøm [7]. The fast cross-validation algorithm benefits both IKPLS algorithms, and especially Algorithm #2. It is mathematically equivalent to classical cross-validation but much quicker.
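One ingredient of the speedup can be illustrated with plain NumPy (this is a hedged sketch of the idea, not the package's actual code): cross-product matrices such as X^T X and X^T Y are precomputed once on the full data, and each training fold's cross-products are then obtained by subtracting only the validation rows' contribution, rather than recomputing from scratch.

```python
import numpy as np

rng = np.random.default_rng(42)
N, K, M = 100, 50, 10
X = rng.uniform(size=(N, K))
Y = rng.uniform(size=(N, M))

# Precompute the global cross-products once.
XTX = X.T @ X  # (K, K), the kernel used by IKPLS Algorithm #2.
XTY = X.T @ Y  # (K, M)

# For one fold, subtract the validation rows' contribution:
# O(N_val * K^2) work per fold instead of O(N_train * K^2).
val_idx = np.arange(0, 20)  # This fold's validation rows.
X_val, Y_val = X[val_idx], Y[val_idx]
XTX_train = XTX - X_val.T @ X_val
XTY_train = XTY - X_val.T @ Y_val

# Identical (up to floating point) to recomputing from the training rows.
train_mask = np.ones(N, dtype=bool)
train_mask[val_idx] = False
print(np.allclose(XTX_train, X[train_mask].T @ X[train_mask]))  # True
print(np.allclose(XTY_train, X[train_mask].T @ Y[train_mask]))  # True
```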
The fast cross-validation algorithm correctly handles (column-wise) centering and scaling of the X and Y input matrices using training set means and standard deviations to avoid data leakage from the validation set. This centering and scaling can be enabled or disabled independently of each other and for X and Y by setting the parameters center_X, center_Y, scale_X, and scale_Y, respectively.
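To see why training-set statistics matter, here is a plain NumPy sketch of leakage-free column-wise centering and scaling for a single fold. This mirrors the behavior that center_X and scale_X enable, but it is an illustration, not the package's internals; the exact conventions (e.g., the degrees of freedom used for the standard deviation) are documented in the API reference.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.uniform(size=(100, 50))

# One fold: rows 0..19 validate, the rest train.
X_val, X_train = X[:20], X[20:]

# Column statistics come from the training rows only.
mean = X_train.mean(axis=0)
std = X_train.std(axis=0, ddof=1)

X_train_pp = (X_train - mean) / std
# The validation rows are transformed with the *training* statistics, so no
# information leaks from the validation set into the fitted model.
X_val_pp = (X_val - mean) / std
```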
In addition to correctly handling (column-wise) centering and scaling, the fast cross-validation algorithm correctly handles row-wise preprocessing that operates independently on each sample, such as (row-wise) centering and scaling of the X and Y input matrices, convolution, or other filtering. Because such preprocessing uses no information from other samples, it can safely be applied once, before passing the data to the fast cross-validation algorithm.
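For example, standard normal variate (SNV) scaling, a common row-wise preprocessing step in spectroscopy, uses only values within each sample's own row, so applying it once up front cannot leak information across cross-validation folds. A small illustrative sketch (the snv helper is our own, not part of the package):

```python
import numpy as np

def snv(X):
    """Standard normal variate: center and scale each row by its own
    mean and standard deviation. Uses no information from other rows."""
    mean = X.mean(axis=1, keepdims=True)
    std = X.std(axis=1, keepdims=True)
    return (X - mean) / std

rng = np.random.default_rng(1)
X = rng.uniform(size=(100, 50))

# Because SNV is row-wise, preprocessing all samples at once is identical
# to preprocessing any cross-validation split separately.
X_pp = snv(X)
print(np.allclose(X_pp[:20], snv(X[:20])))  # True
```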
The JAX implementations support running on CPU, GPU, and TPU.
To enable NVIDIA GPU execution, install JAX and CUDA with:
pip3 install -U "jax[cuda12]"
To enable Google Cloud TPU execution, install JAX with:
pip3 install -U "jax[tpu]" -f https://storage.googleapis.com/jax-releases/libtpu_releases.html
These installation commands cover the most common setups. For customized installations, follow the instructions from the JAX Installation Guide.
To ensure that the JAX implementations use float64, set the environment variable JAX_ENABLE_X64=True as described in JAX's Current Gotchas.
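When setting the variable from within Python, it must be set before JAX is first imported; alternatively, export it in the shell before launching Python. A minimal snippet:

```python
import os

# Must be set before the first `import jax`; otherwise JAX defaults
# to float32.
os.environ["JAX_ENABLE_X64"] = "True"
```

JAX also supports enabling double precision programmatically at program startup with jax.config.update("jax_enable_x64", True).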
Install the package for Python 3 using the following command:
pip3 install ikpls
Now you can import the NumPy and JAX implementations with:
from ikpls.numpy_ikpls import PLS as NpPLS
from ikpls.jax_ikpls_alg_1 import PLS as JAXPLS_Alg_1
from ikpls.jax_ikpls_alg_2 import PLS as JAXPLS_Alg_2
from ikpls.fast_cross_validation.numpy_ikpls import PLS as NpPLS_FastCV
import numpy as np

from ikpls.numpy_ikpls import PLS

N = 100  # Number of samples.
K = 50  # Number of features.
M = 10  # Number of targets.
A = 20  # Number of latent variables (PLS components).

# Using float64 is important for numerical stability.
X = np.random.uniform(size=(N, K)).astype(np.float64)
Y = np.random.uniform(size=(N, M)).astype(np.float64)

# The other PLS algorithms and implementations have the same interface for
# fit() and predict(). The fast cross-validation implementation with IKPLS
# has a different interface.
np_ikpls_alg_1 = PLS(algorithm=1)
np_ikpls_alg_1.fit(X, Y, A)

# Has shape (A, N, M) = (20, 100, 10). Contains a prediction for all possible
# numbers of components up to and including A.
y_pred = np_ikpls_alg_1.predict(X)

# Has shape (N, M) = (100, 10).
y_pred_20_components = np_ikpls_alg_1.predict(X, n_components=20)
(y_pred_20_components == y_pred[19]).all()  # True

# The internal model parameters can be accessed as follows:
np_ikpls_alg_1.B  # Regression coefficients tensor of shape (A, K, M) = (20, 50, 10).
np_ikpls_alg_1.W  # X weights matrix of shape (K, A) = (50, 20).
np_ikpls_alg_1.P  # X loadings matrix of shape (K, A) = (50, 20).
np_ikpls_alg_1.Q  # Y loadings matrix of shape (M, A) = (10, 20).
np_ikpls_alg_1.R  # X rotations matrix of shape (K, A) = (50, 20).
np_ikpls_alg_1.T  # X scores matrix of shape (N, A) = (100, 20).
                  # This is only computed for IKPLS Algorithm #1.
In the examples directory, you will find:
To contribute, please read the Contribution Guidelines.