ameli / imate

High-Performance Python Package for Scalable Randomized Algorithms in Linear Algebra
https://ameli.github.io/imate
BSD 3-Clause "New" or "Revised" License
3 stars 3 forks source link

Understanding the Schatten norm #1

Open peekxc opened 1 year ago

peekxc commented 1 year ago

Hey! Out of curiosity, what is the limitation that prevents imate from working with products of rectangular matrices?

For any real-valued matrix $A \in \mathbb{R}^{n \times m}$ and positive semi-definite matrix $P \in \mathbb{R}^{n \times n}$ given by $P = A A^T$ (or $P = A^T A)$, I would like to estimate various Schatten norms of both $A$ and $P$, e.g.

from math import prod
import numpy as np 
from scipy.sparse import  random
from scipy.sparse.linalg import svds
from imate import schatten, trace

## Generate random semi-sparse matrix 
n = 500
A = random(n, 40, density=0.050)
P = A @ A.T

## Ground truth
sv = np.sort(np.abs(svds(A, k = min(A.shape)-1)[1]))
ev = np.sort(np.linalg.eigh(P.todense())[0])

print(f"Density of P: {P.nnz/prod(P.shape)}")
print(f"(A) Schatten-1 / nuclear norm: {np.sum(sv[sv > 1e-16])}")
print(f"(A) Schatten-2 / frobenius norm: {np.sum(sv[sv > 1e-16]**2)**(1/2)}")
print(f"(P) Schatten-1 / nuclear norm: {np.sum(ev[ev > 1e-16])}")
print(f"(P) Schatten-2 / frobenius norm: {np.sum(ev[ev > 1e-16])**(1/2)}")
# Density of P: 0.096224
# (A) Schatten-1 / nuclear norm: 110.72781488837865
# (A) Schatten-2 / frobenius norm: 18.074377305479523
# (P) Schatten-1 / nuclear norm: 330.2851529166273
# (P) Schatten-2 / frobenius norm: 56.81765769772352

I can get the Schatten-norms out of $P$ with imate:

schatten(P, p=1.0, method="exact")*P.shape[0]
schatten(P, p=2.0, method="exact")*P.shape[0]**(1/2)
# 330.28515291662717
# 56.81765769772352s

But I also want the norms for $A$. Initially I thought to use that fact that the singular values of $A$ are the square roots of the eigenvalues of $P$. This works well enough using numpy:

print(f"(A) Schatten-1 / nuclear norm, derived from P: {np.sum(np.sqrt(ev[ev > 1e-16]))}")
print(f"(A) Schatten-2 / frobenius norm, derived from P: {np.sum(ev[ev > 1e-16])**(1/2)}")
# (A) Schatten-1 / nuclear norm, derived from P: 112.62572173011456
# (A) Schatten-2 / frobenius norm, derived from P: 18.173749005547183

But when I try this for A via the gramian approach I get an error:

schatten(A, gram=True, p=1.0, method="slq")
# ValueError: Input matrix should be a square matrix.

EDIT: Actually the Schatten-p norm of a general matrix expressed as a trace quantity is given by 3rd equation in section 2.3 of Ubaru's paper:

$$ \lVert X \rVertp = \left(\sum\limits{i=1}^r \sigmai^p \right)^{1/p} = \left(\sum\limits{i=1}^r \lambda_i^{p/2} \right)^{1/p}$$

This doesn't seem capturable by imate's definition of the Schatten norm. Am I missing something? Or is this particular matrix function just yet to be implemented in imate?

ameli commented 12 months ago

Thank you for getting in touch and bringing this issue to my attention. I have resolved the ValueError: Input matrix should be a square matrix error associated with the square matrix requirement when the Gram matrix is enabled.

Regarding your inquiry concerning the interpretation of the Schatten norm, both the mentioned paper and the package are aligned in their definitions for non-negative definite matrices. However, there is one distinction: within the package, we incorporate a coefficient of $n^{-\frac{1}{p}}$. To adhere to the mentioned paper's definition precisely, you can simply divide the result by this coefficient. For a more comprehensive understanding, you may refer to equations 3 to 5 in this article or consult this section within the package documentation.

Regarding the SLQ method, it is designed with the prerequisite that the input matrix is symmetric. This requirement aligns well with many practical applications, particularly in machine learning, where calculations involving trace functions are often required. These matrices in ML applications, such as covariance matrices, are typically non-negative definite.

However, if your interest extends to working with symmetric matrices that may not strictly adhere to the non-negative definite condition, please feel free to inform me. Adapting the implementation to handle such symmetric but indefinite matrices is a straightforward process and involves introducing an absolute value operation to the eigenvalues, effectively replacing $A$ with $\vert A \vert = \sqrt{A^{\ast} A}$.

Please don't hesitate to share further details or pose additional questions if necessary.