chufanchen / read-paper-and-code


KAN: Kolmogorov-Arnold Networks #147

Closed chufanchen closed 5 months ago

chufanchen commented 5 months ago

https://arxiv.org/abs/2404.19756

https://github.com/KindXiaoming/pykan

chufanchen commented 5 months ago

Bézier curve

Bézier curves are smooth curves defined by a set of control points $\mathbf{P}_i$.

The Bézier curve of degree $n$ is defined as

$$
\mathbf{C}(t)=\sum_{i=0}^{n}b_{i,n}(t)\mathbf{P}_i, \quad 0 \leq t \leq 1
$$

where the polynomials

$$
b_{i,n}(t)=\binom{n}{i} t^i(1-t)^{n-i}, \quad i=0, \ldots, n
$$

are known as Bernstein basis polynomials of degree $n$.
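The definition above can be evaluated directly. Below is a minimal sketch in plain Python; the function names `bernstein` and `bezier_point` and the three control points are illustrative assumptions, not taken from the paper or from pykan.

```python
from math import comb


def bernstein(i, n, t):
    """Bernstein basis polynomial b_{i,n}(t) = C(n, i) * t^i * (1 - t)^(n - i)."""
    return comb(n, i) * t**i * (1 - t) ** (n - i)


def bezier_point(control_points, t):
    """Evaluate C(t) = sum_i b_{i,n}(t) * P_i for 0 <= t <= 1."""
    n = len(control_points) - 1  # degree of the curve
    dim = len(control_points[0])
    return tuple(
        sum(bernstein(i, n, t) * p[d] for i, p in enumerate(control_points))
        for d in range(dim)
    )


# Quadratic curve (n = 2) with three made-up control points.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 0.0)]
start = bezier_point(points, 0.0)   # coincides with the first control point
middle = bezier_point(points, 0.5)
end = bezier_point(points, 1.0)     # coincides with the last control point
```

The curve passes through the first and last control points, while the interior control point only pulls the curve toward it.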

B-splines

B-splines (short for basis splines) use several Bézier curves joined end to end.

A degree-$k$ B-spline curve defined by $n+1$ control points consists of $n-k+1$ Bézier curves.

$$
S(t)=\sum_{i=0}^{n} N_{i,k}(t) P_i
$$

where $\left(P_0, P_1, \ldots, P_n\right)$ are the control points and $N_{i,k}(t)$ are the basis functions, defined using the Cox-de Boor recursion formula

$$
N_{i,0}(t)=\begin{cases}
1 & \text{if } t_i \leq t < t_{i+1} \\
0 & \text{otherwise}
\end{cases}
$$

$$
N_{i,j}(t)=\frac{t-t_i}{t_{i+j}-t_i} N_{i,j-1}(t)+\frac{t_{i+j+1}-t}{t_{i+j+1}-t_{i+1}} N_{i+1,j-1}(t).
$$

B-splines of order $k+1$ are connected piecewise polynomial functions of degree $k$ defined over a grid of knots $\{t_0,\dots,t_i,\dots,t_m\}$. With $n+1$ control points, the recursion needs $m=n+k+1$.
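A minimal sketch of the Cox-de Boor recursion in plain Python follows. The names `bspline_basis` and `bspline_curve`, the uniform knot vector, and the scalar coefficients are illustrative assumptions. Treating a term with a zero denominator as zero is the usual convention for repeated knots, but it is not stated in the notes above.

```python
def bspline_basis(i, j, t, knots):
    """N_{i,j}(t) via the Cox-de Boor recursion; zero-denominator terms count as 0."""
    if j == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    left_den = knots[i + j] - knots[i]
    right_den = knots[i + j + 1] - knots[i + 1]
    left = 0.0
    if left_den != 0:
        left = (t - knots[i]) / left_den * bspline_basis(i, j - 1, t, knots)
    right = 0.0
    if right_den != 0:
        right = (knots[i + j + 1] - t) / right_den * bspline_basis(i + 1, j - 1, t, knots)
    return left + right


def bspline_curve(coeffs, k, t, knots):
    """S(t) = sum_i N_{i,k}(t) * P_i, here with scalar coefficients P_i."""
    return sum(c * bspline_basis(i, k, t, knots) for i, c in enumerate(coeffs))


k = 3                                   # cubic spline
knots = [float(v) for v in range(8)]    # uniform knots t_0, ..., t_7
num_basis = len(knots) - k - 1          # n + 1 = 4 basis functions
t = 3.5                                 # inside [t_k, t_{n+1}] = [3, 4]
values = [bspline_basis(i, k, t, knots) for i in range(num_basis)]
constant = bspline_curve([2.0] * num_basis, k, t, knots)
```

Inside $[t_k, t_{n+1})$ the basis functions sum to one, so equal coefficients reproduce a constant. The plain recursion recomputes lower-degree terms many times; it is kept here because it mirrors the formula, not because it is efficient.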

Implementation

If the data range (e.g. $[-10,10]$) does not match the grid range ($[-1,1]$), use update_grid_from_samples to adapt the grid to the samples.

The grid can be uniform or non-uniform.

https://github.com/KindXiaoming/pykan/blob/master/kan/spline.py
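To illustrate what adapting a grid to samples can mean, here is a standalone sketch in plain Python. It is not pykan's code: the function `grid_from_samples`, its `grid_eps` blending parameter, and the rank-based choice of knots are assumptions made for illustration. With `grid_eps=1.0` the grid is uniform over the sample range; with `grid_eps=0.0` the knots follow the empirical distribution of the samples, giving a non-uniform grid.

```python
def grid_from_samples(samples, num_intervals, grid_eps=1.0):
    """Build num_intervals + 1 grid points covering the range of the samples.

    Hypothetical helper: blends a uniform grid with a sample-adaptive grid.
    """
    xs = sorted(samples)
    lo, hi = xs[0], xs[-1]
    # Uniform grid over the observed range.
    uniform = [lo + (hi - lo) * i / num_intervals for i in range(num_intervals + 1)]
    # Adaptive grid: pick samples at evenly spaced ranks of the sorted data.
    adaptive = [
        xs[round(i * (len(xs) - 1) / num_intervals)]
        for i in range(num_intervals + 1)
    ]
    return [grid_eps * u + (1.0 - grid_eps) * a for u, a in zip(uniform, adaptive)]


# Samples spread over [-10, 10] instead of the default [-1, 1].
samples = [-10.0 + 20.0 * i / 100 for i in range(101)]
uniform_grid = grid_from_samples(samples, num_intervals=5, grid_eps=1.0)

# Skewed samples give a non-uniform grid when grid_eps = 0.
skewed = [-10.0, -9.0, -8.0, 0.0, 10.0]
adaptive_grid = grid_from_samples(skewed, num_intervals=2, grid_eps=0.0)
```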

chufanchen commented 5 months ago

Kolmogorov–Arnold Representation

For every $n \in \mathbb{N}_{\ge 2}$, there exist inner functions $\phi_{p,q}\in C([0,1])$ such that any $f \in C([0,1]^n)$ can be represented, with continuous outer functions $\Phi_q$ that depend on $f$, as

$$
f(x_1,\dots, x_n)=\sum_{q=1}^{2n+1} \Phi_q\left(\sum_{p=1}^n \phi_{p,q}(x_p)\right)
$$
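As a concrete instance of the representation above, taking $n=2$ gives $2n+1=5$ outer functions and $n(2n+1)=10$ inner functions:

$$
f(x_1,x_2)=\sum_{q=1}^{5} \Phi_q\big(\phi_{1,q}(x_1)+\phi_{2,q}(x_2)\big)
$$

Read as a network, this is two layers of univariate functions with widths $n \to 2n+1 \to 1$, here $2 \to 5 \to 1$.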

Kolmogorov-Arnold Networks

$$
\mathbf{x}_{l+1}=\Phi_l \mathbf{x}_l
$$

where $\Phi_l$ is a matrix of functions $\phi_{ij}$ with shape $n_{l+1}\times n_l$; each entry has the form

$$
\phi_{ij}(x)=w\left(\mathrm{SiLU}(x)+S_n(x)\right)
$$

A general KAN network is a composition of $L$ layers:

$$
\mathrm{KAN}(\mathbf{x})=(\Phi_{L-1}\circ\Phi_{L-2}\circ\cdots\circ\Phi_{0})\mathbf{x}
$$
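The layer and composition formulas can be sketched as a forward pass in plain Python. This is an illustrative sketch, not pykan's implementation: the names `edge_fn`, `kan_layer`, `kan` and `make_layer`, the choice of a uniform grid on $[-1,1]$ extended by $k$ knots on each side, and the constant parameter values are all assumptions.

```python
import math


def silu(x):
    return x / (1.0 + math.exp(-x))


def basis(i, j, t, knots):
    """B-spline basis N_{i,j}(t) via Cox-de Boor; zero-denominator terms count as 0."""
    if j == 0:
        return 1.0 if knots[i] <= t < knots[i + 1] else 0.0
    d1 = knots[i + j] - knots[i]
    d2 = knots[i + j + 1] - knots[i + 1]
    a = 0.0 if d1 == 0 else (t - knots[i]) / d1 * basis(i, j - 1, t, knots)
    b = 0.0 if d2 == 0 else (knots[i + j + 1] - t) / d2 * basis(i + 1, j - 1, t, knots)
    return a + b


def edge_fn(x, w, coeffs, k, knots):
    """phi(x) = w * (SiLU(x) + spline(x)), with spline(x) = sum_m c_m N_{m,k}(x)."""
    spline = sum(c * basis(m, k, x, knots) for m, c in enumerate(coeffs))
    return w * (silu(x) + spline)


def kan_layer(x, weights, coeffs, k, knots):
    """Apply the n_out x n_in function matrix: output j is sum_i phi_{j,i}(x_i)."""
    return [
        sum(edge_fn(x_i, weights[j][i], coeffs[j][i], k, knots) for i, x_i in enumerate(x))
        for j in range(len(weights))
    ]


def kan(x, layers, k, knots):
    """Compose the layers: Phi_{L-1} o ... o Phi_0 applied to x."""
    for weights, coeffs in layers:
        x = kan_layer(x, weights, coeffs, k, knots)
    return x


k, G = 3, 5                      # spline degree and number of grid intervals (assumed)
h = 2.0 / G
knots = [-1.0 + h * (m - k) for m in range(G + 2 * k + 1)]
num_basis = G + k                # basis functions per edge


def make_layer(n_in, n_out, w=1.0, c=0.0):
    """Hypothetical helper: constant parameters, only to make the shapes concrete."""
    weights = [[w] * n_in for _ in range(n_out)]
    coeffs = [[[c] * num_basis for _ in range(n_in)] for _ in range(n_out)]
    return weights, coeffs


layers = [make_layer(2, 3), make_layer(3, 1)]   # widths 2 -> 3 -> 1
y = kan([0.0, 1.0], layers, k, knots)
```

With all spline coefficients set to zero, each edge reduces to a scaled SiLU, which makes the output easy to check by hand. In a trained network the coefficients and weights would be learned per edge.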
chufanchen commented 5 months ago

KAN Application