KindXiaoming / pykan

Kolmogorov Arnold Networks

Hi! I made a KAN-based diffusion model, with quite good results #160

Open kabachuha opened 1 month ago

kabachuha commented 1 month ago

Hi! I tested how KANs could assist denoising diffusion by adapting the classic toy spiral-diffusion model, which originally used an MLP, to use a KAN instead.

You can see that a two-layer KAN fares almost as well as a 4-layer MLP (despite having 30% fewer parameters), and that the 4-layer KAN vastly outperforms it.

[Figure: loss-comparison]

I think it would be worth adding a diffusion notebook to this repository, at least for educational purposes and for gaining more mainstream attention.

Additionally, it could be nice to explore whether it learned some good functions in its layers.

https://github.com/kabachuha/kan-diffusion
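For readers who want the shape of the swap, here is a minimal sketch (not the actual code from kabachuha/kan-diffusion): a small MLP noise predictor on 2-D points next to a pykan KAN with the same `(x, t) -> noise` interface. The widths, grid size, and time input are illustrative assumptions.

```python
import torch
import torch.nn as nn
from kan import KAN  # pykan

class MLPDenoiser(nn.Module):
    """Baseline: small MLP predicting the added noise from (x, t)."""
    def __init__(self, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(3, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, 2),
        )

    def forward(self, x, t):
        # x: (B, 2) noisy 2-D points, t: (B, 1) timestep scaled to [0, 1]
        return self.net(torch.cat([x, t], dim=-1))

# KAN counterpart with the same (x, t) -> predicted-noise interface.
# width=[3, 8, 2]: 3 inputs (x, y, t), one hidden layer of 8 nodes, 2 outputs.
kan_denoiser = KAN(width=[3, 8, 2], grid=5, k=3)

def kan_forward(x, t):
    return kan_denoiser(torch.cat([x, t], dim=-1))
```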


Same-structure KAN: [image: s2]

MLP: [image: mlp-s2]

2-layer KAN: [image: smol-kan-s2]

KindXiaoming commented 1 month ago

Thanks for implementing this! This is an experiment I had in mind but never got a chance to do. The score field is somewhat multiscale, which might be why KANs can outperform MLPs. It would also be fun to look at approximation errors for KANs and MLPs in the near/intermediate/far fields. Do KANs outperform MLPs mostly in the near field (if what I said above makes sense)?
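One hedged way to operationalise the near/intermediate/far-field question (my reading, not code from this thread): bin query points by their distance to the nearest training point and average each model's squared score error per bin. `model`, `true_score`, `queries`, and `data` are hypothetical stand-ins evaluated at a fixed noise level.

```python
import torch

@torch.no_grad()
def error_by_distance(model, true_score, queries, data, edges):
    """Mean squared score error, binned by distance to the training data."""
    # Distance from each query point to its nearest training point.
    d = torch.cdist(queries, data).min(dim=1).values
    err = ((model(queries) - true_score(queries)) ** 2).sum(dim=1)
    buckets = torch.bucketize(d, edges)  # 0 = nearest bin
    # Empty bins yield NaN; acceptable for a quick diagnostic.
    return torch.stack([err[buckets == b].mean()
                        for b in range(len(edges) + 1)])

# e.g. edges = torch.tensor([0.1, 0.5]) splits near / intermediate / far.
```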

kabachuha commented 1 month ago

I'm not so well versed in the underlying math myself; is it something like looking at the losses for each timestep?

Anyway, I think it would be a great community experiment.
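To make "losses for each timestep" concrete, here is a hedged sketch: evaluate the denoising MSE in timestep buckets for each model and compare the curves. The DDPM-style cumulative schedule `alpha_bar` and the `denoiser(x, t)` signature are assumptions, not the linked repo's API.

```python
import torch

@torch.no_grad()
def per_timestep_loss(denoiser, x0, alpha_bar, n_buckets=10):
    """Denoising MSE evaluated at representative timesteps per bucket."""
    T = len(alpha_bar)
    losses = torch.zeros(n_buckets)
    for b in range(n_buckets):
        t = int((b + 0.5) * T / n_buckets)  # middle of the bucket
        noise = torch.randn_like(x0)
        # DDPM forward process: x_t = sqrt(abar_t) x_0 + sqrt(1 - abar_t) eps
        xt = alpha_bar[t].sqrt() * x0 + (1 - alpha_bar[t]).sqrt() * noise
        t_in = torch.full((x0.shape[0], 1), t / T)
        losses[b] = ((denoiser(xt, t_in) - noise) ** 2).mean()
    return losses  # one MSE per timestep bucket; compare KAN vs. MLP curves
```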

1ssb commented 1 month ago

This is not strictly generalised. I tried it with an image diffusion model replacing the MLPs with the equivalently layered KANs, the generalisation is poor. I have elaborated on a more pathological example in the functional approximation. MLPs are still outperforming KANs. Any insights?

AlessandroFlati commented 1 month ago

> MLPs are still outperforming KANs.

Wow, a single example proves that an entire branch of techniques and methods is useless.

1ssb commented 1 month ago

No, I think it just proves that we still do not have enough insight to define optimality in terms of structure for the target, which is hopefully what the goal is; correct me if I am wrong.

KindXiaoming commented 1 month ago

I think generative modeling is particularly subtle because these two goals are not necessarily aligned: (1) fitting the score function well; (2) being able to generalize. My intuition is that KANs are good at (1) but not necessarily at (2) (which is the true goal of generative modeling). This experiment seems like a good starting point; I definitely agree this should be a community effort. Again, great initiative!

ChrisD-7 commented 1 month ago

> I'm not so well versed in the underlying math myself; is it something like looking at the losses for each timestep?

@kabachuha were you able to find this?