KindXiaoming / pykan

Kolmogorov-Arnold Networks
MIT License

How to train with large datasets? #122

Open manglav opened 4 months ago

manglav commented 4 months ago

I am trying to use a massive CSV (1M+ rows), with the X input having 400 dimensions. It's technically working, but I have two main questions:

  1. It seems the GPU is somewhat bottlenecked by the CPU: I'm only seeing 35% GPU usage versus 100% on a single CPU core. If I run in CPU mode, I see around 22% CPU usage (across 64 cores). Any suggestions for improving this?

  2. The data is so large that we are reading the CSV in chunks of 1000 rows and calling train on each chunk. Is this the right approach? What is the best way to train on data this large? And how should the `bias_trainable`, `sp_trainable`, and `sb_trainable` parameters be set?

```python
model = KAN(width=[400, 64, 32, 16, 1], grid=10, k=4, seed=0, device=device,
            bias_trainable=False, sp_trainable=False, sb_trainable=False)
for chunk in chunks:
    # ...< data processing >...
    data_dict = create_data_from_chunk(chunk)
    model.train(data_dict, opt="LBFGS", steps=1, device=device, update_grid=False)
```
  3. I can save a checkpoint, but I don't know how to use the trained model afterwards. Is there a "predict" function to use?
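On question 2, one common pattern is to stream the CSV with pandas' `chunksize` and build the training dict per chunk. The sketch below is hypothetical: `make_data_dict` and the dict keys (`train_input`, `train_label`, ...) are assumptions about what pykan's `train()` expects in your version, and a small in-memory CSV stands in for the 1M-row file. pykan itself expects torch tensors on the training device; numpy is used here only to keep the sketch dependency-light.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the 1M-row file: 10 rows, two features (x0, x1) and a label y.
csv_text = "\n".join(["x0,x1,y"] + [f"{i},{i * 2},{i * 3}" for i in range(10)])


def make_data_dict(chunk: pd.DataFrame) -> dict:
    """Hypothetical helper: split a chunk into the dict pykan is assumed
    to expect. Replace numpy arrays with torch tensors on `device` for
    real pykan training."""
    X = chunk.iloc[:, :-1].to_numpy(dtype=np.float32)
    y = chunk.iloc[:, -1:].to_numpy(dtype=np.float32)
    return {"train_input": X, "train_label": y,
            "test_input": X[:1], "test_label": y[:1]}


n_rows = 0
# chunksize streams the file instead of loading it all into memory.
for chunk in pd.read_csv(io.StringIO(csv_text), chunksize=4):
    data = make_data_dict(chunk)
    n_rows += data["train_input"].shape[0]
    # model.train(data, opt="LBFGS", steps=1, update_grid=False)  # one step per chunk
```

On question 1, the CPU bottleneck is often the per-chunk preprocessing itself; moving it into a `torch.utils.data.DataLoader` with `num_workers > 0` and `pin_memory=True` can overlap data preparation with GPU compute.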
JinglaiZheng commented 4 months ago

I had the same problem, I couldn't find the API for prediction after training the model.
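There may not need to be a dedicated "predict" API: assuming your pykan version's `KAN` subclasses `torch.nn.Module` (as it does in the versions I have seen), predictions come from a plain forward call, `preds = model(x)`. A minimal sketch, using a stand-in module so it runs without pykan installed:

```python
import torch
import torch.nn as nn


class StandInKAN(nn.Module):
    """Stand-in for a trained KAN; only the calling pattern matters here."""

    def __init__(self):
        super().__init__()
        self.layer = nn.Linear(4, 1)  # 4 input features -> 1 output

    def forward(self, x):
        return self.layer(x)


model = StandInKAN()          # in practice: your trained / checkpointed KAN
model.eval()                  # switch off training-specific behavior
with torch.no_grad():         # no gradients needed at inference time
    x = torch.randn(8, 4)     # batch of 8 samples, 4 features each
    preds = model(x)          # "predict" is just the forward call
```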

Zhangyuyi-0825 commented 4 months ago

I had the same problem, I couldn't find the API for prediction after training the model.

My research is in time series forecasting on tabular data, where the model's training result is a mathematical expression, so predictions are obtained by substituting the test-set data into that expression. However, the forecasting performance is very unsatisfactory, and I am still trying to adjust the network structure and parameters.
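Substituting test data into the fitted expression can be automated with sympy's `lambdify`, which turns a symbolic formula into a fast vectorized numpy function. The expression below is a made-up stand-in, not a real pykan output (some pykan versions expose the fitted formula via a symbolic-formula method; check yours):

```python
import numpy as np
import sympy as sp

x0, x1 = sp.symbols("x0 x1")
expr = sp.sin(x0) + 0.5 * x1          # stand-in for a fitted KAN expression

# Compile the symbolic expression into a numpy-backed callable.
f = sp.lambdify((x0, x1), expr, modules="numpy")

X_test = np.array([[0.0, 2.0],
                   [np.pi / 2, 0.0]])  # two test samples, columns are x0, x1
y_pred = f(X_test[:, 0], X_test[:, 1])  # vectorized evaluation over the test set
```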

JinglaiZheng commented 4 months ago

I see. In fact, KAN has the advantage of giving an explicit function expression directly relating input and output. My experience has taught me that the depth of the network matters: some operations require sufficient depth to be represented exactly, otherwise we only get an approximate expression with a certain amount of error.
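A concrete illustration of why depth matters: each KAN layer sums univariate functions of its inputs, so a single layer cannot represent the product x·y exactly. With one hidden layer the algebraic identity x·y = ((x+y)² − (x−y)²)/4 becomes expressible (squaring in the hidden layer, a weighted sum at the output), which is why a width such as [2, 2, 1] can be exact where a shallower net only approximates. The numpy check below verifies the identity itself; the KAN-width interpretation is my reading, not something from this thread:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(size=100)
y = rng.normal(size=100)

# Depth-2 construction of a product: squares of sums/differences,
# then a fixed linear combination — no direct multiplication of inputs.
via_identity = ((x + y) ** 2 - (x - y) ** 2) / 4.0
```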

ChrisD-7 commented 3 months ago

Building on the GPU training discussion: I tried running the device.ipynb code on my CPU and ran out of compute; running it with a GPU on Colab, I got this error:

[screenshot of the error]

seyidcemkarakas commented 1 month ago

> I had the same problem, I couldn't find the API for prediction after training the model.

Did you handle this situation?