KindXiaoming / pykan

Kolmogorov-Arnold Networks
MIT License

Out of memory if I use KANLayer to replace MLP in nano-gpt #39

Closed AlexWang1900 closed 2 months ago

AlexWang1900 commented 4 months ago

I replaced an MLP with KANLayers like this:

```python
class MLP_KAN(nn.Module):

    def __init__(self, config):
        super().__init__()
        self.in_dim = config.n_embd
        self.first_dim = config.n_embd // 4
        self.out_dim = config.n_embd // 4
        self.kan_1 = KANLayer(self.in_dim, self.first_dim, device="cuda")
        self.kan_2 = KANLayer(self.first_dim, self.out_dim, device="cuda")

    def forward(self, x):
        # flatten (batch, seq_len, channels) into one big batch of tokens
        n, l, c = x.shape
        x = x.reshape([n * l, c])
        x = self.kan_1(x)
        x = self.kan_2(x)
        x = x.view([n, l, c])
        return x
```
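(A side note on the snippet above, hedged because the details depend on the pykan version: at the time, `KANLayer.forward` returned a tuple `(y, preacts, postacts, postspline)` rather than a bare tensor, and since `out_dim` here is `n_embd // 4`, the final `view` back to `(n, l, c)` cannot match. A minimal corrected sketch under those assumptions:

```python
class MLP_KAN(nn.Module):
    # Sketch assuming KANLayer.forward returns (y, preacts, postacts, postspline);
    # verify the signature against your pykan checkout.
    def __init__(self, config):
        super().__init__()
        self.in_dim = config.n_embd
        self.hidden_dim = config.n_embd // 4
        self.kan_1 = KANLayer(self.in_dim, self.hidden_dim, device="cuda")
        # project back up to n_embd so the output reshapes to (n, l, c)
        self.kan_2 = KANLayer(self.hidden_dim, self.in_dim, device="cuda")

    def forward(self, x):
        n, l, c = x.shape
        x = x.reshape(n * l, c)
        x, _, _, _ = self.kan_1(x)  # keep only the activations
        x, _, _, _ = self.kan_2(x)
        return x.view(n, l, c)
```
)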

using the above layer instead of the original MLP:

```python
class MLP(nn.Module):

    def __init__(self, config):
        super().__init__()
        self.c_fc    = nn.Linear(config.n_embd, 4 * config.n_embd, bias=config.bias)
        self.gelu    = nn.GELU()
        self.c_proj  = nn.Linear(4 * config.n_embd, config.n_embd, bias=config.bias)
        self.dropout = nn.Dropout(config.dropout)

    def forward(self, x):
        x = self.c_fc(x)
        x = self.gelu(x)
        x = self.c_proj(x)
        x = self.dropout(x)
        return x
```

and it runs out of memory:

```
File "/home/alex/Projects/pykan-master/kan/spline.py", line 60, in B_batch
  value = (x - grid[:, :-(k + 1)]) / (grid[:, k:-1] - grid[:, :-(k + 1)]) * B_km1[:, :-1] + (grid[:, k + 1:] - x) / (grid[:, k + 1:] - grid[:, 1:(-k)]) * B_km1[:, 1:]
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 22.50 GiB. GPU 0 has a total capacity of 23.64 GiB of which 13.85 GiB is free. Including non-PyTorch memory, this process has 9.05 GiB memory in use. Of the allocated memory 8.58 GiB is allocated by PyTorch, and 25.99 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
```
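For scale, here is a rough back-of-envelope sketch of why `B_batch` tries to allocate tens of GiB. The nanoGPT hyperparameters and the basis-tensor shape below are assumptions for illustration, not numbers reported in this thread:

```python
# Rough estimate of one float32 B-spline basis tensor inside pykan's B_batch.
# Assumed nanoGPT GPT-2 defaults (not from this thread): batch_size=12,
# block_size=1024, n_embd=768; pykan defaults grid=5, k=3.
batch_tokens = 12 * 1024            # n * l after the reshape in MLP_KAN.forward
in_dim, out_dim = 768, 768 // 4     # kan_1 in the snippet above
grid, k = 5, 3

# The layer evaluates a separate spline for every (input, output) pair, so the
# basis tensor has on the order of batch * in_dim * out_dim * (grid + k) entries.
elements = batch_tokens * in_dim * out_dim * (grid + k)
print(f"~{elements * 4 / 2**30:.0f} GiB")   # ~54 GiB, before recursion temporaries
```

Because the activations scale with `in_dim * out_dim` rather than `in_dim + out_dim`, a KANLayer sized like an MLP block is far more memory-hungry than the `nn.Linear` it replaces.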

putizi-super commented 4 months ago

So this part hasn't been optimized yet, right? It consumes a lot of GPU memory.

Bachstelze commented 4 months ago

How much RAM did nano-gpt use before the change? Have you tried to use the smallest setting of nano-gpt?
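For reference, nanoGPT's smallest stock configuration is the char-level Shakespeare one; the values below mirror `config/train_shakespeare_char.py` (an assumption worth verifying against your checkout):

```python
# Small nanoGPT settings for a memory experiment
# (char-level Shakespeare config; verify against your nanoGPT checkout).
batch_size = 64
block_size = 256   # context length
n_layer = 6
n_head = 6
n_embd = 384
dropout = 0.2
```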

ybu-lxd commented 4 months ago

I also encountered this problem, have you solved it?

STQ-AmadeusUser commented 4 months ago

The same problem. My input is a tensor of shape 8000x36, and my KAN network is:

```python
kan_net = KAN(width=[36, 32, 10], grid=5, k=3, seed=0, device=torch.device('cuda'))
```

In the end, the network consumes a huge amount of memory.
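One generic workaround (a sketch, not pykan-specific API advice): evaluate the 8000-row input in chunks, so the spline temporaries scale with the chunk size rather than the full batch. For training you would pair this with gradient accumulation; for inference it works as-is:

```python
import torch

def forward_in_chunks(net, x, chunk_size=500):
    # Run the batch through in slices; peak activation memory now scales
    # with chunk_size instead of x.shape[0].
    outs = [net(chunk) for chunk in torch.split(x, chunk_size, dim=0)]
    return torch.cat(outs, dim=0)

# x = torch.randn(8000, 36, device='cuda')
# y = forward_in_chunks(kan_net, x)   # kan_net as defined above
```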

Bachstelze commented 4 months ago

@ybu-lxd @STQ-AmadeusUser How much RAM did nano-gpt use before the change? Have you tried to use the smallest setting of nano-gpt?

You can have a look at other transformer-KAN combinations, such as a-r-r-o-w/kanformer and kanformers.