Navidfoumani / ConvTran

This is a PyTorch implementation of ConvTran
MIT License

About "Cuda out of Memory" for "EigenWorms" dataset #5

Closed firrice closed 5 months ago

firrice commented 5 months ago

Hi, the ConvTran method is great, and I have been doing experiments based on it. During these experiments I ran into a "CUDA out of memory" error, specifically on the "EigenWorms" dataset, whose series length is 17984. When execution reaches line 75 of attention.py, q and k both have shape (16, 8, 17984, 2), so attn has shape (16, 8, 17984, 17984). Stored as float32 (4 bytes per element), the attn tensor alone takes 16×8×17984×17984×4/1024³ ≈ 154.22 GiB, which is clearly an enormous space requirement. However, the paper states: "All of our experiments were conducted using the PyTorch framework in Python on a computing system consisting of a single Nvidia A5000 GPU with 24GB of memory and an Intel(R) Core(TM) i9-10900K CPU." So I wonder how to solve the problem above with 24 GB of GPU memory. Looking forward to your reply!
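The figure quoted above can be reproduced with a quick back-of-envelope calculation (shapes taken from the issue: batch 16, 8 heads, sequence length 17984, float32):

```python
def attn_matrix_gib(batch, heads, seq_len, bytes_per_el=4):
    """Memory (GiB) needed to materialise a (batch, heads, L, L) attention-score tensor."""
    return batch * heads * seq_len * seq_len * bytes_per_el / 1024**3

# EigenWorms shapes from the issue: quadratic in seq_len, hence the blow-up.
print(round(attn_matrix_gib(16, 8, 17984), 2))  # ≈ 154.22 GiB
```

The quadratic dependence on sequence length is what makes EigenWorms so much heavier than the other UEA datasets.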

Navidfoumani commented 5 months ago

Hello,

Thank you for your comment. Indeed, EigenWorms presents unique challenges due to its dimensions. There are a couple of options for running it, and you can expect almost the same results with each approach:

1. Run it on the CPU.
2. Add stride and valid padding in the embedding layer to reduce the input size.
3. Reduce the batch size.
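Option 2 can be sketched with a strided, valid-padded `Conv1d` in front of the attention layers. The channel counts, kernel size, and stride below are illustrative assumptions, not ConvTran's actual configuration:

```python
import torch
import torch.nn as nn

# Hypothetical embedding: stride=4 with no ('valid') padding shortens the
# sequence roughly 4x before attention, so the LxL score matrix shrinks ~16x.
embed = nn.Conv1d(in_channels=6, out_channels=64,
                  kernel_size=8, stride=4, padding=0)

x = torch.randn(16, 6, 17984)   # (batch, channels, seq_len) for EigenWorms
out = embed(x)
print(tuple(out.shape))          # (16, 64, 4495): floor((17984-8)/4)+1 steps
```

With the sequence cut to 4495 time steps, the attention matrix drops from ~154 GiB to roughly 154/16 ≈ 9.6 GiB, which fits on a 24 GB GPU.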