borchero / pycave

Traditional Machine Learning Models for Large-Scale Datasets in PyTorch.
https://pycave.borchero.com
MIT License

GPU memory leak. #55

Open epbsb opened 1 year ago

epbsb commented 1 year ago

Hello,

I'm using pycave for a project where the data is one-dimensional with about 8e9 elements. The GPU option works well, and I'm splitting the data into chunks and running the predictions in a for loop. However, as the loop goes on, it consumes more and more GPU memory and eventually runs out. To work around this, I call torch.cuda.empty_cache() together with Python's garbage collector at each iteration, as shown in the code below, but this makes the process slow.

import gc
import torch

def clear_gpu_memory():
    # release cached blocks, collect unreferenced tensors, then release again
    torch.cuda.empty_cache()
    gc.collect()
    torch.cuda.empty_cache()
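
Roughly how I call this in the prediction loop (the model setup and chunk size here are illustrative, not my exact code):

# assumes `gmm` is a fitted pycave GaussianMixture and `data` is a 1-D tensor
chunk_size = 10_000_000  # illustrative chunk size
predictions = []
for start in range(0, data.shape[0], chunk_size):
    chunk = data[start:start + chunk_size]
    predictions.append(gmm.predict(chunk).cpu())
    clear_gpu_memory()  # workaround: clear cached GPU memory after every chunk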

I've also tried pycave's built-in batching option, but it runs into the same memory issue.
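
For reference, this is roughly how I'm invoking the batching option (the parameter values are illustrative and the trainer settings are my assumption of how the GPU is selected, not necessarily my exact code):

from pycave.bayes import GaussianMixture

# illustrative setup: mini-batched fitting/prediction on the GPU
gmm = GaussianMixture(
    num_components=4,
    batch_size=1_000_000,
    trainer_params=dict(accelerator="gpu", devices=1),
)
gmm.fit(data)
labels = gmm.predict(data)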

Is there anything I could do to fix this?

borchero commented 1 year ago

I haven't seen this in the past and don't currently have a GPU available for testing, unfortunately 😕

epbsb commented 1 year ago

I "fixed" the problem. When I install pycave it forces an old installation of PyTorch (1.12) with the "torchkit" dependency. After that, I reinstall the latest 2.01 version of PyTorch and there are no more memory leaks!

I can now use the batching option normally!

epbsb commented 1 year ago

@borchero Since my last message I noticed something: in the poetry.lock file you have this:

[package.dependencies]
numpy = ">=1.20.0,<2.0.0"
pytorch-lightning = ">=1.8,<1.13"

I believe this is what forces the install of the older PyTorch version (1.12).
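
A quick sanity check I use to see which versions actually end up in the environment after installing pycave (nothing pycave-specific, just printing the installed versions):

import torch
import pytorch_lightning as pl

# if the lock file pin wins, torch reports 1.12.x here instead of 2.0.1
print("torch:", torch.__version__)
print("pytorch-lightning:", pl.__version__)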