I've been training with AIGen in Kaggle notebooks, and I'm running into an issue where system (CPU) memory usage climbs slowly over the course of several hours. Eventually the notebook runs out of memory and training crashes.
I'm not sure where the leak is happening. I do know that it's in AIGen rather than VTX, and that it's leaking system RAM, not VRAM. I suspect the streaming dataloaders, since they are the only dataloaders I'm using here, but I haven't had the bandwidth to troubleshoot yet.
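For anyone who wants to help narrow this down, here is a minimal sketch of how the growth could be localized with Python's standard `tracemalloc` module: take a snapshot, run some steps, take another, and list the source lines whose allocations grew the most. `leaky_step` is a hypothetical stand-in for a training step, not AIGen code, and I'm assuming the leak is in Python-level allocations. If it's in native code, `tracemalloc` won't see it, and watching the process RSS (for example with `psutil`) would be the better first step.

```python
import tracemalloc


def top_growth(before, after, limit=5):
    """Return the `limit` source lines whose allocated size changed the most."""
    # compare_to() sorts by absolute size difference, largest first.
    return after.compare_to(before, "lineno")[:limit]


def leaky_step(sink, n=1000):
    # Hypothetical stand-in for a training step that keeps batches alive by mistake.
    sink.append([bytes(1024) for _ in range(n)])


tracemalloc.start()
before = tracemalloc.take_snapshot()

retained = []
for _ in range(10):
    leaky_step(retained)

after = tracemalloc.take_snapshot()
growth = top_growth(before, after)
tracemalloc.stop()

# Each entry shows file, line number, and how much memory that line gained.
for stat in growth:
    print(stat)
```

In a real run, the snapshots would be taken a few hundred training steps apart, and the top entries should point at whichever line keeps accumulating objects.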