FalkonML / falkon

Large-scale, multi-GPU capable, kernel solver
https://falkonml.github.io/falkon/
MIT License
181 stars, 22 forks

Get Out-of-memory when selecting m=10^5 #34

Closed: MrHuff closed this issue 3 years ago

MrHuff commented 3 years ago

Hi again,

Thanks again for the help last time. As mentioned, we will try to make a PR as soon as we confirm that our experiment/method is working. When I try to set M=10^5 (I am on a laptop), I get:

RuntimeError: cuda runtime error (2) : out of memory at /opt/conda/conda-bld/pytorch_1616554786529/work/aten/src/THC/THCCachingHostAllocator.cpp:278

Should I be getting this error, and can I expect it to work if I use a V100 with 32GB?

Thank you!

Best regards, Robert

Giodiro commented 3 years ago

Hi, you are running out of CPU RAM. 10^5 centers means you need to store a 10^5 x 10^5 matrix, which, assuming float32 data, is around 10GB (double that for float64 data). On top of that you have to add everything else held in RAM, such as the data itself. How much RAM do you have on your laptop, and what is the full stack trace?

MrHuff commented 3 years ago

Hi, thanks for the prompt reply!

I have around 20GB on my laptop, so it makes sense that it does not work. But isn't a 10^5 x 10^5 matrix of floats 40GB, since we have 10^10 floats and each float is 4 bytes? If so, we will make sure to get a larger machine for our experiments.

More generally, is it safe to say that this MxM matrix needs to be fully allocated, since FALKON requires it to be inverted?

Thank you for the help.

Giodiro commented 3 years ago

Yes, you are absolutely right about the 40GB, and about the MxM matrix!
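For anyone sizing a machine before picking M, the arithmetic confirmed above can be checked with a few lines of plain Python. This is only a sketch and not part of the Falkon API: the helper names `kernel_matrix_bytes` and `max_centers` are made up for illustration, and the estimate covers the dense MxM matrix alone, ignoring the training data and any other allocations.

```python
import math

# Bytes per element for the two dtypes discussed in this thread.
BYTES_PER_ELEMENT = {"float32": 4, "float64": 8}


def kernel_matrix_bytes(m, dtype="float32"):
    """Bytes needed to hold a dense m x m matrix (hypothetical helper)."""
    return m * m * BYTES_PER_ELEMENT[dtype]


def max_centers(budget_bytes, dtype="float32"):
    """Largest m whose m x m matrix fits in budget_bytes (hypothetical helper).

    Only the matrix itself is counted, so the usable value is smaller in practice.
    """
    return math.isqrt(budget_bytes // BYTES_PER_ELEMENT[dtype])


m = 10**5
for dtype in ("float32", "float64"):
    n_bytes = kernel_matrix_bytes(m, dtype)
    # float32 -> 40 GB, float64 -> 80 GB (decimal gigabytes)
    print(f"M={m}, {dtype}: {n_bytes / 1e9:.0f} GB")

# Upper bound on M for a 20 GB budget with float32 data.
print(max_centers(20 * 10**9))
```

With roughly 20GB of RAM, the upper bound for float32 comes out at about 70,000 centers, and the practical limit is lower once the data and solver workspace are accounted for.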