FalkonML / falkon

Large-scale, multi-GPU capable, kernel solver
https://falkonml.github.io/falkon/
MIT License

Replace falkon.mmv_ops with homemade mmv_ops #33

Closed MrHuff closed 3 years ago

MrHuff commented 3 years ago

Hi again!

Thanks again for the help last time.

This time, I'd like to replace the falkon.mmv_ops in the InCoreFalkon solver with a homemade mmv_ops for a research project.

Wondering what is the "cleanest" and simplest way to do this?

Thank you!

Best regards, Robert

Giodiro commented 3 years ago

Hi! There are a few moving parts for the mmv operations, and it's a bit more complex than it should be.

Essentially, if you look at the fmmv_cuda.py file, and in particular at the generic_fmmv function, you have a generic way of running kernel-vector multiplications. That function is responsible for splitting the data matrices and computing the matrix-vector product, but not for computing the kernel itself. The kernel is a class which defines the methods _prepare, _apply and _finalize, which get called to actually compute the kernel matrix (examples can be found in the falkon.kernels submodule!).
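To illustrate the split described above, here is a minimal sketch of a kernel class built around the _prepare/_apply/_finalize hooks. Only the three method names come from the comment above; the class name, signatures, and NumPy implementation are assumptions for illustration, not Falkon's real base class.

```python
import numpy as np

class GaussianKernelSketch:
    """Hypothetical kernel following the _prepare/_apply/_finalize pattern
    (a simplified sketch, not Falkon's actual Kernel API)."""

    def __init__(self, sigma: float):
        self.sigma = sigma

    def _prepare(self, X1, X2):
        # Precompute per-row squared norms once; reusable across data blocks.
        return (np.sum(X1 ** 2, axis=1, keepdims=True),
                np.sum(X2 ** 2, axis=1, keepdims=True))

    def _apply(self, X1, X2, prep):
        # Pairwise squared distances: ||x||^2 + ||y||^2 - 2 x.y
        sq1, sq2 = prep
        return sq1 + sq2.T - 2.0 * X1 @ X2.T

    def _finalize(self, D):
        # Map squared distances to Gaussian kernel values.
        return np.exp(-D / (2.0 * self.sigma ** 2))
```

The point of the three-stage split is that the block-wise mmv driver can call _prepare once and then _apply/_finalize per block, without the kernel knowing anything about how the data was partitioned.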

If I understand what you want to do, you should define a new generic_fmmv function, and possibly also the function which splits the work among multiple GPUs and then calls generic_fmmv (it lives in the same file and is called fmmv_cuda). There are other versions of these functions: the distk variants do some small optimizations for the Gaussian kernel, and the fdmmv variants run a "double" kernel-vector product. The latter could be substituted by two fmmv calls at the cost of running more computations.
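The generic_fmmv role described above can be sketched as a blocked kernel-vector product: split the rows of the first data matrix, build each kernel block via the kernel's hooks, and accumulate the product without ever materializing the full kernel matrix. This is an assumed simplification (single device, NumPy, hypothetical names), not the actual fmmv_cuda.py code.

```python
import numpy as np

class LinearKernelSketch:
    """Trivial kernel exposing the assumed _prepare/_apply/_finalize hooks."""
    def _prepare(self, X1, X2):
        return None
    def _apply(self, X1, X2, prep):
        return X1 @ X2.T
    def _finalize(self, K):
        return K

def blocked_fmmv(kernel, X1, X2, v, block_size=2):
    """Sketch of a generic_fmmv-style routine: computes K(X1, X2) @ v
    in row blocks of X1, so only one kernel block is in memory at a time."""
    out = np.zeros((X1.shape[0], v.shape[1]))
    for start in range(0, X1.shape[0], block_size):
        stop = min(start + block_size, X1.shape[0])
        X1_blk = X1[start:stop]
        prep = kernel._prepare(X1_blk, X2)
        K_blk = kernel._finalize(kernel._apply(X1_blk, X2, prep))
        out[start:stop] = K_blk @ v
    return out
```

A "homemade" mmv op would swap in a different body for the loop (or a different K_blk @ v step) while keeping the same interface, so the surrounding solver code is untouched.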

Get in touch if you want to expand on what your idea is, or if you have other doubts on the implementation!

Best, Giacomo

MrHuff commented 3 years ago

Dear Giacomo,

Thank you for your reply. This is very helpful. After fiddling around yesterday, I might be close to achieving what I want. I'll let you know if it doesn't work out.

Thank you!

Best regards, Robert

MrHuff commented 3 years ago

Sorted it out through a hack: I made a custom kernel class together with a custom FALKON conjugate gradient that makes sure to call the homemade method for all fmmv and dmmv ops.

However, I will try to make a PR with a proper implementation once it's confirmed working!
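The hack described above (a solver that routes every matrix-vector product through a homemade op) can be sketched as a conjugate gradient loop that only touches the system matrix through a user-supplied mmv callable. This is a generic textbook CG sketch, not Falkon's actual FalkonConjugateGradient class; all names here are hypothetical.

```python
import numpy as np

def cg_with_custom_mmv(mmv, b, iters=50, tol=1e-12):
    """Minimal conjugate gradient for an SPD system A x = b, where A is only
    accessed via the callable mmv(v) -> A @ v. Swapping mmv swaps the kernel op
    without touching the solver (the idea behind the hack described above)."""
    x = np.zeros_like(b)
    r = b - mmv(x)          # initial residual
    p = r.copy()            # initial search direction
    rs = r @ r
    for _ in range(iters):
        Ap = mmv(p)
        alpha = rs / (p @ Ap)
        x = x + alpha * p
        r = r - alpha * Ap
        rs_new = r @ r
        if rs_new < tol:
            break
        p = r + (rs_new / rs) * p
        rs = rs_new
    return x
```

Here mmv would be bound to the homemade fmmv op, so the solver never needs to know how the kernel-vector product is computed.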