Closed: MrHuff closed this issue 3 years ago.
Hi! There are a few moving parts for the mmv operations, and it's a bit more complex than it should be.
Essentially, if you look at the `fmmv_cuda.py` file, and in particular the `generic_fmmv` function, you have a generic way of running kernel-vector multiplications. That function is responsible for splitting the data matrices and computing the matrix-vector product, but not for computing the kernel itself.
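To make the split of responsibilities concrete, here is a minimal numpy sketch of that idea (this is illustrative, not Falkon's actual code): the function splits one input into row blocks and multiplies each kernel block by the vector, while the kernel itself is an injected callable. `linear_kernel` and `blocked_fmmv` are names invented for this sketch.

```python
import numpy as np

def linear_kernel(A, B):
    # Placeholder kernel, standing in for Falkon's kernel classes:
    # K[i, j] = <A[i], B[j]>
    return A @ B.T

def blocked_fmmv(kernel_fn, X1, X2, v, block_size=2):
    """Compute K(X1, X2) @ v one row-block of K at a time,
    so the full N x M kernel matrix is never materialized."""
    out = np.empty((X1.shape[0], v.shape[1]))
    for start in range(0, X1.shape[0], block_size):
        block = X1[start:start + block_size]              # rows of X1
        out[start:start + block_size] = kernel_fn(block, X2) @ v
    return out

rng = np.random.default_rng(0)
X1, X2 = rng.standard_normal((5, 3)), rng.standard_normal((4, 3))
v = rng.standard_normal((4, 2))
# Matches the unblocked product:
assert np.allclose(blocked_fmmv(linear_kernel, X1, X2, v),
                   linear_kernel(X1, X2) @ v)
```

Because the kernel is a parameter, swapping in a different kernel (or a homemade one) does not require touching the splitting logic.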
The kernel is a class which defines the methods `_prepare`, `_apply` and `_finalize`, which get called to actually compute the kernel matrix (examples can be found in the `falkon.kernels` submodule!).
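As a rough illustration of how such a three-phase split can work, here is a toy Gaussian kernel in numpy. The method names mirror the ones mentioned above, but the signatures and the `ToyGaussianKernel` class are simplified inventions for this sketch, not Falkon's actual kernel API.

```python
import numpy as np

class ToyGaussianKernel:
    def __init__(self, sigma):
        self.sigma = sigma

    def _prepare(self, X1, X2):
        # Precompute per-row squared norms once per block pair.
        return (X1 ** 2).sum(1)[:, None], (X2 ** 2).sum(1)[None, :]

    def _apply(self, X1, X2, out):
        # The expensive part: the cross inner products.
        out += X1 @ X2.T
        return out

    def _finalize(self, out, prep):
        # Combine into squared distances, then exponentiate.
        n1, n2 = prep
        sq_dist = n1 + n2 - 2.0 * out
        return np.exp(-sq_dist / (2.0 * self.sigma ** 2))

    def __call__(self, X1, X2):
        prep = self._prepare(X1, X2)
        out = self._apply(X1, X2, np.zeros((X1.shape[0], X2.shape[0])))
        return self._finalize(out, prep)

rng = np.random.default_rng(0)
X1, X2 = rng.standard_normal((4, 3)), rng.standard_normal((5, 3))
K = ToyGaussianKernel(sigma=1.5)(X1, X2)
# Cross-check against a direct pairwise computation.
direct = np.exp(-((X1[:, None, :] - X2[None, :, :]) ** 2).sum(-1)
                / (2 * 1.5 ** 2))
assert np.allclose(K, direct)
```

The point of the split is that the cheap per-row statistics (`_prepare`) and the elementwise post-processing (`_finalize`) are separated from the heavy matrix product (`_apply`), which is the part worth offloading to the GPU.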
If I understand what you want to do, you should define a new `generic_fmmv` function, and possibly also the function which splits the work among multiple GPUs and then calls `generic_fmmv` (which is in the same file and is called `fmmv_cuda`). There are other versions of these functions: the `distk` variants do some small optimizations for the Gaussian kernel, and the `fdmmv` variants run a "double" kernel-vector product. They could be substituted by two `fmmv` calls at the cost of running more computations.
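The trade-off mentioned above can be sketched in numpy: a fused double product of the form `K.T @ (K @ v)` can touch each kernel block once, whereas chaining two plain kernel-vector products would compute the kernel blocks twice. This is only a schematic illustration (`fused_dmmv` is an invented name, and the real `fdmmv` handles more cases than this).

```python
import numpy as np

def linear_kernel(A, B):
    # Placeholder kernel for the sketch: K[i, j] = <A[i], B[j]>
    return A @ B.T

def fused_dmmv(kernel_fn, X1, X2, v, block_size=2):
    """Compute K.T @ (K @ v), where K = kernel_fn(X1, X2),
    computing each row-block of K only once."""
    out = np.zeros((X2.shape[0], v.shape[1]))
    for start in range(0, X1.shape[0], block_size):
        Kb = kernel_fn(X1[start:start + block_size], X2)  # one block of K
        out += Kb.T @ (Kb @ v)     # reuse the block for both products
    return out

rng = np.random.default_rng(0)
X1, X2 = rng.standard_normal((6, 3)), rng.standard_normal((4, 3))
v = rng.standard_normal((4, 1))
K = linear_kernel(X1, X2)
# Same result as two chained kernel-vector products, but each kernel
# block was computed once instead of twice.
assert np.allclose(fused_dmmv(linear_kernel, X1, X2, v), K.T @ (K @ v))
```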
Get in touch if you want to expand on what your idea is, or if you have other doubts about the implementation!
Best, Giacomo
Dear Giacomo,
Thank you for your reply. This is very helpful. After fiddling around yesterday, I might be close to achieving what I want. I'll let you know if it doesn't work out.
Thank you!
Best regards, Robert
Sorted it out through a hack: I made a custom kernel class together with a custom FALKON conjugate-gradient solver that makes sure to call the homemade method for all fmmv and dmmv ops.
However, I will try to make a PR with a proper implementation once it's confirmed working!
Hi again!
Thanks again for the help last time.
This time, I'd like to replace `falkon.mmv_ops` in the `InCoreFalkon` solver with homemade mmv ops for a research project.
What would be the cleanest and simplest way to do this?
Thank you!
Best regards, Robert