Open davidenunes opened 3 years ago
Hello @davidenunes, it would be great if you could work on this, and it is definitely doable, but unfortunately we do not have any documentation for these parts of the code. For the PyTorch bindings, the key files are generic_red.py, which implements the custom autograd class for PyTorch, and generic_red.cpp, which implements the interface with the C++ code using Pybind11. If you want to give it a try, I suggest that you concentrate on translating generic_red.py, assuming that one can write custom autograd classes with TensorFlow more or less the same way one does with PyTorch (I do not know TensorFlow at all). As for the interface with C++, we are in the process of substantially changing the way the core part of KeOps works, and we may not need Pybind11 anymore in the future, so it is better to wait a few weeks for this part. Having said that, it would definitely be easier for you if we found time to write some documentation! This may not be easy, though, because it also involves explaining the core mechanism of KeOps. Otherwise, we could also try to have a meeting in the coming weeks?
Thank you, that was precisely the direction I was looking for. Yes, I understand; I was just wondering whether you had any development documentation effort in progress.
To give you context, I'm interested in the project because I made a high-level library for TensorFlow that, among other things, has components such as optimal-transport-based losses with a heavy memory footprint, and nearest-neighbour-search operations that require a third-party library. It would be nice to be able to rely on KeOps for some of these. As I said, my time is limited (I'm wrapping up my dissertation), but since I have some experience with TF, I'll take a look at generic_red.py and see what I can come up with. TensorFlow custom gradients are slightly different, but not that different; I think the hard part is understanding how things are connected to the core of KeOps, as you said.
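As an illustration of the point about custom gradients, here is a minimal sketch of how a forward/backward pair could be exposed to TensorFlow through tf.custom_gradient, which plays roughly the role that subclassing torch.autograd.Function plays in PyTorch. The names make_tf_reduction, forward_fn, backward_fn and demo are hypothetical and not part of KeOps, and the two lambdas are only stand-ins for calls into a compiled routine.

```python
def make_tf_reduction(forward_fn, backward_fn):
    """Wrap a forward/backward pair as one differentiable TensorFlow op.

    Hypothetical helper, not part of KeOps. backward_fn(x, upstream) must
    return the gradient of the loss with respect to x.
    """
    # Imported here so that TensorFlow stays an optional dependency.
    import tensorflow as tf

    @tf.custom_gradient
    def op(x):
        y = forward_fn(x)

        def grad(upstream):
            # TensorFlow passes dL/dy; we hand back dL/dx.
            return backward_fn(x, upstream)

        return y, grad

    return op


def demo():
    """Differentiate sum(x**2) through the wrapped op and return the gradient."""
    import tensorflow as tf

    square = make_tf_reduction(
        forward_fn=lambda x: x * x,  # stand-in for the forward call
        backward_fn=lambda x, upstream: 2.0 * x * upstream,  # stand-in for the backward call
    )
    x = tf.constant([1.0, 3.0])
    with tf.GradientTape() as tape:
        tape.watch(x)  # constants are not tracked automatically
        loss = tf.reduce_sum(square(x))
    return tape.gradient(loss, x).numpy().tolist()
```

The factory shape is a design choice for this sketch: it keeps the TensorFlow-specific wrapping separate from whatever actually computes the forward and backward results, which is the part that would have to talk to the KeOps core.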
So, translating generic_red.py to integrate KeOps into the TensorFlow autograd engine is fairly straightforward but, unless I'm missing something, it still requires me to also translate generic_red.cpp, and there are plenty of other components that are coupled with torch.
This particular call connects many things in the background:
myconv = LoadKeOps(
    formula=self.formula,
    aliases=self.aliases,
    dtype=self.dtype,
    lang="torch",
    optional_flags=self.optional_flags
).import_module()
Since myconv is then called in the forward and backward passes, I need to be able to add a new lang. I also found this to be connected to the cmake scripts, but at this point I don't have enough documentation to understand what's going on in detail.
Other things, like get_tag_backend, are coupled with numpy or pytorch because there are checks to see whether the inputs are numpy arrays or pytorch tensors.
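To make that coupling concrete, here is a minimal sketch of a type check that could accept a third framework. It is not how get_tag_backend is implemented; detect_backend and SUPPORTED_BACKENDS are hypothetical names, and the sketch assumes that each framework's array type is defined in a module whose top-level package is numpy, torch or tensorflow.

```python
SUPPORTED_BACKENDS = ("numpy", "torch", "tensorflow")


def detect_backend(*arrays):
    """Return the framework name shared by all inputs.

    Hypothetical helper: it inspects the module that defines each input's
    type, so none of the frameworks has to be imported here.
    """
    if not arrays:
        raise ValueError("at least one input is required")

    found = set()
    for array in arrays:
        # e.g. "tensorflow.python.framework.ops" -> "tensorflow"
        root = type(array).__module__.split(".")[0]
        if root not in SUPPORTED_BACKENDS:
            raise TypeError(f"unsupported input type: {type(array).__name__}")
        found.add(root)

    if len(found) > 1:
        raise ValueError(f"inputs mix several frameworks: {sorted(found)}")
    return found.pop()
```

Under that assumption, detect_backend(x, y) would return "numpy" for two NumPy arrays and raise ValueError if one of them were a PyTorch tensor.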
At this point I think I need to wait for development documentation and for things to evolve internally within KeOps (since you mentioned the C++ interface changes, etc.). This is too big a puzzle to jump into blindly, since I don't have the time to study the codebase and learn everything needed from scratch.
Out of curiosity, has any progress been made in development documentation since last year? I would be glad to make some bindings for Tensorflow provided I have some pointers to what I would need to do to extend it.
I was wondering if there's any documentation on how one would go about writing new bindings for something like Tensorflow.
I didn't find anything, but at a glance (from quickly scanning the codebase) it seems like both numpy and torch folders in pykeops provide a starting point for how one would go about it.
That said, I was wondering if you had something more solid in terms of documentation, even if very basic. My time is limited, but I might be interested in making a contribution, depending on how doable this is.