Closed gabrielfougeron closed 1 year ago
Hi @gabrielfougeron,
I am struggling to understand the example code available at https://www.kernel-operations.io/keops/_auto_tutorials/backends/plot_scipy.html .
On the one hand, I see only numpy arrays being defined. Torch is not even imported.
KeOps can indeed be used with NumPy arrays alone (i.e. we do not rely on PyTorch to perform computations on the GPU).
On the other hand, I can tell that the computation is being performed on the GPU (as nvidia-smi attests).
That's good news :)
How is it possible?
No black magic. LazyTensors are just wrappers around tensor-like objects: these may be NumPy arrays or torch tensors, interchangeably. They could also wrap a TensorFlow tensor structure (but this is not implemented).
When are the memory transfers happening?
At the very last moment: the memory transfers between "CPU memory" and "GPU memory" (i.e. cudaMemcpy) are triggered when performing a large reduction on a LazyTensor. For instance, in this command:
D = K @ np.ones(N, dtype=dtype) # Sum along the lines of the adjacency matrix
or when eigsh calls the aslinearoperator K.
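As a rough illustration of "nothing happens until the reduction", here is a pure-Python toy. It is not KeOps code: the class `ToyLazyKernel` and its `evaluations` counter are hypothetical names invented for this sketch, and the Gaussian kernel is only an example formula. It mimics the idea that the wrapper stores references to the data plus a formula, and that the actual work is done when `K @ v` is called. It says nothing about how KeOps schedules the real host/device transfers.

```python
import math


class ToyLazyKernel:
    """Toy symbolic matrix K[i][j] = exp(-|x_i - y_j|^2 / (2 * sigma^2)).

    Hypothetical class for illustration only; it is not part of pykeops.
    """

    def __init__(self, x, y, sigma=1.0):
        # Only references to the inputs and the formula parameter are stored.
        self.x, self.y, self.sigma = x, y, sigma
        self.evaluations = 0  # number of kernel entries computed so far

    def _entry(self, xi, yj):
        self.evaluations += 1
        sq_dist = sum((a - b) ** 2 for a, b in zip(xi, yj))
        return math.exp(-sq_dist / (2 * self.sigma ** 2))

    def __matmul__(self, v):
        # The reduction: each row is accumulated on the fly, so the full
        # len(x) by len(y) matrix is never held in memory.
        return [
            sum(self._entry(xi, yj) * vj for yj, vj in zip(self.y, v))
            for xi in self.x
        ]


x = [[0.0, 0.0], [1.0, 0.0], [0.0, 2.0]]
K = ToyLazyKernel(x, x, sigma=1.0)    # no kernel entry is computed here
evaluations_before = K.evaluations    # still zero at this point

ones = [1.0] * len(x)
D = K @ ones                          # all the work happens at the reduction
print(evaluations_before, K.evaluations, D)
```

In this toy, building `K` costs nothing and every kernel entry is computed inside `K @ ones`, which is the same moment at which the answer above says the memory transfers are triggered for a NumPy-backed LazyTensor.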
How can I get more fine-grained control over this?
We worked hard so that the user does not have to care about this... Maybe this point can be improved, though.
For instance, when using a torch.tensor already stored in GPU memory, no copy is needed (we just need the pointer to the tensor data). In the case of a NumPy array, the data is in CPU memory, so I think that a copy is made each time a reduction is performed. @joanglaunes, can you confirm that?
Thank you very much @bcharlier for your detailed answer. It challenged my preconceived idea of what happens under the hood when pykeops is running.
Hi,
I am struggling to understand the example code available at https://www.kernel-operations.io/keops/_auto_tutorials/backends/plot_scipy.html .
On the one hand, I see only numpy arrays being defined. Torch is not even imported. On the other hand, I can tell that the computation is being performed on the GPU (as nvidia-smi attests).
How is it possible? When are the memory transfers happening? How can I get more fine-grained control over this?