Interoperability with npy-ocaml, ocaml-rs or tch-rs

LaurentMazare / ocaml-torch

OCaml bindings for PyTorch

Apache License 2.0

412 stars, 38 forks

Interoperability with npy-ocaml, ocaml-rs or tch-rs #47

Closed: crackcomm closed this issue 4 years ago

crackcomm commented 4 years ago

Hey, I have some Rust code that creates an ndarray::Array2, converts it with into_pyarray, and returns it to Python, where it can be used as a torch tensor.

What would be the way to do the same in OCaml?

LaurentMazare commented 4 years ago

tch-rs and ocaml-torch share part of their codebase; in particular, tensor storage is common. So you may want to write your tensor from Rust via save or save_multi, and you should then be able to load it from OCaml via Serialize.load_multi. Note that the format for VarStore should also be common.

Finally, if you write an npy file, there is indeed some npy-ocaml integration: bin/tensor_tools.exe in this repo has an npz-to-pytorch command that should do the trick.
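A minimal sketch of the OCaml side of the save/load route, under some assumptions: the Serialize.save_multi and Serialize.load_multi signatures are written as they appear in the ocaml-torch sources at the time of writing, the tensor names "x" and "y" are made up, and the file is written from OCaml here only as a stand-in for the Rust program so that the snippet runs on its own.

```ocaml
(* Needs the torch library from ocaml-torch, e.g. (libraries torch) in dune. *)

(* Load the tensors stored under [names] and return their shapes. *)
let load_shapes ~names ~filename =
  Torch.Serialize.load_multi ~names ~filename
  |> List.map Torch.Tensor.shape

let () =
  let filename = Filename.temp_file "tensors" ".ot" in
  (* Stand-in for the Rust side: in the real setup this file would be
     written by tch-rs. The names "x" and "y" are hypothetical. *)
  Torch.Serialize.save_multi
    ~named_tensors:
      [ "x", Torch.Tensor.zeros [ 2; 3 ]; "y", Torch.Tensor.zeros [ 4 ] ]
    ~filename;
  (* The names passed here must match the names used when saving. *)
  load_shapes ~names:[ "x"; "y" ] ~filename
  |> List.iter (fun shape ->
       print_endline (String.concat "x" (List.map string_of_int shape)))
```

This route goes through a file, so it serializes and copies the data; it is the simple option rather than the zero-copy one.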

crackcomm commented 4 years ago

I am just exploring Bigarray and npy-ocaml for reading arrays from files.

In particular, I'm interested in avoiding serialization and, ideally, any memcpy. From my understanding, ndarray arrays, numpy arrays and torch tensors have the same memory layout, which is what makes real interoperability possible. If there are copies in the background, they are fast because the layout is the same or similar.

Could I return a tch-rs tensor from an ocaml-rs binding and pass its memory on to ocaml-torch?
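The layout point can be checked on the OCaml side with the standard library alone. The sketch below assumes the default row-major (C) layout, which is what a default ndarray::Array2 or a C-contiguous numpy array uses; it shows that a Bigarray with c_layout stores element (i, j) at flat offset i * cols + j, and that reshaping gives a view of the same buffer rather than a copy.

```ocaml
open Bigarray

(* A 2x3 float64 matrix in C (row-major) layout. *)
let m = Array2.create float64 c_layout 2 3

let () =
  for i = 0 to 1 do
    for j = 0 to 2 do
      m.{i, j} <- float_of_int ((10 * i) + j)
    done
  done

(* View the same buffer as a flat 1-D array; no data is copied. *)
let flat = reshape_1 (genarray_of_array2 m) 6

let () =
  (* Row-major: element (i, j) sits at flat index i * 3 + j. *)
  assert (flat.{(1 * 3) + 2} = m.{1, 2});
  (* A write through the flat view is visible in the matrix. *)
  flat.{4} <- 42.0;
  assert (m.{1, 1} = 42.0);
  Printf.printf "m.{1,1} = %.1f\n" m.{1, 1}
```

Arrays created in Fortran (column-major) order on the Rust or numpy side would not match c_layout and would need a transpose or a copy first.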

LaurentMazare commented 4 years ago

I don't know of an easy way to do this with the current tch-rs and ocaml-torch APIs, but I haven't dug much into this. If you're OK with doing some copies along the way, that should be easy though.

crackcomm commented 4 years ago

I found that ocaml-rs has a bigarray module which implements Bigarray.Array1. I will implement Array2 and Array3 conversions for Rust ndarray. From there it should be just Tensor.of_bigarray. So this is probably an issue for ocaml-rs after all. I will close this issue after posting a link to the PR. It wouldn't have happened without your help @LaurentMazare. I'm just entering the beautiful world of OCaml, thank you.
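A rough sketch of that last step, assuming Tensor.of_bigarray accepts a Bigarray.Genarray.t in c_layout (as in the ocaml-torch sources at the time of writing). The matrix is built in OCaml here as a stand-in for the Array2 an ocaml-rs binding would return, and tensor_of_matrix is a hypothetical helper name.

```ocaml
(* Needs the torch library from ocaml-torch, e.g. (libraries torch) in dune. *)

(* Hypothetical helper: wrap an Array2 as a Genarray (no copy at this
   step) and hand it to ocaml-torch. *)
let tensor_of_matrix m =
  Torch.Tensor.of_bigarray (Bigarray.genarray_of_array2 m)

let () =
  (* Stand-in for the Array2 that a Rust binding would hand back. *)
  let m = Bigarray.Array2.create Bigarray.float32 Bigarray.c_layout 2 3 in
  Bigarray.Array2.fill m 1.0;
  let t = tensor_of_matrix m in
  Torch.Tensor.shape t
  |> List.map string_of_int
  |> String.concat "x"
  |> print_endline
```

Whether of_bigarray shares the Bigarray buffer or copies it into Torch-owned storage depends on the ocaml-torch implementation, so this should not be assumed to be zero-copy without checking.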

crackcomm commented 4 years ago

@LaurentMazare is it expected that ocaml-torch can be 5x slower than the Python bindings for a simple Tensor.(add (mm column column) column) vs torch.add(torch.matmul(x, x), x)?

Edit: I'm running ocaml-variants.4.10.0+fp+flambda on Ubuntu through WSL2.

LaurentMazare commented 4 years ago

It's certainly not expected: the same C++ library should be used in both cases, and if the matrix is large, most of the time should be spent in the C++ library rather than in OCaml or Python. Maybe your Python install differs from your OCaml one, e.g. you have GPU support in Python and not in OCaml, or the Python version uses MKL as its BLAS library and this isn't the case on the OCaml side.
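One way to narrow this down is to time the same expression on the OCaml side with a matrix large enough that the C++ library dominates, and to print whether CUDA is visible. The sketch below is only a starting point: time_mm_add is a hypothetical helper, the size and iteration count are arbitrary, and it assumes Tensor.randn, Tensor.mm, Tensor.add and Cuda.is_available as exposed by ocaml-torch.

```ocaml
(* Needs the torch and unix libraries, e.g. (libraries torch unix) in dune. *)
module T = Torch.Tensor

(* Average wall-clock seconds per evaluation of add (mm x x) x
   on an n x n random matrix. *)
let time_mm_add ~n ~iters =
  let x = T.randn [ n; n ] in
  (* Warm-up run so one-time initialization is not measured. *)
  ignore (T.add (T.mm x x) x : T.t);
  let start = Unix.gettimeofday () in
  for _ = 1 to iters do
    ignore (T.add (T.mm x x) x : T.t)
  done;
  (Unix.gettimeofday () -. start) /. float_of_int iters

let () =
  Printf.printf "CUDA available: %b\n" (Torch.Cuda.is_available ());
  Printf.printf "avg seconds per op: %f\n" (time_mm_add ~n:1000 ~iters:10)
```

Comparing this number against the same size and iteration count in Python, with both running on the same device, gives a fairer picture than timing a single small operation.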

crackcomm commented 4 years ago

I created a PR for Bigarray.Array2 in ocaml-rs.

Edit: It's merged.