LaurentMazare / tch-rs

Rust bindings for the C++ api of PyTorch.

Implement Clone for Tensor? #370

Open edlanglois opened 3 years ago

edlanglois commented 3 years ago

I am interested in the reasons for / against implementing Clone for Tensor. The only existing discussion I'm aware of is part of #281.

To start, it would be nice to have Clone implemented for Tensor. I'd like to be able to easily clone structures that store tensors.

There is a C++ clone function that could be used. I see that this function is in Declarations-*.yaml but it does not appear in the generated API (at least for tch-rs; it is in the API for ocaml-torch). Even if the decision is not to implement Clone for Tensor, it would be nice to expose this function as g_clone or something. [Edit: Oops, I forgot about Tensor::copy. Why is that done in Rust with copy_ rather than with the C++ clone()?]
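
For reference, here is a minimal sketch of what a deep copy looks like with the API as it stands, assuming only that Tensor::copy allocates fresh storage and copies the data into it (which is my reading of the current binding):

use tch::Tensor;

fn main() {
    let mut t = Tensor::of_slice(&[1.0f32, 2.0, 3.0]);
    // copy() allocates a new tensor and copies the data over (via copy_),
    // so `u` should not share storage with `t`.
    let u = t.copy();
    let _ = t.fill_(0.0); // modify t in place
    u.print();            // u is expected to still print 1 2 3
}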

There is possibly an issue with whether Tensor::clone should have a gradient back to the source. From the torch perspective you would expect a clone() method to behave the same as it does in C++ and Python, which is to be differentiable with respect to the source.

From the Rust side that might be unexpected: if I clone a parameter tensor, I don't want the new parameter tensor to have gradients back to the original. I'm not sure that detaching is the solution either. If you have

let x = Tensor::ones(&[1], ...).requires_grad_(true);
let y = 2 * x;
let z = y.clone();

then should z be differentiable with respect to x, as though it were y, or simply be detached? From a Rust perspective I'd expect z to behave exactly like y, which means being differentiable with respect to x.

But the more I think about it, the more I think it's fine for clone() to be differentiable. In the above example, dy/dx = 2 and dy/dy = 1, and z would behave the same when substituted for y: dz/dx = 2, dz/dy = 1, dz/dz = 1. As for differentiably cloning parameter tensors, I think it's unlikely that you would differentiate a function of the clone with respect to the original, and if you do, the fact that the C++/PyTorch clone is differentiable ought to be enough of a warning to consider that the tch version might be too.
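
To make the substitution argument concrete, here is a small check one could run, with copy() standing in for the proposed differentiable clone; whether copy() actually keeps the autograd edge back to its source is the assumption being tested here:

use tch::Tensor;

fn main() {
    let x = Tensor::from(1.0f64).set_requires_grad(true);
    let y = &x * 2.0;
    // Stand-in for the proposed differentiable clone().
    let z = y.copy();
    z.backward();
    // If the copy is differentiable, dz/dx should come out as 2.0,
    // exactly as dy/dx would.
    println!("dz/dx = {}", x.grad().double_value(&[]));
}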

Another risk with implementing Clone is that the new variable isn't tracked by a VarStore, but I don't think that should be a problem either. If you are calling clone() manually, it should be expected that the clone won't be in a VarStore. The remaining risk would be deriving Clone for a struct that stores Tensors and has those Tensors in a VarStore, except you can't do that because VarStore doesn't implement Clone.
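
In sketch form, the scenario above would only arise for something like this hypothetical struct, which could not also hold the VarStore itself precisely because VarStore is not Clone:

use tch::Tensor;

// Hypothetical module owning its tensors directly; the derive only
// compiles if Tensor gets a Clone impl, and the cloned tensors would
// simply not be registered in any VarStore.
#[derive(Clone)]
struct MyModule {
    weight: Tensor,
    bias: Tensor,
}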

Any other reasons for / against? At this point I am in favour of implementing Clone for Tensor using the differentiable C++ clone() function.
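
Concretely, the proposal amounts to something roughly like the following sketch, using copy() as the deep, differentiable copy (the real binding could call the C++ clone() directly instead, and this impl would of course have to live inside tch-rs itself because of the orphan rule):

// Sketch only: a deep copy that keeps the autograd edge back to the
// source, matching the C++/Python clone() semantics discussed above.
impl Clone for Tensor {
    fn clone(&self) -> Tensor {
        self.copy()
    }
}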

edlanglois commented 3 years ago

Replying to https://github.com/LaurentMazare/tch-rs/issues/281#issuecomment-855287935 from @jerry73204

Another concern is that whether .clone() is shallow or deep affects the semantics of ownership. Currently, the self-modifying methods like .absolute_(&mut self) expect unique ownership. A shallow copy shares the underlying data, making the ownership unsound. If libtorch allows concurrent self-modifying operations, just like many other concurrent data structures such as dashmap, it should expect shared ownership.

I agree that clone() should be a deep copy. One reason is that the Python and C++ clone() methods perform a deep copy, so any tch-rs clone() should do the same. Another reason is that, as a user, I think of tch::Tensor as owning its data (even though that's not quite true), so I expect clone() to copy the data. The tch-rs API doesn't really advertise the fact that tch::Tensor is essentially an Rc<RefCell<Data>> on the C++ side; it's called Tensor, not TensorHandle, after all. So I think it would be surprising to get a shallow copy from clone().

As for shallow_clone(), are you sure it's unsound? I'm not an expert on Rust's soundness rules, but in sound Rust code you can have two non-mut Rc<RefCell<Data>> handles pointing to the same data, modify the data through one, and then see the modification when reading through the other. I assume that on the C++ side there is some sort of lock to prevent multiple concurrent modifications. If the C++ side is something like Arc<RefCell<Mutex<Data>>>, then as far as I know it ought to be fine. The RefCell check that only one &mut exists at a time wouldn't be there in C++ (just the interior mutability), but since the tch-rs API doesn't let you safely get a pointer to the underlying data, I don't think you can end up with multiple mutable references.
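
The plain-Rust analogy I have in mind is the following, all of it safe and sound: two handles share the same underlying data, one mutates, the other observes the change.

use std::cell::RefCell;
use std::rc::Rc;

fn main() {
    // Two non-mut handles to the same underlying data.
    let a: Rc<RefCell<Vec<f32>>> = Rc::new(RefCell::new(vec![1.0, 2.0]));
    let b = Rc::clone(&a); // the "shallow clone"
    // Mutate through one handle...
    a.borrow_mut()[0] = 42.0;
    // ...and the change is visible through the other.
    assert_eq!(b.borrow()[0], 42.0);
}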

jerry73204 commented 3 years ago

What I mean by soundness is that the unique ownership implied by &mut self does not respect the fact that the data is shared after a shallow copy. It should be &self instead, even for self-modifying methods. The implementation isn't wrong; it's the API that doesn't respect the rule.
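
In code, the concern reads roughly like this sketch (abs_ is just a representative in-place method; the point is the &mut self on an operation whose effect is observable through the other handle):

use tch::Tensor;

fn main() {
    let t = Tensor::from(-1.0f64);
    // Shares storage with t; both values "own" the same data.
    let mut u = t.shallow_clone();
    // The in-place op takes &mut self, which reads as exclusive access,
    // yet its effect is visible through t, which was never borrowed mutably.
    let _ = u.abs_();
    t.print(); // expected to show 1, not -1
}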

Data sharing is common in libtorch and comes in various forms; for example, Tensor::view() leads to two owned instances with shared data. It could be made explicit, like Arc<RefCell<_>>, or kept transparent and documented in terms of ownership. The point is that the ownership model of Tensor must be clearly defined. Consider the cases:

Taking a mutable pointer to the data can be marked unsafe. The practical use is to implement low-level operations with CUDA. An example can be found in my tch-nms (link).
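
Roughly, that boundary could look like the sketch below; data_ptr is the existing tch-rs accessor, while the unsafe wrapper and its contract are hypothetical:

use std::os::raw::c_void;
use tch::Tensor;

/// Hypothetical accessor: marked unsafe because the caller must ensure
/// there is no other live access to the tensor's storage, e.g. while a
/// custom CUDA kernel writes through the pointer.
unsafe fn data_ptr_mut(t: &Tensor) -> *mut c_void {
    t.data_ptr()
}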