Discussion on Tensor type conversions

jerry73204 commented 3 years ago

This follows from my closed #198. It has been outdated for a while and I think it should be discarded too. Nevertheless, let me leave some concerns about type conversions in the current implementation. Changes can be made as soon as these concerns become clear.

Implicit floating point truncation: A Float kind tensor can currently be divided by an f64 number, which involves an implicit f64-to-f32 truncation. We could restrict the divisor to f32, but that sacrifices convenience, since one would have to write tensor / 2f32 everywhere. I think a balanced approach is to allow implicit conversions for the +, -, *, / operators and to provide explicit method calls that restrict the scalar type.
Deprecate From<Vec<Vec<T>>> for Tensor: The std docs say From must not fail. Building a vec of vecs to store higher-dimensional data is also discouraged because it is error-prone and less efficient, though the feature is convenient for hacking. We could follow ndarray's from_shape_vec() or provide a fallible from_guessed_vecs() that makes the shape guessing explicit.
Convenient Tensor constructors: I would like to have from_shape_fn and from_shape_iter like ndarray. They really save me time.
Third-party type conversion: I would like Tensor to work with well-known crates without much effort. It's nice that we already have ndarray; support for nalgebra and image would be even better. I see that introducing them would cause bloat. That could be solved with cargo feature gating or by introducing an extra crate for type conversions.
Let Tensor be Copy (and of course Clone): I see this is a controversial decision. Let me argue that it is valid for Tensor to be Copy. My view of Clone and Copy is that the choice determines whether .clone() is explicit or not. A costly copy should require an explicit .clone(), while Copy suits primitives and cheap reference types. IMO, libtorch's Tensor is a reference to shaped data on a device (hence .shallow_clone()). We could follow PyTorch's approach of an implicit shallow copy and an explicit deep .copy() method. Having Copy on Tensor gains some conveniences: we could keep Add<Tensor> for Tensor and remove Add<&Tensor> for Tensor without fighting the borrow checker. It would also be less cumbersome to derive Clone on structs of Tensors.
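
To make the first point concrete, here is a minimal sketch of the "implicit operators, explicit methods" split. F32Tensor, div_f32 and fits_in_f32 are hypothetical names used only for illustration; they are not part of tch, and the real Tensor is not backed by a Vec.

```rust
use std::ops::Div;

/// Hypothetical stand-in for a Float (f32) tensor; not the tch type.
#[derive(Debug, Clone, PartialEq)]
struct F32Tensor(Vec<f32>);

/// Operator form: accepts any scalar convertible to f64, then narrows it
/// to f32. Convenient, but the narrowing is invisible at the call site.
impl<S: Into<f64>> Div<S> for F32Tensor {
    type Output = F32Tensor;

    fn div(self, rhs: S) -> F32Tensor {
        let d = rhs.into() as f32; // implicit f64 -> f32 narrowing happens here
        F32Tensor(self.0.into_iter().map(|x| x / d).collect())
    }
}

impl F32Tensor {
    /// Explicit form: only an f32 divisor is accepted, so nothing is narrowed.
    fn div_f32(&self, rhs: f32) -> F32Tensor {
        F32Tensor(self.0.iter().map(|x| x / rhs).collect())
    }
}

/// Returns true if `x` survives an f64 -> f32 -> f64 round trip unchanged.
fn fits_in_f32(x: f64) -> bool {
    (x as f32) as f64 == x
}
```

With this split, F32Tensor(vec![1.0, 3.0]) / 2.0f64 compiles and silently narrows the divisor, while div_f32 rejects an f64 argument at compile time. A divisor such as 0.1f64 is a case where the narrowing actually changes the value, which fits_in_f32 can detect.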

edlanglois commented 3 years ago

I don't think that Copy can be implemented for Tensor because Copy is for "types whose values can be duplicated simply by copying bits". If you did that with Tensor you'd end up with both values pointing to the same object on the C++ side, which would result in an extra refcount subtraction when the two tch Tensors go out of scope.
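
The refcount argument can be illustrated with std::rc::Rc standing in for the C++-side reference count. This is only an analogy under that assumption: Handle, shallow_clone and refcount_trace are hypothetical names, not the tch implementation.

```rust
use std::rc::Rc;

/// Hypothetical stand-in for a reference-counted tensor handle.
/// `Rc` plays the role of the C++-side refcount; this is not the tch type.
struct Handle(Rc<Vec<f32>>);

impl Handle {
    fn new(data: Vec<f32>) -> Self {
        Handle(Rc::new(data))
    }

    /// Shallow copy: shares the buffer and increments the refcount.
    fn shallow_clone(&self) -> Self {
        Handle(Rc::clone(&self.0))
    }

    fn refcount(&self) -> usize {
        Rc::strong_count(&self.0)
    }
}

/// Returns the refcount after creation, after a shallow clone,
/// and after that clone has been dropped.
fn refcount_trace() -> (usize, usize, usize) {
    let a = Handle::new(vec![1.0, 2.0, 3.0]);
    let created = a.refcount();
    let b = a.shallow_clone();
    let cloned = a.refcount();
    drop(b); // the decrement is paired with the increment in shallow_clone
    let dropped = a.refcount();
    (created, cloned, dropped)
}
```

Each drop decrements the count, so every duplicate has to be paired with an increment. A bitwise Copy would skip the increment but still run the decrement. Rust also enforces this at the language level: a type that implements Drop cannot implement Copy (error E0184), and Rc itself is Clone but not Copy for the same reason.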

jerry73204 commented 3 years ago

@edlanglois I agree with your points. Cloning/copying a Tensor as a shallow copy is correct only if the refcount is increased implicitly.

Another concern is that whether .clone() is shallow or deep affects the semantics of ownership. Currently, self-modifying methods like .absolute_(&mut self) expect unique ownership. A shallow copy shares the underlying data, which makes that ownership unsound. If libtorch handles concurrent self-modifying operations, as many concurrent data structures such as dashmap do, these methods should expect shared ownership instead.
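
A small sketch of the aliasing concern, using Rc<RefCell<...>> as a stand-in for shared device memory. SharedTensor, abs_ and mutate_through_alias are hypothetical names for illustration only and do not reflect how tch stores data.

```rust
use std::cell::RefCell;
use std::rc::Rc;

/// Hypothetical handle whose shallow copies share one buffer.
struct SharedTensor(Rc<RefCell<Vec<f32>>>);

impl SharedTensor {
    fn new(data: Vec<f32>) -> Self {
        SharedTensor(Rc::new(RefCell::new(data)))
    }

    /// Shallow copy: a second handle to the same buffer.
    fn shallow_clone(&self) -> Self {
        SharedTensor(Rc::clone(&self.0))
    }

    /// In-place absolute value. It takes `&mut self`, which suggests
    /// unique access, but only the handle is unique, not the data.
    fn abs_(&mut self) {
        for x in self.0.borrow_mut().iter_mut() {
            *x = x.abs();
        }
    }

    fn to_vec(&self) -> Vec<f32> {
        self.0.borrow().clone()
    }
}

/// Mutates through one handle and reads the result through the other.
fn mutate_through_alias() -> Vec<f32> {
    let a = SharedTensor::new(vec![-1.0, 2.0, -3.0]);
    let mut b = a.shallow_clone();
    b.abs_(); // only `b` is borrowed mutably here
    a.to_vec() // yet `a` observes the modified data
}
```

The borrow checker sees a mutable borrow of b only, so it cannot rule out that a changes underneath its owner. That is the sense in which &mut self on a shallow-copyable handle does not guarantee unique ownership of the data.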

EDIT: Let's move the discussion to #370.

jerry73204 commented 3 years ago

More thoughts about this thread.

I wrote cv-convert to support third-party type conversions to and from tensors, so that concern is somewhat addressed.
The builtin array types can be a good alternative to From<Vec<Vec<T>>>. The nested vec conversion should be deprecated to fit the definition of From. We can provide a fallible constructor from_nested_vec() as a replacement.
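
As a rough sketch of that split, the following uses a hypothetical 2-D Matrix type (not tch's Tensor) with an assumed ShapeError type. Fixed-size arrays get an infallible From because their shape is known at compile time, while nested vecs go through a fallible from_nested_vec() that reports ragged rows.

```rust
/// Hypothetical row-major 2-D container used only for illustration.
#[derive(Debug, PartialEq)]
struct Matrix {
    shape: (usize, usize),
    data: Vec<f32>,
}

/// Hypothetical error type for shape mismatches.
#[derive(Debug, PartialEq)]
enum ShapeError {
    Ragged { row: usize, expected: usize, found: usize },
}

impl Matrix {
    /// Fallible: the column count is guessed from the first row and
    /// every other row is checked against it.
    fn from_nested_vec(rows: Vec<Vec<f32>>) -> Result<Self, ShapeError> {
        let nrows = rows.len();
        let ncols = rows.first().map_or(0, |r| r.len());
        let mut data = Vec::with_capacity(nrows * ncols);
        for (i, row) in rows.into_iter().enumerate() {
            if row.len() != ncols {
                return Err(ShapeError::Ragged {
                    row: i,
                    expected: ncols,
                    found: row.len(),
                });
            }
            data.extend(row);
        }
        Ok(Matrix { shape: (nrows, ncols), data })
    }
}

/// Infallible: the shape is part of the array type, so `From` cannot fail.
impl<const R: usize, const C: usize> From<[[f32; C]; R]> for Matrix {
    fn from(a: [[f32; C]; R]) -> Self {
        let mut data = Vec::with_capacity(R * C);
        for row in a.iter() {
            data.extend_from_slice(row);
        }
        Matrix { shape: (R, C), data }
    }
}
```

Matrix::from([[1.0f32, 2.0], [3.0, 4.0]]) can never hit a shape error, whereas Matrix::from_nested_vec(vec![vec![1.0, 2.0], vec![3.0]]) returns Err(ShapeError::Ragged { .. }) instead of panicking or guessing silently.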

LaurentMazare / tch-rs

Discussion on Tensor type conversions #281