Closed — IvanYashchuk closed this issue 2 weeks ago
> user code shouldn't rely on whether it's a view or not.

I'm wondering how users perceive this today. Extreme cases are still tricky to resolve; with fake tensors, how reliably can torch's analysis distinguish reshape-as-view from reshape-as-copy? cc'ing @kiya00
Fake tensors do propagate stride information, but they may do so differently from eager mode; and in any case we don't have stride information for the inputs, so we cannot initialize correct fake tensors.
There are other ways to reliably determine whether PyTorch eager produces a copy or a view. I'd like the discussion here to focus on the UX and expected behavior, not the technical challenges.
> I'm wondering how this is perceived by users today?
There's no documentation in PyTorch similar to NumPy's guide on viewing vs non-viewing reshape (https://numpy.org/doc/stable/user/basics.copies.html#how-to-tell-if-the-array-is-a-view-or-a-copy), so non-advanced users can be confused about why they sometimes observe side effects and sometimes not.
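For reference, PyTorch does offer a rough analogue of NumPy's "check the `base` attribute" recipe: the private (and therefore not guaranteed) `Tensor._base` attribute points back at the input when a view was created. A minimal sketch, assuming `_base` behaves as in current PyTorch releases:

```python
import torch

def reshape_made_view(t: torch.Tensor, *shape) -> bool:
    # When reshape returns a view, the result's (private) ._base
    # attribute points back at the input tensor; when reshape must
    # copy, ._base is None. Caveat: reshaping to the identical shape
    # may return `t` itself, which this simple check does not cover.
    r = t.reshape(*shape)
    return r._base is t

contiguous = torch.arange(6)                    # a viewing reshape is possible
noncontig = torch.arange(6).reshape(2, 3).t()   # strides force reshape to copy
```

Here `reshape_made_view(contiguous, 2, 3)` is true while `reshape_made_view(noncontig, 6)` is false, illustrating exactly the runtime-dependent behavior users have no documented way to anticipate.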
I didn't exclude the operators that return views from the auto registration since, as Ivan mentioned, the stride information is not used. And now I find I didn't add the tensor view operators appropriately to `_syms_returning_runtime_dependently_views` and `_syms_returning_views` (e.g. `torch.Tensor.reshape_as` is auto-registered and has the same problem as reshape+in-place). Do you think it makes more sense to remove them, or to add them to the two lists above and let the corresponding passes handle them?
triage review: an error seems like a good solution for now; it will be interesting to see if this happens in practice
In the Lightning AI Slack #thunder channel @t-vi wrote:
Personally, I would error if inplace is right after reshape with a stern message educating the user.
@lantiga added:
Same, I would error. Most correct PyTorch code will have worked its way to finding that in-place on views led to a bug, and they would have fixed it, so we should just error and save them the trouble.
@mruberry's comment:
I think the only options are to error or mimic PyTorch's behavior in this case. Otherwise we'd have a silent incorrectness issue.
Let's raise an error then. I'll create a separate issue to track it. Thanks to everyone who participated in the discussion!
🚀 Feature

`torch.reshape` (and all operations using `reshape` internally) is interesting because, depending on PyTorch's internal conditions, it may or may not return a tensor view. PyTorch's documentation notes that the result may be either a view or a copy and that user code should not depend on which.

@jjsjann123 discovered that with the current functionalization enabled the behavior doesn't match PyTorch, silently producing unexpected results, and matches when it's disabled (https://github.com/Lightning-AI/lightning-thunder/issues/657#issuecomment-2274293433).
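The divergence is easy to demonstrate in eager PyTorch: whether an in-place write through a reshape result is visible in the original tensor depends entirely on the input's memory layout. A minimal sketch:

```python
import torch

# Contiguous input: reshape returns a view, so the in-place add
# is visible through the original tensor.
x = torch.zeros(2, 3)
x.reshape(6).add_(1)
# x is now all ones

# Non-contiguous input: reshape must copy, so the in-place add
# lands in a temporary and is silently lost.
y = torch.zeros(2, 3).t()   # transpose makes y non-contiguous
y.reshape(6).add_(1)
# y is still all zeros
```

The same user code thus either mutates its input or silently discards a write, which is the silent-incorrectness hazard any Thunder behavior has to confront.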
Short-term solution: Thunder should raise an error if reshape+in-place occurs in the user script, irrespective of whether the reshape is safely resolvable to a non-viewing reshape. There should be an option to turn this error into a warning for developers. What do people think about this?
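At the single-op level, the proposed error could look something like the following; `checked_add_` is a hypothetical helper for illustration, not Thunder's actual implementation, and it leans on the private `_base` attribute:

```python
import torch

def checked_add_(t: torch.Tensor, value) -> torch.Tensor:
    # Hypothetical guard mirroring the proposal: refuse in-place
    # mutation of any tensor that is a view (e.g. one produced by a
    # viewing reshape), with a message nudging users toward being
    # explicit about viewing vs copying.
    if t._base is not None:
        raise RuntimeError(
            "in-place operation on a view (possibly produced by reshape); "
            "use an out-of-place op, or be explicit about view vs copy"
        )
    return t.add_(value)
```

A non-view tensor passes through unchanged (`checked_add_(torch.zeros(3), 1)` mutates it as usual), while `checked_add_(torch.zeros(2, 3).reshape(6), 1)` raises, since the contiguous reshape produced a view.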
For a long-term solution, I see two paths:

1. Raise an error and require users to be explicit by calling `torch.view` or `torch.reshape_copy`.
2. Mimic PyTorch's viewing/copying behavior exactly.

I'd like people to focus on what the user experience with Thunder should be, and not on how much more technically challenging (or not) one solution is over another. Maybe others can come up with more alternatives?
cc @tfogal, @lantiga, @t-vi, @crcrpar