shino16 opened 2 months ago
@crcrpar - Question from Ivan: what should be the correct behaviour?
Embarrassingly, I don't remember what for and why I exposed `copy_` to `thunder.torch`. I even opened https://github.com/Lightning-AI/lightning-thunder/pull/1209 last week.
The above results in the following error from nvFuser.
Fusion definition
```python
# CUDA devices:
#  0: NVIDIA RTX 6000 Ada Generation
# torch version: 2.5.0a0+gitda32021
# cuda version: 12.6
# nvfuser version: 0.2.10+gitc3f8037
import torch
from nvfuser import FusionDefinition, DataType

def nvfuser_fusion_id0(fd: FusionDefinition) -> None:
    T0 = fd.define_tensor(shape=[], contiguity=[], dtype=DataType.Float, is_cpu=False)
    S1 = fd.define_scalar(1.00000, dtype=DataType.Double)
    T2 = fd.ops.add(T0, S1)
    T3 = fd.ops.sin(T2)
    T4 = fd.ops.set(T2)
    fd.add_output(T4, T0)
    T5 = fd.ops.set(T3)
    fd.add_output(T5, T2)
    fd.add_output(T0)
    fd.add_output(T2)

with FusionDefinition() as fd:
    nvfuser_fusion_id0(fd)
```

I am not sure how to interpret nvFuser's error message, but the problem seems to be that the fusion tries to write the output of `fd.ops.sin` onto the output of `fd.ops.add`.

When we use `x.neg()` instead of `x.sin()`, the nvFuser executor somehow orders the copy onto `t0` before the one from `t0`, and this gets flagged as unsafe by `_inplace_copy_sanity_check`.
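For reference, here is a scalar model (added for illustration, no torch needed) of the eager-mode semantics the traced program should have; it relies on the fact that PyTorch's `Tensor.copy_` returns its destination:

```python
def f_eager(x0: float) -> float:
    # Scalar model of:
    #   x.add_(1); return x.copy_(x.neg())
    x = x0 + 1.0   # x.add_(1): x now holds x0 + 1
    src = -x       # x.neg() reads the *updated* x
    x = src        # x.copy_(src): x is overwritten with -(x0 + 1)
    return x       # copy_ returns its destination

print(f_eager(2.0))  # -3.0
```

So regardless of how the copies are scheduled, the final value of `x` must be `-(x0 + 1)`.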
Python source, Execution trace
```py
@thunder.jit
def f(x):
    x.add_(1)
    return x.copy_(x.neg())
```

```py
def computation(x):
  # x: "cuda:0 f32[]"
  [t3, t1] = nvFusion0(x)
    # t0 = prims.add(x, 1.0)  # t0: "cuda:0 f32[]"
    # t2 = prims.neg(t0)  # t2: "cuda:0 f32[]"
    # t3 = prims.copy_(t2, t0)  # t3: "cuda:0 f32[]"
    # t1 = prims.copy_(t0, x)  # t1: "cuda:0 f32[]"
  del x
  return {'output': t3, 'flat_args': [t1]}
```

I presume this will be fixed by functionalizing `Tensor.copy_` like other in-place ops, but doing so appropriately would involve somewhat big changes in `thunder/core/functionalization.py`.
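A toy sketch of what functionalizing `copy_` could look like on a trace like the one above. This is not thunder's actual pass; the `Op` class and `functionalize` function are made up for illustration, assuming a flat list of single-output ops. The idea is to treat `copy_(src, dst)` as a pure renaming (later reads of `dst` see `src`) and materialize the actual writes once at the end:

```python
from dataclasses import dataclass

@dataclass
class Op:
    name: str    # e.g. "add", "neg", "copy_"
    args: tuple  # input variable names
    out: str     # output variable name

def functionalize(trace):
    rename = {}                      # dst var -> value last copied into it
    pure_ops, final_copies = [], []
    for op in trace:
        # Resolve reads through earlier in-place copies.
        args = tuple(rename.get(a, a) for a in op.args)
        if op.name == "copy_":
            src, dst = args
            rename[op.out] = src     # the result of copy_ aliases src
            rename[dst] = src        # later reads of dst observe src's value
            final_copies.append((src, dst))  # defer the actual write
        else:
            pure_ops.append(Op(op.name, args, op.out))
    return pure_ops, final_copies

# The problematic trace from the execution trace above:
trace = [
    Op("add", ("x", "1.0"), "t0"),
    Op("neg", ("t0",), "t2"),
    Op("copy_", ("t2", "t0"), "t3"),
    Op("copy_", ("t0", "x"), "t1"),
]
pure, copies = functionalize(trace)
# The second copy_ reads t0 *after* t2 was copied into it, so the rename
# map resolves it to a copy of t2 into x, removing the ordering hazard.
print(copies)  # [('t2', 't0'), ('t2', 'x')]
```

With the copies deferred like this, the fused region contains only pure ops, and the executor no longer has to guess a safe order for writes that alias each other.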