Open IvanYashchuk opened 1 year ago
This is likely related to the recent contiguity cleanup by @zasdfgbnm.
I think it is an issue of the repro.
The definition of contiguity has changed a few hours ago. Now broadcast dimensions do not have a contiguity and for a non-broadcast dimension, it is contiguous if and only if its stride equals stride[i] * shape[i]
, where i
is the next non-broadcast dimension. For example, if I have a tensor shape [8, 10, 5], stride [5, 0, 1]
, in the past, the contiguity is [False, False, True]
, but now it is [True, True]
. This new definition allows us to have larger vectorize size (in this example, we can now vectorize 8, but in the past, we can not vectorize). In this repro, the second dimension of T0
is broadcast, so the contiguity should be [True]
.
My question for @IvanYashchuk is:
T0
come from, is it manual, or it comes from some code?FYI: this is the PR that changes the definition of contiguity: https://github.com/csarofeen/pytorch/pull/2517
Where does the contiguity of T0 come from, is it manual, or it comes from some code? If it is from a code, is it easy to change that code?
It's generated from code, and we can change it. Thank you for clarifying what's going on, changing T0's contiguity to [True]
works.
We should
version.txt
https://github.com/csarofeen/pytorch/blob/devel/third_party/nvfuser/version.txtdefine_tensor
to check provided contiguity and raise an error thereProbably versioning is not strictly needed, the code generator should use the compute_contiguity
function, but probably it doesn't.
Alright here's the real problem, the code generator actually uses a different overload of define_tensor
with explicit strides=
:
from nvfuser import FusionDefinition, DataType
import torch
a = torch.ones(4, 1, dtype=torch.double, device='cuda')
b = torch.ones(4, 4, dtype=torch.double, device='cuda')
def nvfuser_fusion_id2(fd : FusionDefinition) -> None :
T0 = fd.define_tensor(sizes=a.shape, strides=a.stride(), dtype=DataType.Double, is_cpu=False)
T1 = fd.define_tensor(sizes=b.shape, strides=b.stride(), dtype=DataType.Double, is_cpu=False)
T2 = fd.ops.broadcast_in_dim(T0, output_shape=[4, 4], broadcast_dims=[0, 1])
T3 = fd.ops.div(T1, T2)
fd.add_output(T3)
# RuntimeError: The size of contiguity must equal to the number of non-broadcasting IterDomains
with FusionDefinition() as fd:
nvfuser_fusion_id2(fd)
print(fd.execute([a, b]))
🐛 Describe the bug
Versions
devel