kevinstephano closed this issue 1 week ago.
Slightly smaller repro:
import torch
from nvfuser import FusionDefinition, DataType
def nvfuser_fusion_id0(fd: FusionDefinition) -> None:
    T0 = fd.define_tensor(shape=[128000, 1024], contiguity=[True, True], dtype=DataType.BFloat16, is_cpu=False, stride_order=[1, 0])
    T5 = fd.ops.reshape(T0, new_shape=[128000, 8, 128])
    T6 = fd.ops.permute(T5, dims=[1, 0, 2])
    S7 = fd.define_scalar(8, dtype=DataType.Int)
    S8 = fd.define_scalar(4, dtype=DataType.Int)
    S9 = fd.define_scalar(128000, dtype=DataType.Int)
    S10 = fd.define_scalar(128, dtype=DataType.Int)
    T12 = fd.ops.broadcast_in_dim(T6, shape=[S7, S8, S9, S10], broadcast_dims=[0, 2, 3])
    T17 = fd.ops.reshape(T12, new_shape=[32, 128000, 128])
    fd.add_output(T17)

with FusionDefinition() as fd:
    nvfuser_fusion_id0(fd)

inputs = [
    torch.testing.make_tensor((128000, 1024), dtype=torch.bfloat16, device='cuda:0'),
]
fd.execute(inputs)
T12 has an expanded broadcast dimension of size 4. We then reshape from (8, 4, 128000, 128) to (32, 128000, 128), which simply merges that expanded dimension into the size-8 dimension.
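To make the layout of T12 concrete, here is a minimal plain-Python sketch of the shape/stride bookkeeping for the repro above. The helpers (contiguous_strides, permute, expand_new_dim, can_merge_as_view) are hypothetical illustrations, not nvFuser APIs, and the sketch assumes PyTorch-style strides in which an expanded broadcast dimension has stride 0. It shows that the size-4 expanded dim cannot be folded into the size-8 dim as a simple strided view: each of the 8 slices of T6 is repeated 4 times in a row in the 32-long output dimension.

```python
# Hypothetical helpers (NOT nvFuser APIs): they only track shape/stride metadata.

def contiguous_strides(shape):
    """Row-major strides, in elements, for a dense tensor of `shape`."""
    strides = []
    acc = 1
    for size in reversed(shape):
        strides.append(acc)
        acc *= size
    return list(reversed(strides))


def permute(shape, strides, dims):
    """Reorder dimensions without moving any data."""
    return [shape[d] for d in dims], [strides[d] for d in dims]


def expand_new_dim(shape, strides, pos, size):
    """Insert an expanded broadcast dim at `pos`: logical size `size`, stride 0."""
    return shape[:pos] + [size] + shape[pos:], strides[:pos] + [0] + strides[pos:]


def can_merge_as_view(shape, strides, i):
    """Dims i and i+1 merge into one strided dim iff outer stride == inner stride * inner size."""
    return strides[i] == strides[i + 1] * shape[i + 1]


# T0 is [128000, 1024] and contiguous; reshaping it to [128000, 8, 128] only
# splits the inner dim, so T5 has the contiguous strides of the new shape.
shape, strides = [128000, 8, 128], contiguous_strides([128000, 8, 128])
# T6 = permute(T5, dims=[1, 0, 2])
shape, strides = permute(shape, strides, [1, 0, 2])
# T12 = broadcast_in_dim(T6, [8, 4, 128000, 128], broadcast_dims=[0, 2, 3]),
# i.e. a new size-4 expanded dim at position 1.
shape, strides = expand_new_dim(shape, strides, 1, 4)

print(shape, strides)                        # [8, 4, 128000, 128] [128, 0, 1024, 1]
print(can_merge_as_view(shape, strides, 0))  # False

# In T17, output index k of the merged 32-long dim reads slice k // 4 of T6.
source_slice = [k // 4 for k in range(32)]
print(source_slice[:8])                      # [0, 0, 0, 0, 1, 1, 1, 1]
```

In PyTorch eager terms, a reshape with these strides would have to materialize a copy rather than return a view; whether that is the root cause of the nvFuser failure is not established here.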
Can we selectively enable the ID-Model for this case?
Any progress on this issue?
The first attempt wasn't successful (#3317). Will try a different workaround (WAR).
Error message:
Repro: