NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

View size incompatible with tensor's size and stride #417

Open sabetAI opened 1 year ago

sabetAI commented 1 year ago

`transformer_engine.pytorch.module.layernorm.LayerNorm` calls `inputmat = inp.view((-1, in_features))`, which throws a "view size is not compatible with input tensor's size and stride" error (see attached screenshot). Using `reshape` instead of `view` fixes the error.
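To illustrate the underlying PyTorch behavior (a plain-torch sketch, no Transformer Engine required): `.view()` only reinterprets existing strides and fails on tensors whose memory layout is not row-major, while `.reshape()` falls back to a copy when needed.

```python
import torch

# A 4D tensor stored in channels-last (NHWC) memory order: its strides
# no longer match the default row-major layout that .view() requires.
x = torch.randn(2, 3, 4, 5).contiguous(memory_format=torch.channels_last)

in_features = x.shape[-1]  # mirrors LayerNorm's in_features

try:
    x.view(-1, in_features)  # fails: strides are incompatible with the new shape
except RuntimeError as e:
    print("view failed:", e)

y = x.reshape(-1, in_features)  # succeeds: reshape copies when a view is impossible
print(y.shape)  # torch.Size([24, 5])
```

This is why swapping `view` for `reshape` makes the error disappear, at the cost of a silent copy for non-contiguous inputs.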

timmoon10 commented 1 year ago

I can reproduce this error with non-contiguous data:

```python
import torch
import transformer_engine.pytorch as te

model = te.LayerNorm(4)
x = torch.randn([4, 4, 4, 4], device="cuda")
x = x.contiguous(memory_format=torch.channels_last)
x.requires_grad_(True)
y = model(x)
```

I think it would be reasonable if the TE modules all coerced their inputs to be contiguous, to be on GPU, and to have the expected dtypes.
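A minimal sketch of what such input coercion could look like (a hypothetical helper, not Transformer Engine's actual API):

```python
import torch

def coerce_input(inp: torch.Tensor, dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Hypothetical coercion a module could apply before its kernels run:
    make the input row-major contiguous, move it to the GPU if one is
    available, and cast it to the expected dtype."""
    if not inp.is_contiguous():
        inp = inp.contiguous()  # copies and reorders non-contiguous data
    if torch.cuda.is_available() and not inp.is_cuda:
        inp = inp.cuda()
    if inp.dtype != dtype:
        inp = inp.to(dtype)
    return inp
```

With this in place, a channels-last input like the reproducer above would be silently reordered before the `view` call, avoiding the error.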

Even with this support, be advised that non-contiguous data should still be avoided in TE. TE kernels are mostly written for contiguous data, and I've found that PyTorch's reordering kernel is slow.
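The reordering cost is easy to see: coercing a channels-last tensor to row-major layout is a full copy into new storage, not a metadata change (a plain-torch sketch):

```python
import torch

x = torch.randn(8, 64, 32, 32).contiguous(memory_format=torch.channels_last)

y = x.contiguous()  # row-major copy: every element is physically moved

# The copy allocates new storage, so the data pointers differ,
# even though the logical values are identical.
print(x.data_ptr() != y.data_ptr())  # True
print(torch.equal(x, y))             # True
```

For large activations this copy runs once per coerced input, which is why keeping data contiguous upstream is preferable to relying on automatic coercion.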