NVIDIA / TransformerEngine

A library for accelerating Transformer models on NVIDIA GPUs, including using 8-bit floating point (FP8) precision on Hopper and Ada GPUs, to provide better performance with lower memory utilization in both training and inference.
https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/index.html
Apache License 2.0

View size incompatible with tensor's size and stride #417

Open sabetAI opened 1 year ago

sabetAI commented 1 year ago

`transformer_engine.pytorch.module.layernorm.LayerNorm` calls `inputmat = inp.view((-1, in_features))`, which throws a "view size is not compatible with input tensor's size and stride" error (see attached screenshot). Using `reshape` instead of `view` fixes the error.
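To illustrate the underlying PyTorch behavior (a plain-torch sketch, no Transformer Engine required): `.view()` only reinterprets existing strides and fails on tensors whose memory layout is not row-major, while `.reshape()` falls back to a copy when needed.

```python
import torch

# A 4D tensor stored in channels-last (NHWC) memory order: its strides
# no longer match the default row-major layout that .view() requires.
x = torch.randn(2, 3, 4, 5).contiguous(memory_format=torch.channels_last)

in_features = x.shape[-1]  # mirrors LayerNorm's in_features

try:
    x.view(-1, in_features)  # fails: strides are incompatible with the new shape
except RuntimeError as e:
    print("view failed:", e)

y = x.reshape(-1, in_features)  # succeeds: reshape copies when a view is impossible
print(y.shape)  # torch.Size([24, 5])
```

This is why swapping `view` for `reshape` makes the error disappear, at the cost of a silent copy for non-contiguous inputs.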

timmoon10 commented 1 year ago

I can reproduce this error with non-contiguous data:

```python
import torch
import transformer_engine.pytorch as te

model = te.LayerNorm(4)
x = torch.randn([4, 4, 4, 4], device="cuda")
x = x.contiguous(memory_format=torch.channels_last)
x.requires_grad_(True)
y = model(x)
```

I think it would be reasonable if the TE modules all coerced their inputs to be contiguous, to be on GPU, and to have the expected dtypes.
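A minimal sketch of what such input coercion could look like (a hypothetical helper, not Transformer Engine's actual API):

```python
import torch

def coerce_input(inp: torch.Tensor, dtype: torch.dtype = torch.float32) -> torch.Tensor:
    """Hypothetical coercion a module could apply before its kernels run:
    make the input row-major contiguous, move it to the GPU if one is
    available, and cast it to the expected dtype."""
    if not inp.is_contiguous():
        inp = inp.contiguous()  # copies and reorders non-contiguous data
    if torch.cuda.is_available() and not inp.is_cuda:
        inp = inp.cuda()
    if inp.dtype != dtype:
        inp = inp.to(dtype)
    return inp
```

With this in place, a channels-last input like the reproducer above would be silently reordered before the `view` call, avoiding the error.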

Even with this support, be advised that non-contiguous data should still be avoided in TE. TE kernels are mostly written for contiguous data, and I've found that PyTorch's reordering kernel is slow.
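The reordering cost is easy to see: coercing a channels-last tensor to row-major layout is a full copy into new storage, not a metadata change (a plain-torch sketch):

```python
import torch

x = torch.randn(8, 64, 32, 32).contiguous(memory_format=torch.channels_last)

y = x.contiguous()  # row-major copy: every element is physically moved

# The copy allocates new storage, so the data pointers differ,
# even though the logical values are identical.
print(x.data_ptr() != y.data_ptr())  # True
print(torch.equal(x, y))             # True
```

For large activations this copy runs once per coerced input, which is why keeping data contiguous upstream is preferable to relying on automatic coercion.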