csarofeen / pytorch

Tensors and Dynamic neural networks in Python with strong GPU acceleration
http://pytorch.org

HuggingFace DebertaForQuestionAnswering, DebertaForMaskedLM: The tensor has a non-zero number of elements #2106

Open IvanYashchuk opened 1 year ago

IvanYashchuk commented 1 year ago

🐛 Describe the bug

RuntimeError: The tensor has a non-zero number of elements, but its data is not allocated yet. Caffe2 uses a lazy allocation, so you will need to call mutable_data() or raw_mutable_data() to actually allocate memory

is triggered by the following minimal reproducer:

import torchdynamo
import torch

def forward():
    ones = torch.ops.aten.ones.default([4, 512], device = torch.device(type='cuda', index=0), pin_memory = False)
    zeros = torch.ops.aten.zeros.default([4, 512], dtype = torch.int64, device = torch.device(type='cuda', index=0), pin_memory = False)
    return (ones, zeros)

f = torchdynamo.optimize(backend="nvprims_nvfuser")(forward)
f()
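
For comparison, here is a minimal sketch of an eager-mode baseline: the same two factory calls without the torchdynamo wrapper, followed by a read of the values so that the storage is actually touched. The name `forward_eager` and the CPU fallback are assumptions added for illustration and are not part of the original report; on a CUDA machine it uses `cuda:0` as in the reproducer.

```python
import torch

# Assumption: fall back to CPU when CUDA is unavailable so the sketch runs anywhere.
device = torch.device("cuda", 0) if torch.cuda.is_available() else torch.device("cpu")

def forward_eager():
    # Same aten overloads and arguments as in the reproducer, but run eagerly.
    ones = torch.ops.aten.ones.default([4, 512], device=device, pin_memory=False)
    zeros = torch.ops.aten.zeros.default(
        [4, 512], dtype=torch.int64, device=device, pin_memory=False
    )
    return (ones, zeros)

ones, zeros = forward_eager()

# Summing reads every element, so this would fail if the data were not allocated.
ones_total = ones.sum().item()
zeros_total = zeros.sum().item()
print(tuple(ones.shape), ones.dtype, ones_total)
print(tuple(zeros.shape), zeros.dtype, zeros_total)
```

If the eager version runs cleanly while the `nvprims_nvfuser` version raises, that would point at the backend's handling of these factory calls rather than at the calls themselves.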

Versions

torchbenchPerf branch + https://github.com/IvanYashchuk/torchdynamo/tree/nvfuser-cudagraphify

IvanYashchuk commented 1 year ago

This error is fixed with https://github.com/IvanYashchuk/torchdynamo/commit/2c091eff8f2750e051920621d3908ff25a10694e

DebertaForQuestionAnswering and DebertaForMaskedLM should work now.

IvanYashchuk commented 1 year ago

The commit above breaks another benchmark, ElectraForCausalLM. I pushed another, more specific change: https://github.com/pytorch/torchdynamo/commit/54c61696f0fc93369ce7c810d8d0cf1e5217fbc1