apache / tvm

Open deep learning compiler stack for cpu, gpu and specialized accelerators
https://tvm.apache.org/
Apache License 2.0

[Bug] [PyTorch][Frontend] aten::index with boolean mask gets wrong result with argwhere #15588

Open yixunzhang opened 1 year ago

yixunzhang commented 1 year ago

aten::index does not work properly with a boolean mask when the mask is a tvm.relay.Call expression.

indices_list.append(_op.squeeze(_op.transform.argwhere(inp), axis=[1]))

The indices produced by argwhere and squeeze cause _op.adv_index to infer the wrong output shape.
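The distinction at play can be sketched with plain nested lists standing in for tensors. This is a toy illustration, not TVM or PyTorch code: the helpers `argwhere`, `index_per_axis`, `index_first_axis`, and `shape` are hypothetical names introduced here. It assumes, as the quoted line suggests, that the squeezed argwhere output reaches the indexing op as a single index tensor rather than as one index vector per masked axis.

```python
# Toy illustration only: plain nested lists stand in for tensors.
# None of these helpers are TVM or PyTorch APIs.

def argwhere(mask):
    """Return the [row, col] coordinates of True entries in a 2-D mask."""
    return [[i, j] for i, row in enumerate(mask) for j, flag in enumerate(row) if flag]

def index_per_axis(data, coords):
    """Boolean-mask semantics: column k of coords indexes axis k of data."""
    return [data[i][j] for i, j in coords]

def index_first_axis(data, coords):
    """Treat the whole (N, 2) coordinate array as one index into axis 0."""
    return [[data[c] for c in pair] for pair in coords]

def shape(nested):
    """Shape of a non-empty rectangular nested list."""
    dims = []
    while isinstance(nested, list):
        dims.append(len(nested))
        nested = nested[0]
    return tuple(dims)

data = [[10, 11, 12], [20, 21, 22], [30, 31, 32]]
mask = [[False, True, False], [False, False, True], [True, False, False]]
coords = argwhere(mask)  # three [row, col] pairs, i.e. shape (3, 2)

per_axis = index_per_axis(data, coords)      # selects the masked elements
first_axis = index_first_axis(data, coords)  # gathers whole rows, twice per pair

print(shape(per_axis))    # (3,)
print(shape(first_axis))  # (3, 2, 3)
```

The second result keeps the extra dimension of size 2 and every trailing dimension of the data, which is the same pattern as the wrongly inferred shape reported in this issue.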

For example, given a tensor data with shape (1, 5033, 1, 3798) and a boolean mask valid_mask with shape (1, 5033):

Expected behavior

The expected shape of data[valid_mask] should be inferred as (?, 1, 3798)

Same as the behavior of data[valid_mask] in PyTorch

Same as the behavior of data[torch.squeeze(torch.argwhere(valid_mask), [1]).tolist()] in PyTorch

Actual behavior

The actual shape is inferred to be (?, 2, 5033, 1, 3798)

Same as the behavior of data[torch.squeeze(torch.argwhere(valid_mask), [1])]
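The two shapes can be reproduced with simple shape arithmetic. The sketch below is only bookkeeping under stated assumptions: `bool_mask_index_shape` and `int_tensor_index_shape` are hypothetical helpers, not functions from TVM or PyTorch, and `"?"` stands for the dynamic dimension. It assumes the mask covers the leading axes of the data and that argwhere on a 2-D mask yields an index tensor of shape (?, 2).

```python
# Hypothetical helpers for shape bookkeeping; "?" marks a dynamic dimension.

def bool_mask_index_shape(data_shape, mask_shape):
    """Shape of data[mask] for a boolean mask covering the leading axes."""
    if tuple(data_shape[:len(mask_shape)]) != tuple(mask_shape):
        raise ValueError("mask must match the leading dimensions of data")
    return ("?",) + tuple(data_shape[len(mask_shape):])

def int_tensor_index_shape(data_shape, index_shape):
    """Shape of data[idx] when one integer tensor indexes axis 0 only."""
    return tuple(index_shape) + tuple(data_shape[1:])

data_shape = (1, 5033, 1, 3798)
mask_shape = (1, 5033)

expected = bool_mask_index_shape(data_shape, mask_shape)
# argwhere on a 2-D mask yields an index tensor of shape (?, 2)
actual = int_tensor_index_shape(data_shape, ("?", 2))

print(expected)  # ('?', 1, 3798)
print(actual)    # ('?', 2, 5033, 1, 3798)
```

Read this way, the reported shape is what results when the (?, 2) coordinate tensor is consumed as one index into the first axis, instead of being split into one index vector per masked axis.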

Environment

PyTorch 1.13, tvm >= 1.11.0

Steps to reproduce

The PyTorch example below is only a reference for the analogous behavior.

import torch

# Run with PyTorch 2.0, where `torch.argwhere` is available.
data = torch.rand(1, 5033, 1, 3798).cuda()
valid_mask = torch.rand(1, 5033).cuda() > 0.3
masked_data0 = data[valid_mask]
masked_data1 = data[torch.squeeze(torch.argwhere(valid_mask), [1]).tolist()]
print("shape of data[valid_mask]", masked_data0.shape)
print("shape of data[torch.squeeze(torch.argwhere(valid_mask), [1]).tolist()]", masked_data1.shape)
print("shape of data[torch.squeeze(torch.argwhere(valid_mask), [1])]", masked_data1.shape)

The example for the TVM frontend:

import torch
import torch.nn as nn
import tvm.relay as relay

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
    def forward(self, x, y):
        x = torch.clamp(x, 0.2, 0.8)
        mask = y[:, :, 0, 0]
        input = x[mask]
        ret = torch.squeeze(input, 1)
        return ret
if __name__ == "__main__":
    shape_list = [("x", (1, 5033, 1, 3798)), ("y", (1, 5033, 1, 1))]
    model = Demo().cuda().eval()
    inputs = []
    inputs.append(torch.rand(shape_list[0][1]).cuda())
    inputs.append(torch.rand(shape_list[1][1]).cuda()>0.3)

Add some logging to the convert_operators function in python/tvm/relay/frontend/pytorch.py:

def convert_operators(self, operators, outputs, ret_names):
    for node_name, op_node in operators:
         print(op_node, flush=True)
         # ......
             relay_out = relay_op(....)
             print(f"relay_out shape: {self.infer_shape(relay_out)}", flush=True)

The shape of relay_out, (?, 2, 5033, 1, 3798), will then be shown in the log.

Triage

cc @shingjan @yelite

zyc-bit commented 11 months ago

Hi, have you solved this problem? When doing data[valid_mask] = value_tensor (index_put) in PyTorch, TVM uses scatter_nd. I have run into a similar problem.