SJTU-IPADS / reef

REEF is a GPU-accelerated DNN inference serving system that enables instant kernel preemption and biased concurrent execution in GPU scheduling.

The result of the model does not match the pytorch output #5

Open husterdjx opened 1 year ago

husterdjx commented 1 year ago

Hello! I'm trying to validate your example against PyTorch, and I've run into some problems. https://github.com/SJTU-IPADS/reef/blob/0a25de5d60edaef524752a921a8c72e131137879/src/reef/test/test.cpp#L52-L53 Here you give the expected output of resnet18 for an input filled with 10s, which matches the TVM-generated workload. However, when I run the same input through PyTorch's resnet18 to verify this output, I get a different answer. The code is as follows:

```python
import numpy as np
import torch

if __name__ == '__main__':
    device = torch.device("cuda")
    model = torch.hub.load('pytorch/vision:v0.10.0', 'resnet18', pretrained=True)
    model.to(device)
    model.eval()

    batch_size = 1
    image_shape = (3, 224, 224)
    data_shape = (batch_size,) + image_shape
    input_data = np.ones(data_shape).astype("float32") * 10
    print(input_data)
    input_data = torch.from_numpy(input_data).to(device)

    with torch.no_grad():
        output = model(input_data)
    torch.cuda.synchronize()

    # print(output[0])
    print(output[0][0:10])
    # tensor([-5.8784, -1.7790, -0.6576, -4.9093, -3.7112, -1.5704, -7.2738,  2.6429,
    #         -0.3956, -2.5655], device='cuda:0')
```

In addition, when I use your script tvm_generate_model.py to generate a densenet model for NVIDIA CUDA, the output is all zeros with the following sample:

```python
import numpy as np
import tvm
import tvm.relay.testing  # makes relay.testing.* available
from tvm import relay
from tvm.contrib import graph_runtime

batch_size = 1
num_class = 1000  # default number of classes in relay.testing.densenet
image_shape = (3, 224, 224)
data_shape = (batch_size,) + image_shape
out_shape = (batch_size, num_class)

mod, params = relay.testing.densenet.get_workload(
    densenet_size=169, batch_size=batch_size, image_shape=image_shape
)
opt_level = 3
target = tvm.target.cuda()

with tvm.transform.PassContext(opt_level=opt_level):
    lib = relay.build(mod, target, params=params)
    # graph_json, lib, params = relay.build(mod, target, params=params)
    # If I use this style instead, the code below that gets the module
    # reports an error saying the module has no function 'default'.

graph_json = lib.graph_json
params = lib.get_params()
ctx = tvm.gpu()
module = graph_runtime.GraphModule(lib["default"](ctx))

data = np.ones(data_shape).astype("float32") * 10
module.set_input("data", data)

module.run()

out = module.get_output(0, tvm.nd.empty(out_shape)).asnumpy()
print(out.flatten()[0:10])  # [0. 0. 0. 0. 0. 0. 0. 0. 0. 0.]
```
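For reference, here is a minimal sketch of the tuple-unpacking style from the commented-out line, under the assumption that the older pre-factory TVM API is used, where relay.build returns a (graph_json, lib, params) triple. In that style lib is a plain runtime module with no 'default' function, so the executor has to be assembled from the pieces directly:

```python
# Sketch only, continuing from the snippet above and assuming the
# older three-value return of relay.build.
with tvm.transform.PassContext(opt_level=opt_level):
    graph_json, lib, params = relay.build(mod, target, params=params)

# The executor is created from graph JSON, compiled lib, and context,
# and the parameters must be uploaded explicitly; lib["default"] does
# not exist in this style.
ctx = tvm.gpu()
module = graph_runtime.create(graph_json, lib, ctx)
module.set_input(**params)
```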

I would be very grateful if you could reply!

francis0407 commented 1 year ago

> Here you give the expected output of resnet18 for an input filled with 10s, which matches the TVM-generated workload. However, when I run the same input through PyTorch's resnet18 to verify this output, I get a different answer.

The model parameters used in the example are randomly generated by TVM; they do not come from a trained model. See:

https://github.com/francis0407/tvm/blob/1cca7f43871e717076ad7cddb58f2cac24a5a7ff/python/tvm/relay/testing/init.py#L146

and

https://github.com/francis0407/tvm/blob/1cca7f43871e717076ad7cddb58f2cac24a5a7ff/python/tvm/relay/testing/init.py#L85
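To see this (a minimal sketch, not from the repo, using tvm.relay.testing's default random initialization): generating the same workload twice yields different weights, so its outputs cannot match a pretrained PyTorch model.

```python
import numpy as np
from tvm.relay import testing

# Generate the resnet18 workload twice; relay.testing draws the
# trainable parameters randomly on each call.
_, params1 = testing.resnet.get_workload(num_layers=18, batch_size=1)
_, params2 = testing.resnet.get_workload(num_layers=18, batch_size=1)

# Report the first parameter whose values differ between the two runs
# (constant-initialized parameters such as batch-norm gammas will match).
for name in params1:
    if not np.allclose(params1[name].asnumpy(), params2[name].asnumpy()):
        print(f"parameter '{name}' differs between the two generations")
        break
```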