eth-sri / eran

ETH Robustness Analyzer for Deep Neural Networks
Apache License 2.0

Runtime differences and errors with different ONNX models #65

Closed: cherrywoods closed this issue 3 years ago

cherrywoods commented 3 years ago

Dear ERAN Developers, I have recently encountered some errors and behaviour I cannot explain, related to different ONNX exports of the same PyTorch network. The network is the ACAS Xu 2,1 network, and the different ONNX variants were created like this:
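
In outline, the exports were produced along these lines (a sketch: the loading of the actual ACAS Xu 2,1 weights is omitted and the file names are illustrative, but the layer layout matches the graphs printed below):

import torch
import torch.nn as nn

# ACAS Xu 2,1 architecture as it appears in the graphs below:
# seven Linear layers (5 -> 50 -> ... -> 50 -> 5) with ReLUs in between.
sizes = [5, 50, 50, 50, 50, 50, 50, 5]
layers = []
for i in range(len(sizes) - 1):
    layers.append(nn.Linear(sizes[i], sizes[i + 1]))
    if i < len(sizes) - 2:
        layers.append(nn.ReLU())
model = nn.Sequential(*layers)  # weights loaded from the ACAS Xu 2,1 parameters (omitted here)

# variant 1: input with a batch dimension
torch.onnx.export(model, torch.zeros(1, 5), "acasxu_2_1_batch.onnx")

# variant 2: input without a batch dimension
torch.onnx.export(model, torch.zeros(5), "acasxu_2_1_no_batch.onnx")

# variant 3: input without a batch dimension, unsqueezed inside the model's forward()
class UnsqueezeWrapper(nn.Module):
    def __init__(self, net):
        super().__init__()
        self.net = net

    def forward(self, inputs):
        return self.net(inputs.unsqueeze(0))

torch.onnx.export(UnsqueezeWrapper(model), torch.zeros(5), "acasxu_2_1_unsqueeze.onnx")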

What I am encountering is that the network with the batch dimension and the one with the unsqueeze operation have widely differing runtimes. The call of

python . --specnumber 2 --domain deeppoly --dataset acasxu --complete True --netname ../onnx_networks/acasxu_2_1_batch.onnx 

takes 176 seconds on my machine to find a property violation. However, the call of

python . --specnumber 2 --domain deeppoly --dataset acasxu --complete True --netname ../onnx_networks/acasxu_2_1_unsqueeze.onnx 

takes only 13 seconds to find a property violation.

The call

python . --specnumber 2 --domain deeppoly --dataset acasxu --complete True --netname ../onnx_networks/acasxu_2_1_no_batch.onnx

crashes with a segmentation fault.

All three were exported with PyTorch (latest version, 1.8.0), so I think it would be nice if all three worked fine. However, I am mostly wondering about the runtime difference between the two networks that only differ in the batch dimension and one additional operation, which, as far as I could see, is simply ignored by the ONNX translator.

I am attaching the networks I used and will add a printed version of the models below as well. The structure of the network without a batch dimension (no unsqueeze) is simply a bit weird. I think running networks on inputs without a batch dimension is only possible at all with the latest PyTorch (1.8.0), so maybe this ONNX export has not really been considered yet. I used a later version of onnx and onnxruntime than the one given in the requirements.txt file, because 1.5.0 does not support the ONNX opset version that PyTorch uses by default. I have the latest versions of ERAN, ELINA and the other dependencies.
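
For reference, the opset version an exported model targets can be read directly from the file; a small sketch using the onnx Python package (the file name is one of the attached networks):

import onnx

model = onnx.load("acasxu_2_1_batch.onnx")
# each entry lists an operator-set domain ("" is the default ONNX domain) and its version
print([(imp.domain, imp.version) for imp in model.opset_import])
# printable_graph produces a textual dump similar to the graphs listed below
print(onnx.helper.printable_graph(model.graph))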

Kind regards, David


The networks I used: onnx_networks.zip

The networks as printed by the ONNX runtime (and the ACAS Xu 2,1 network from ERAN for comparison)

# network from ERAN: data/acasxu/nets
graph nnet2onnx_Model (
  %X[FLOAT, 5]
) initializers (
  %W0[FLOAT, 50x5]
  %B0[FLOAT, 50]
  %W1[FLOAT, 50x50]
  %B1[FLOAT, 50]
  %W2[FLOAT, 50x50]
  %B2[FLOAT, 50]
  %W3[FLOAT, 50x50]
  %B3[FLOAT, 50]
  %W4[FLOAT, 50x50]
  %B4[FLOAT, 50]
  %W5[FLOAT, 50x50]
  %B5[FLOAT, 50]
  %W6[FLOAT, 5x50]
  %B6[FLOAT, 5]
) {
  %M0 = MatMul(%W0, %X)
  %H0 = Add(%M0, %B0)
  %R0 = Relu(%H0)
  %M1 = MatMul(%W1, %R0)
  %H1 = Add(%M1, %B1)
  %R1 = Relu(%H1)
  %M2 = MatMul(%W2, %R1)
  %H2 = Add(%M2, %B2)
  %R2 = Relu(%H2)
  %M3 = MatMul(%W3, %R2)
  %H3 = Add(%M3, %B3)
  %R3 = Relu(%H3)
  %M4 = MatMul(%W4, %R3)
  %H4 = Add(%M4, %B4)
  %R4 = Relu(%H4)
  %M5 = MatMul(%W5, %R4)
  %H5 = Add(%M5, %B5)
  %R5 = Relu(%H5)
  %M6 = MatMul(%W6, %R5)
  %y_out = Add(%M6, %B6)
  return %y_out
}
# batch dimension network
graph torch-jit-export (
  %input.1[FLOAT, 1x5]
) initializers (
  %0.weight[FLOAT, 50x5]
  %0.bias[FLOAT, 50]
  %2.weight[FLOAT, 50x50]
  %2.bias[FLOAT, 50]
  %4.weight[FLOAT, 50x50]
  %4.bias[FLOAT, 50]
  %6.weight[FLOAT, 50x50]
  %6.bias[FLOAT, 50]
  %8.weight[FLOAT, 50x50]
  %8.bias[FLOAT, 50]
  %10.weight[FLOAT, 50x50]
  %10.bias[FLOAT, 50]
  %12.weight[FLOAT, 5x50]
  %12.bias[FLOAT, 5]
) {
  %15 = Gemm[alpha = 1, beta = 1, transB = 1](%input.1, %0.weight, %0.bias)
  %16 = Relu(%15)
  %17 = Gemm[alpha = 1, beta = 1, transB = 1](%16, %2.weight, %2.bias)
  %18 = Relu(%17)
  %19 = Gemm[alpha = 1, beta = 1, transB = 1](%18, %4.weight, %4.bias)
  %20 = Relu(%19)
  %21 = Gemm[alpha = 1, beta = 1, transB = 1](%20, %6.weight, %6.bias)
  %22 = Relu(%21)
  %23 = Gemm[alpha = 1, beta = 1, transB = 1](%22, %8.weight, %8.bias)
  %24 = Relu(%23)
  %25 = Gemm[alpha = 1, beta = 1, transB = 1](%24, %10.weight, %10.bias)
  %26 = Relu(%25)
  %27 = Gemm[alpha = 1, beta = 1, transB = 1](%26, %12.weight, %12.bias)
  return %27
}
# network without a batch dimension
graph torch-jit-export (
  %input.1[FLOAT, 5]
) initializers (
  %0.bias[FLOAT, 50]
  %2.bias[FLOAT, 50]
  %4.bias[FLOAT, 50]
  %6.bias[FLOAT, 50]
  %8.bias[FLOAT, 50]
  %10.bias[FLOAT, 50]
  %12.bias[FLOAT, 5]
  %42[FLOAT, 5x50]
  %43[FLOAT, 50x50]
  %44[FLOAT, 50x50]
  %45[FLOAT, 50x50]
  %46[FLOAT, 50x50]
  %47[FLOAT, 50x50]
  %48[FLOAT, 50x5]
) {
  %16 = MatMul(%input.1, %42)
  %17 = Add(%16, %0.bias)
  %18 = Relu(%17)
  %20 = MatMul(%18, %43)
  %21 = Add(%20, %2.bias)
  %22 = Relu(%21)
  %24 = MatMul(%22, %44)
  %25 = Add(%24, %4.bias)
  %26 = Relu(%25)
  %28 = MatMul(%26, %45)
  %29 = Add(%28, %6.bias)
  %30 = Relu(%29)
  %32 = MatMul(%30, %46)
  %33 = Add(%32, %8.bias)
  %34 = Relu(%33)
  %36 = MatMul(%34, %47)
  %37 = Add(%36, %10.bias)
  %38 = Relu(%37)
  %40 = MatMul(%38, %48)
  %41 = Add(%40, %12.bias)
  return %41
}
# the network that is called without a batch dimension, but whose input is unsqueezed, so the network itself is run with a batch dimension
graph torch-jit-export (
  %inputs[FLOAT, 5]
) initializers (
  %0.weight[FLOAT, 50x5]
  %0.bias[FLOAT, 50]
  %2.weight[FLOAT, 50x50]
  %2.bias[FLOAT, 50]
  %4.weight[FLOAT, 50x50]
  %4.bias[FLOAT, 50]
  %6.weight[FLOAT, 50x50]
  %6.bias[FLOAT, 50]
  %8.weight[FLOAT, 50x50]
  %8.bias[FLOAT, 50]
  %10.weight[FLOAT, 50x50]
  %10.bias[FLOAT, 50]
  %12.weight[FLOAT, 5x50]
  %12.bias[FLOAT, 5]
) {
  %15 = Unsqueeze[axes = [0]](%inputs)
  %16 = Gemm[alpha = 1, beta = 1, transB = 1](%15, %0.weight, %0.bias)
  %17 = Relu(%16)
  %18 = Gemm[alpha = 1, beta = 1, transB = 1](%17, %2.weight, %2.bias)
  %19 = Relu(%18)
  %20 = Gemm[alpha = 1, beta = 1, transB = 1](%19, %4.weight, %4.bias)
  %21 = Relu(%20)
  %22 = Gemm[alpha = 1, beta = 1, transB = 1](%21, %6.weight, %6.bias)
  %23 = Relu(%22)
  %24 = Gemm[alpha = 1, beta = 1, transB = 1](%23, %8.weight, %8.bias)
  %25 = Relu(%24)
  %26 = Gemm[alpha = 1, beta = 1, transB = 1](%25, %10.weight, %10.bias)
  %27 = Relu(%26)
  %28 = Gemm[alpha = 1, beta = 1, transB = 1](%27, %12.weight, %12.bias)
  return %28
}
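
As a sanity check that the three exports encode the same function, they can be evaluated on the same input with onnxruntime; a small sketch, assuming the file names from the attached zip:

import numpy as np
import onnxruntime as ort

x = np.random.rand(5).astype(np.float32)

def run(path, inp):
    sess = ort.InferenceSession(path)
    input_name = sess.get_inputs()[0].name
    return sess.run(None, {input_name: inp})[0].reshape(-1)

out_batch = run("acasxu_2_1_batch.onnx", x.reshape(1, 5))  # expects a 1x5 input
out_no_batch = run("acasxu_2_1_no_batch.onnx", x)          # expects a 5 input
out_unsqueeze = run("acasxu_2_1_unsqueeze.onnx", x)        # expects a 5 input, unsqueezed inside the graph

print(np.max(np.abs(out_batch - out_no_batch)))
print(np.max(np.abs(out_batch - out_unsqueeze)))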

The package versions I used:

python==3.6
numpy==1.17.2
onnx==1.8.1
onnxruntime==1.4.0
Pillow==8.2.0
pycddlib==2.1.4
tensorflow==1.13.2
torch==1.8.1
torchvision==0.9.1

TensorFlow is on such a low version because I encountered an error with the latest one. I am also attaching a requirements.txt file listing all the installed packages.

mnmueller commented 3 years ago

Hello @cherrywoods,

Thanks for your interest in ERAN and providing these interesting test cases.

I have homogenized how the networks with different input dimensions are treated when using the recursive refinement of input regions in the parallel mode ACAS Xu verification, which now leads to consistently fast (about 2.5s for me) falsification of the two networks/properties.

I also updated our translation of MatMul nodes to handle the special case occurring for the no_batch network and now it behaves consistently with the other two.
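
For illustration (a plain numpy sketch, not the actual translation code): the special case is a MatMul whose left operand is a 1-D vector rather than a batched 2-D tensor, which still corresponds to the same affine layer:

import numpy as np

x = np.random.rand(5).astype(np.float32)
W_t = np.random.rand(5, 50).astype(np.float32)  # transposed weight, as stored in the no_batch export (e.g. %42)
b = np.random.rand(50).astype(np.float32)

y_1d = x @ W_t + b                  # 1-D input: shape (50,)
y_2d = x.reshape(1, 5) @ W_t + b    # batched input: shape (1, 50)

assert np.allclose(y_1d, y_2d[0])   # same affine map, just without the leading batch axis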

Cheers, Mark

cherrywoods commented 3 years ago

Hello Mark, excellent, thank you very much!

Best regards, David