For the model opset7/tf_inception_v3, it looks like the order of the operators impacts the results. This order generates correct results:
```
@408 = pooling[mode=max,padding={0, 0},padding_mode=0,stride={2, 2},lengths={3, 3}](@407) -> float_type, {1, 192, 35, 35}, {235200, 1225, 35, 1}
@409 = convolution[padding={0, 0},stride={1, 1},dilation={1, 1},padding_mode=0,group=1](@408,@37) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@410 = batch_norm_inference[epsilon=0.001,momentum=0.9,bn_mode=1](@409,@23,@34,@35,@36) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@411 = relu(@410) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@412 = convolution[padding={0, 0},stride={1, 1},dilation={1, 1},padding_mode=0,group=1](@408,@41) -> float_type, {1, 48, 35, 35}, {58800, 1225, 35, 1}
@413 = batch_norm_inference[epsilon=0.001,momentum=0.9,bn_mode=1](@412,@24,@38,@39,@40) -> float_type, {1, 48, 35, 35}, {58800, 1225, 35, 1}
@414 = relu(@413) -> float_type, {1, 48, 35, 35}, {58800, 1225, 35, 1}
@415 = convolution[padding={2, 2},stride={1, 1},dilation={1, 1},padding_mode=0,group=1](@414,@45) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@416 = batch_norm_inference[epsilon=0.001,momentum=0.9,bn_mode=1](@415,@23,@42,@43,@44) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@417 = relu(@416) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@418 = convolution[padding={0, 0},stride={1, 1},dilation={1, 1},padding_mode=0,group=1](@408,@49) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@419 = batch_norm_inference[epsilon=0.001,momentum=0.9,bn_mode=1](@418,@23,@46,@47,@48) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@420 = relu(@419) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@421 = convolution[padding={1, 1},stride={1, 1},dilation={1, 1},padding_mode=0,group=1](@420,@53) -> float_type, {1, 96, 35, 35}, {117600, 1225, 35, 1}
@422 = batch_norm_inference[epsilon=0.001,momentum=0.9,bn_mode=1](@421,@25,@50,@51,@52) -> float_type, {1, 96, 35, 35}, {117600, 1225, 35, 1}
@423 = relu(@422) -> float_type, {1, 96, 35, 35}, {117600, 1225, 35, 1}
@424 = convolution[padding={1, 1},stride={1, 1},dilation={1, 1},padding_mode=0,group=1](@423,@57) -> float_type, {1, 96, 35, 35}, {117600, 1225, 35, 1}
@425 = batch_norm_inference[epsilon=0.001,momentum=0.9,bn_mode=1](@424,@25,@54,@55,@56) -> float_type, {1, 96, 35, 35}, {117600, 1225, 35, 1}
@426 = relu(@425) -> float_type, {1, 96, 35, 35}, {117600, 1225, 35, 1}
@427 = pooling[mode=average,padding={1, 1},padding_mode=0,stride={1, 1},lengths={3, 3}](@408) -> float_type, {1, 192, 35, 35}, {235200, 1225, 35, 1}
@428 = convolution[padding={0, 0},stride={1, 1},dilation={1, 1},padding_mode=0,group=1](@427,@61) -> float_type, {1, 32, 35, 35}, {39200, 1225, 35, 1}
@429 = batch_norm_inference[epsilon=0.001,momentum=0.9,bn_mode=1](@428,@21,@58,@59,@60) -> float_type, {1, 32, 35, 35}, {39200, 1225, 35, 1}
@430 = relu(@429) -> float_type, {1, 32, 35, 35}, {39200, 1225, 35, 1}
@431 = concat[axis=1](@411,@417,@426,@430) -> float_type, {1, 256, 35, 35}, {313600, 1225, 35, 1}
```
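For reference, this is the familiar four-branch Inception block: all four branches read the pooled tensor @408 and are concatenated along the channel axis at @431, so the branches are mutually independent and their evaluation order should not affect the output. A minimal NumPy sketch of the dataflow (the `branch_*` callables are hypothetical stand-ins for the conv/BN/ReLU chains above):

```python
import numpy as np

def inception_block(x, branch_1x1, branch_5x5, branch_3x3dbl, branch_pool):
    # Four independent branches over the same shared input (@408).
    b1 = branch_1x1(x)      # @409-@411: 1x1 conv -> BN -> ReLU       (64 ch)
    b2 = branch_5x5(x)      # @412-@417: 1x1 -> 5x5 chain             (64 ch)
    b3 = branch_3x3dbl(x)   # @418-@426: 1x1 -> 3x3 -> 3x3 chain      (96 ch)
    b4 = branch_pool(x)     # @427-@430: 3x3 avg pool -> 1x1 conv     (32 ch)
    # @431: channel concat; 64 + 64 + 96 + 32 = 256 channels.
    return np.concatenate([b1, b2, b3, b4], axis=1)
```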
This order generates incorrect results:
```
@408 = pooling[mode=max,padding={0, 0},padding_mode=0,stride={2, 2},lengths={3, 3}](@407) -> float_type, {1, 192, 35, 35}, {235200, 1225, 35, 1}
@409 = pooling[mode=average,padding={1, 1},padding_mode=0,stride={1, 1},lengths={3, 3}](@408) -> float_type, {1, 192, 35, 35}, {235200, 1225, 35, 1}
@410 = convolution[padding={0, 0},stride={1, 1},dilation={1, 1},padding_mode=0,group=1](@409,@61) -> float_type, {1, 32, 35, 35}, {39200, 1225, 35, 1}
@411 = batch_norm_inference[epsilon=0.001,momentum=0.9,bn_mode=1](@410,@21,@58,@59,@60) -> float_type, {1, 32, 35, 35}, {39200, 1225, 35, 1}
@412 = relu(@411) -> float_type, {1, 32, 35, 35}, {39200, 1225, 35, 1}
@413 = convolution[padding={0, 0},stride={1, 1},dilation={1, 1},padding_mode=0,group=1](@408,@49) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@414 = batch_norm_inference[epsilon=0.001,momentum=0.9,bn_mode=1](@413,@23,@46,@47,@48) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@415 = relu(@414) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@416 = convolution[padding={1, 1},stride={1, 1},dilation={1, 1},padding_mode=0,group=1](@415,@53) -> float_type, {1, 96, 35, 35}, {117600, 1225, 35, 1}
@417 = batch_norm_inference[epsilon=0.001,momentum=0.9,bn_mode=1](@416,@25,@50,@51,@52) -> float_type, {1, 96, 35, 35}, {117600, 1225, 35, 1}
@418 = relu(@417) -> float_type, {1, 96, 35, 35}, {117600, 1225, 35, 1}
@419 = convolution[padding={1, 1},stride={1, 1},dilation={1, 1},padding_mode=0,group=1](@418,@57) -> float_type, {1, 96, 35, 35}, {117600, 1225, 35, 1}
@420 = batch_norm_inference[epsilon=0.001,momentum=0.9,bn_mode=1](@419,@25,@54,@55,@56) -> float_type, {1, 96, 35, 35}, {117600, 1225, 35, 1}
@421 = relu(@420) -> float_type, {1, 96, 35, 35}, {117600, 1225, 35, 1}
@422 = convolution[padding={0, 0},stride={1, 1},dilation={1, 1},padding_mode=0,group=1](@408,@41) -> float_type, {1, 48, 35, 35}, {58800, 1225, 35, 1}
@423 = batch_norm_inference[epsilon=0.001,momentum=0.9,bn_mode=1](@422,@24,@38,@39,@40) -> float_type, {1, 48, 35, 35}, {58800, 1225, 35, 1}
@424 = relu(@423) -> float_type, {1, 48, 35, 35}, {58800, 1225, 35, 1}
@425 = convolution[padding={2, 2},stride={1, 1},dilation={1, 1},padding_mode=0,group=1](@424,@45) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@426 = batch_norm_inference[epsilon=0.001,momentum=0.9,bn_mode=1](@425,@23,@42,@43,@44) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@427 = relu(@426) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@428 = convolution[padding={0, 0},stride={1, 1},dilation={1, 1},padding_mode=0,group=1](@408,@37) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@429 = batch_norm_inference[epsilon=0.001,momentum=0.9,bn_mode=1](@428,@23,@34,@35,@36) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@430 = relu(@429) -> float_type, {1, 64, 35, 35}, {78400, 1225, 35, 1}
@431 = concat[axis=1](@430,@427,@421,@412) -> float_type, {1, 256, 35, 35}, {313600, 1225, 35, 1}
```
Theoretically, both orderings should generate the same results.
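Indeed, ignoring instruction numbering, the two listings describe the same computation: the intermediate ids differ, but the external ids (the weights @37, @41, ... and the block input @407) are identical, and the concat inputs correspond branch-for-branch. One way to check this is to canonicalize each output as an expression over external values; a rough sketch, assuming the two dumps are available as plain text:

```python
import re

IR_LINE = re.compile(r"@(\d+) = (\w+)(\[[^\]]*\])?\(([^)]*)\)")

def parse(text):
    """Map each instruction id to (op, attributes, input ids)."""
    prog = {}
    for line in text.splitlines():
        m = IR_LINE.search(line)
        if m:
            out, op, attrs, args = m.groups()
            prog[out] = (op, attrs or "", re.findall(r"@(\d+)", args))
    return prog

def canon(prog, node, memo=None):
    """Expand a value into an expression over external ids (weights such as
    @37 and the block input @407), which erases the instruction numbering."""
    if memo is None:
        memo = {}
    if node not in prog:           # not defined in the dump: weight or input
        return "@" + node
    if node not in memo:
        op, attrs, ins = prog[node]
        memo[node] = op + attrs + "(" + ",".join(canon(prog, i, memo) for i in ins) + ")"
    return memo[node]

# With good_text and bad_text holding the two dumps above:
# assert canon(parse(good_text), "431") == canon(parse(bad_text), "431")
```

The assertion holds for these two listings, so any difference in output must come from how the program order is handled during compilation or execution, not from the graph itself.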
This may also be related to Issue#465 for the opset 8 fp16_tiny_yolov model. We have several fp16 models that generate incorrect results on onnxruntime.
Removing the rewrite_batchnorm pass also produces different results for the two orderings above (and both are incorrect).
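For context, rewrite_batchnorm presumably folds the inference-mode batch norm into the preceding convolution's weights and bias, which is the standard transformation; a sketch of that folding (the actual MIGraphX pass may differ in details):

```python
import numpy as np

def fold_batchnorm(W, b, gamma, beta, mean, var, eps=1e-3):
    """Fold inference-mode batch norm into the preceding convolution:
       BN(conv(x, W) + b) == conv(x, W') + b'
    W has shape (out_ch, in_ch, kh, kw); the other arguments have shape (out_ch,)."""
    scale = gamma / np.sqrt(var + eps)          # per-output-channel scale
    W_folded = W * scale[:, None, None, None]   # scale each output filter
    b_folded = (b - mean) * scale + beta        # fold the shift into the bias
    return W_folded, b_folded
```

Since the dumps above use epsilon=0.001, any mismatch in where eps is applied (or in the per-channel broadcasting) would produce small per-branch errors of exactly the kind that an order-dependent bug could expose.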
PR#479 (parsing the ONNX file topologically) caused different results on CPU and GPU for some models:
- opset7/tf_inception_v3
- opset8/tf_inception_v2
- opset8/tf_nasnet_large
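For triage, a simple harness that runs the same model on two backends and reports the largest divergence helps narrow these down; a sketch using onnxruntime (the model path, input shape, and GPU provider are placeholders for whatever setup reproduces the mismatch):

```python
import numpy as np
import onnxruntime as ort

def run_model(model_path, x, providers):
    sess = ort.InferenceSession(model_path, providers=providers)
    input_name = sess.get_inputs()[0].name
    return sess.run(None, {input_name: x})[0]

# Placeholder model path and input shape for tf_inception_v3 (NCHW, 299x299).
x = np.random.rand(1, 3, 299, 299).astype(np.float32)
cpu = run_model("tf_inception_v3.onnx", x, ["CPUExecutionProvider"])
gpu = run_model("tf_inception_v3.onnx", x, ["CUDAExecutionProvider"])
print("max abs diff:", np.abs(cpu - gpu).max())
print("allclose:", np.allclose(cpu, gpu, rtol=1e-3, atol=1e-3))
```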