Unfortunately this does not convert to the EdgeTPU as the op now appears as PadV2 (unsupported). There must be a better way.
I manually hacked in a ZeroPad2d and implemented the op_pad functionality using Keras zero padding (as was already done for MaxPool), and managed to get an accurate model with every op converting successfully to Edge TPU. I'm not sure how to do this the correct way in onnx-to-keras; a sketch of the idea follows below.
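For reference, the core of the hack is just routing the pad through Keras' ZeroPadding2D, which (as far as I can tell) lowers to a plain Pad op that the EdgeTPU compiler accepts, rather than PadV2. A minimal sketch; the handler name and signature are hypothetical, not the actual onnx-to-keras hook:

```python
import tensorflow as tf

# Hypothetical handler, NHWC layout; the real onnx-to-keras op_pad
# receives ONNX-style pad lists rather than four scalars.
def pad_via_zero_padding(x, top, bottom, left, right):
    # ZeroPadding2D pads H and W with zeros and converts to a plain
    # Pad op, avoiding the unsupported PadV2.
    return tf.keras.layers.ZeroPadding2D(((top, bottom), (left, right)))(x)
```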
What I did that actually works on the EdgeTPU:
Manually edited the TorchScript to remove the padding from every max_pool2d and insert a constant_pad_nd before it:
```python
# Before:
input20 = torch.max_pool2d(input10, [3, 3], [2, 2], 1)

# After:
input10pad = torch.constant_pad_nd(input10, [1, 1, 1, 1], 0.)
input20 = torch.max_pool2d(input10pad, [3, 3], [2, 2])
```
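For what it's worth, a quick equivalence check (a minimal sketch, assuming the pooled input is non-negative, e.g. straight out of a ReLU):

```python
import torch

x = torch.relu(torch.randn(1, 8, 16, 16))  # non-negative, e.g. post-ReLU

ref = torch.max_pool2d(x, [3, 3], [2, 2], 1)        # implicit padding
x_pad = torch.constant_pad_nd(x, [1, 1, 1, 1], 0.)  # explicit zero pad
out = torch.max_pool2d(x_pad, [3, 3], [2, 2])       # no padding

print(torch.equal(ref, out))  # True for non-negative inputs
```

For inputs that can go negative the two differ, which is where the pad-value discussion further down comes from.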
Now when you convert to ONNX, you'll get a lot of weird stuff that looks like this:
(Screenshots: the ONNX graph before and after running onnxsim.)
and onnx2keras will ask for op_constantofshape. If you run python -m onnxsim, that whole branch will just disappear.
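The same simplification can also be done from Python if that's easier to wire into a conversion script (file names here are illustrative):

```python
import onnx
from onnxsim import simplify  # pip install onnx-simplifier

model = onnx.load("model.onnx")
model_simp, ok = simplify(model)
assert ok, "simplified model failed the ONNX checker"
onnx.save(model_simp, "model_simplified.onnx")  # ConstantOfShape branch folded away
```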
Next up, we get an exception while working out what op we have:
```
x = getattr(self, 'op_' + op_type.lower())(*inputs, **attrs)
TypeError: op_pad() got multiple values for argument 'mode'
```
To get around this, I fixed op_pad as in PR #34.
Now that #34 has been merged, this PR is valid.
The issue still occurs and this PR fixes it, but it doesn't look good. If someone can explain why it works, that would be nice.
What exactly doesn't look good? The extra nodes in the onnx model?
The number -127.5 doesn't look good. I can't explain why it works.
Edit: Specifically, 0 doesn't work even though it's a zero pad. -1 works, but I found the best accuracy with -127.5. I assume torch, ONNX and TensorFlow differ in how they implement padding, but I haven't taken an in-depth look at the code. You can see this result with the test I've added.
Any update on this? I'm using this magic number (-127.5) in production and am happy with the end result.
I think this will work for now. But I think I found the reason why -127.5 seems to work better than 0: PyTorch's built-in max_pool2d padding is effectively negative-infinity padding, so a padded cell can never win the max. With an explicit constant pad, the inaccuracy occurs whenever the padding value is higher than the real values in a given kernel window: the maxpool then selects the padding value rather than the values from the input tensor.
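A tiny example makes the failure mode concrete (a minimal sketch; -127.5 stands in for any constant below the input's value range):

```python
import torch

# All-negative 2x2 input: the true max in every window is negative.
x = torch.tensor([[[[-3., -5.],
                    [-4., -6.]]]])

ref = torch.max_pool2d(x, [3, 3], [2, 2], 1)  # implicit -inf padding

# Explicit zero pad: the pad value (0) beats every real value,
# so the pooled output is wrongly 0.
pad0 = torch.max_pool2d(torch.constant_pad_nd(x, [1, 1, 1, 1], 0.), [3, 3], [2, 2])

# A pad below the input range can never win, so it matches the reference.
padneg = torch.max_pool2d(torch.constant_pad_nd(x, [1, 1, 1, 1], -127.5), [3, 3], [2, 2])

print(ref.item(), pad0.item(), padneg.item())  # -3.0  0.0  -3.0
```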
Thanks. That makes sense.
Fix accuracy issues
Fixes #31