Closed: tonystratum closed this issue 2 months ago.

Trying to convert a model to tflite for the Coral TPU, which only supports static shapes. Is there any way to force static shapes throughout the whole graph?

P.S. Magnificent work! Thanks!
Hi, thanks! Strictly speaking, TFLite only supports static shapes. I've never tried Coral, but I think it should be fine.
@AlexanderLutsenko When I inspected the converted model with https://github.com/google-ai-edge/model-explorer, it indicated shapes of size -1, which I presume are dynamic. Am I doing something wrong?
Was it an input tensor that had size -1? That should never happen unless you annotate some dimensions as dynamic via the `input_shapes` argument (there's a sketch of that further below). As for intermediate/output tensors, the results of certain ops have to be dynamic. For example:
```python
import torch
import torch.nn as nn
import nobuco

class FilterBoxes(nn.Module):
    def __init__(self):
        super().__init__()
        self.threshold = 0.1

    def forward(self, boxes, scores):
        # Boolean masking keeps a data-dependent number of elements,
        # so the output shape cannot be static
        mask = scores > self.threshold
        return boxes[mask], scores[mask]

boxes = torch.normal(0, 1, size=(128, 4))
scores = torch.normal(0, 1, size=(128,))

pytorch_module = FilterBoxes().eval()
keras_model = nobuco.pytorch_to_keras(
    pytorch_module,
    args=[boxes, scores],
)
```
"1" in tensor shapes should be "-1", Netron visualizes it wrong
Obviously, we do not know in advance how many scores will be greater than the threshold. To avoid dynamic shapes, `gt` is usually replaced with `topk`:
```python
def forward(self, boxes, scores):
    # topk always returns exactly k elements, so the output shape stays static
    top_scores, top_indices = torch.topk(scores, k=10, dim=-1)
    top_boxes = boxes[top_indices]
    return top_boxes, top_scores
```
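As for the `input_shapes` argument mentioned above, dynamic input dimensions only appear when you ask for them. A minimal sketch (assuming `input_shapes` accepts `None` for dynamic dimensions, as in nobuco's README); for Coral, simply don't pass `input_shapes` at all:

```python
# Hypothetical: mark the first dimension of each input as dynamic.
# Omit input_shapes entirely to keep every input dimension static.
keras_model = nobuco.pytorch_to_keras(
    pytorch_module,
    args=[boxes, scores],
    input_shapes={boxes: (None, 4), scores: (None,)},  # None = dynamic
)
```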
That's about as much as I can tell without looking at your model. If that's not enough, feel free to share the part of it that causes the problem.
Hi! I have a similar problem, but it may just be a limitation of TensorFlow or TFLite.

I have a 2-stage CNN in PyTorch where the first stage uses a scaled-down image to predict the center of a crop. The second stage then takes the crop in full resolution as input to make a prediction. Both the images and the crops are of fixed size (640x480 and 128x128).

I convert the model from PyTorch to TFLite and want to run it on an Android phone GPU. Everything works fine on my computer: I get identical output from PyTorch and TFLite. To make it run on the Android GPU I had to make some changes to the cropping, and it now looks a bit hacky, but the `tf.lite.experimental.Analyzer.analyze()` function says it is GPU compatible. However, when I try to run it on the Android GPU, it crashes and says that the size of the crop is dynamic and it only supports static-sized tensors.

I can see why it thinks the size is dynamic, but it really isn't. Can I convince it that it is static?
@johan-sightic Well, that's unfortunate. But I see why TFLite fails to recognize the crop shape as static. Consider this:

```python
y0 = center_y - h
y1 = center_y + h
crop = image[:, y0:y1]
```
Since `center_y` is a tensor, so are `y0` and `y1`. When you invoke `image[:, y0:y1]`, the indexing operator (`[]`) has no way of knowing that `y0` and `y1` are related and that `y1 - y0 == const`. It wouldn't be a problem if the op accepted `(center_y, h)` as cropping parameters instead of `(y0, y1)`. If you are to assemble such an op from existing TensorFlow primitives, brace yourself for some real jank:
```python
import torch
import torch.nn as nn
import tensorflow as tf
import nobuco

@nobuco.traceable
def get_crop(x, center_y, center_x, h, w):
    return x[:, :, center_y - h:center_y + h, center_x - w:center_x + w]

@nobuco.converter(get_crop, channel_ordering_strategy=nobuco.ChannelOrderingStrategy.FORCE_TENSORFLOW_ORDER)
def converter_get_crop(x, center_y, center_x, h, w):
    def func(x, center_y, center_x, h, w):
        # Roll the image so the crop's top-left corner lands at the origin,
        # then take a slice whose size is a compile-time constant
        x = tf.roll(input=x, shift=-(center_y - h), axis=1)
        x = tf.roll(input=x, shift=-(center_x - w), axis=2)
        return x[:, :h*2, :w*2]
    return func

class MyModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.stage_1_conv = nn.Conv2d(3, 4, kernel_size=8, stride=8)
        self.stage_1_relu = nn.ReLU()
        self.stage_1_linear = nn.Linear(4 * 20 * 15, 2)
        self.stage_1_sigmoid = nn.Sigmoid()
        self.stage_2_conv = nn.Conv2d(3, 4, kernel_size=8, stride=8)
        self.stage_2_relu = nn.ReLU()
        self.stage_2_linear = nn.Linear(4 * 16 * 16, 5)
        self.stage_2_tanh = nn.Tanh()

    @torch.no_grad()
    def forward(self, image: torch.Tensor) -> torch.Tensor:
        # Cast and normalize image (not GPU compatible, but that is fine)
        image = image.float() / 255

        ##### Predict crop center (stage 1) #####
        x = self.stage_1_conv(image[:, :, ::4, ::4])
        x = self.stage_1_relu(x.flatten(start_dim=1))
        x = self.stage_1_linear(x)
        rel_crop_center = self.stage_1_sigmoid(x)

        ##### Extract crop #####
        image_size = torch.tensor((640, 480), dtype=torch.int32)
        half_crop_size = torch.tensor((64, 64), dtype=torch.int32)
        crop_center = (rel_crop_center * image_size).to(torch.int32)
        # Crop has to be within frame
        crop_center = torch.clamp(crop_center, half_crop_size, image_size - half_crop_size)
        center_y, center_x = crop_center[0]
        crop = get_crop(image, center_y, center_x, 64, 64)

        ##### Predict from crop (stage 2) #####
        x = self.stage_2_conv(crop)
        x = self.stage_2_relu(x.flatten(start_dim=1))
        out = self.stage_2_tanh(self.stage_2_linear(x))
        return out
```
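For completeness, the conversion call might look like this (a sketch; the uint8 dummy input and batch size of 1 are my assumptions, sized to match the 640x480 image described above):

```python
# Hypothetical dummy input matching the expected full-resolution image
image = torch.randint(0, 256, size=(1, 3, 640, 480), dtype=torch.uint8)

pytorch_module = MyModel().eval()
keras_model = nobuco.pytorch_to_keras(
    pytorch_module,
    args=[image],
)
```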
One drawback of this approach, apart from being suboptimal, is that `tf.roll` is not included in `TFLITE_BUILTINS`. You'd need to download/compile TFLite binaries with an extended operator set. Also, don't forget to enable Flex ops in `TFLiteConverter`:
```python
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
converter.target_spec.supported_ops = [
    tf.lite.OpsSet.TFLITE_BUILTINS,  # regular TFLite ops
    tf.lite.OpsSet.SELECT_TF_OPS,    # Flex ops, needed here for tf.roll
]
tflite_model = converter.convert()
```
Tested on an Android device with OpenCL and XNNPACK delegates, so it might work for you, too.
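Before deploying, a desktop sanity check may be worthwhile. A minimal sketch (assuming `tflite_model` is the bytes from above and that the converted model expects channel-last uint8 input; the file name is arbitrary):

```python
import tensorflow as tf

with open('model.tflite', 'wb') as f:
    f.write(tflite_model)

interpreter = tf.lite.Interpreter(model_path='model.tflite')
interpreter.allocate_tensors()
inp = interpreter.get_input_details()[0]
out = interpreter.get_output_details()[0]

# NCHW (PyTorch) -> NHWC layout, assuming channel-last conversion
image_np = image.numpy().transpose(0, 2, 3, 1)
interpreter.set_tensor(inp['index'], image_np)
interpreter.invoke()
print(interpreter.get_tensor(out['index']))
```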
I see, thank you very much for such a good answer! I will do some tests to see if this is a valid approach for us!
@johan-sightic Hey, I just found a much better solution:

```python
@nobuco.traceable
def get_crop(x, center_y, center_x, h, w):
    return x[:, :, center_y - h:center_y + h, center_x - w:center_x + w]

@nobuco.converter(get_crop, channel_ordering_strategy=nobuco.ChannelOrderingStrategy.FORCE_TENSORFLOW_ORDER)
def converter_get_crop(x, center_y, center_x, h, w):
    def func(x, center_y, center_x, h, w):
        # crop_to_bounding_box takes an offset and a static size,
        # so the output shape is known at conversion time
        return tf.image.crop_to_bounding_box(x, center_y - h, center_x - w, h*2, w*2)
    return func
```

It translates to a plain `tf.slice`, so no custom binaries are required.
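With that, the Flex-ops workaround above should no longer be necessary; a sketch of the builtins-only conversion (same `TFLiteConverter` API as before):

```python
converter = tf.lite.TFLiteConverter.from_keras_model(keras_model)
# tf.image.crop_to_bounding_box lowers to tf.slice, a TFLite builtin,
# so SELECT_TF_OPS should no longer be needed
converter.target_spec.supported_ops = [tf.lite.OpsSet.TFLITE_BUILTINS]
tflite_model = converter.convert()
```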
Thank you very much! This seems to work! I'm not sure how much of my model actually runs on CPU vs GPU because I don't quite understand the log output, but at least it runs about 75% faster now :smile:
@tonystratum I'm closing the issue for now due to inactivity. Please reopen if it's still relevant.