google-coral / edgetpu

Coral issue tracker (and legacy Edge TPU API source)
https://coral.ai
Apache License 2.0

Is the size of the network limited? Internal Compiler Error. #60

Closed WalidVision closed 4 years ago

WalidVision commented 4 years ago

Hello everyone,

Let's assume we want to compile a standard U-Net (see below) with the Edge TPU Compiler. The standard U-Net starts with 64 filters in the first block. If I choose the network size of the standard U-Net (nr_filter = 64), I get: Internal Compiler Error. Aborting.

When I use a smaller number of filters for the U-Net (e.g. 32), it compiles and runs on the Edge TPU successfully. Is there a limit on how large a network can be? If so, can we get information about that limit?

I'm using:

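# Imports assumed (tf.keras); the original post did not show them.
from tensorflow.keras.layers import (Activation, BatchNormalization, Conv2D, Input,
                                     MaxPooling2D, UpSampling2D, concatenate)
from tensorflow.keras.models import Model
from tensorflow.keras.optimizers import Adam

def conv_block(x, nr_filter):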
    x = Conv2D(nr_filter, 3, padding='same')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Conv2D(nr_filter, 3, padding='same')(x)
    x = BatchNormalization()(x)
    x = Activation('relu')(x)
    return x

def up_conv_block(x, x_prev, nr_filter):
    x = UpSampling2D(interpolation='bilinear')(x)
    x = concatenate([x, x_prev], axis=3)
    x = conv_block(x, nr_filter)
    return x

def get_model(nr_filter=64):
    input_layer = Input(shape=(512, 512, 3))
    x1 = conv_block(input_layer, nr_filter) 
    x1_ = MaxPooling2D()(x1)
    x2 = conv_block(x1_, nr_filter*2) 
    x2_ = MaxPooling2D()(x2)
    x3 = conv_block(x2_, nr_filter*4) 
    x3_ = MaxPooling2D()(x3)
    x4 = conv_block(x3_, nr_filter*8) 
    x4_ = MaxPooling2D()(x4)
    x5 = conv_block(x4_, nr_filter*16) 

    x = up_conv_block(x5, x4, nr_filter*8)
    x = up_conv_block(x, x3, nr_filter*4)
    x = up_conv_block(x, x2, nr_filter*2)
    x = up_conv_block(x, x1, nr_filter)

    output_layer = Conv2D(1, 1, padding = 'same', activation = 'sigmoid')(x)
    model = Model(inputs = input_layer, outputs = output_layer)
    model.compile(optimizer = Adam(learning_rate = 0.001), loss = 'binary_crossentropy', metrics = ['accuracy'])
    return model
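
For a rough feel of how the two configurations differ in size, the convolution parameters of the U-Net above can be counted by hand. This is only a back-of-the-envelope sketch. It assumes 3x3 kernels with biases as in the posted code, ignores the BatchNormalization parameters (these are typically folded into the convolutions during TFLite conversion), and assumes roughly one byte per weight after 8-bit quantization. The helper names are made up for illustration, and the result says nothing about the actual compiler or hardware limit, which is not disclosed in this thread.

```python
def conv_params(c_in, c_out, k=3):
    """Weights plus biases of one k x k Conv2D layer."""
    return k * k * c_in * c_out + c_out


def conv_block_params(c_in, nr_filter):
    """Two stacked 3x3 convolutions, mirroring conv_block() above."""
    return conv_params(c_in, nr_filter) + conv_params(nr_filter, nr_filter)


def unet_conv_params(nr_filter=64, in_channels=3):
    """Hypothetical helper: conv parameter count of the posted U-Net."""
    total = 0
    channels = in_channels
    skips = []

    # Encoder: four blocks with widths f, 2f, 4f, 8f.
    for mult in (1, 2, 4, 8):
        width = nr_filter * mult
        total += conv_block_params(channels, width)
        skips.append(width)   # remembered for the skip connection
        channels = width      # MaxPooling2D keeps the channel count

    # Bottleneck block with width 16f.
    total += conv_block_params(channels, nr_filter * 16)
    channels = nr_filter * 16

    # Decoder: upsample, concatenate with the matching skip, then conv_block.
    for mult in (8, 4, 2, 1):
        width = nr_filter * mult
        total += conv_block_params(channels + skips.pop(), width)
        channels = width

    # Final 1x1 convolution down to a single output channel.
    total += conv_params(channels, 1, k=1)
    return total


for f in (32, 64):
    n = unet_conv_params(f)
    print(f"nr_filter={f}: {n:,} conv parameters, ~{n / 2**20:.1f} MiB at 1 byte each")
```

Because almost every term is a product of two channel widths, doubling nr_filter roughly quadruples the parameter count, which is why the step from 32 to 64 filters is much larger than it may look.
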

Namburger commented 4 years ago

@WalidVision It is possible that your model is hitting a hardware limit on the Edge TPU, which we are working to expand (unfortunately, I can't disclose the limit right now). That said, the compiler can sometimes be a little buggy. Would you be able to submit your tflite model? I can do a little digging to verify.

WalidVision commented 4 years ago

@Namburger Sure, here is the file: https://easyupload.io/425e1l

If it is hitting a hardware limit, does that mean there is no way to use this model except with new Edge TPU hardware? Or could a compiler update solve the problem?

PS: Can you also check why this much smaller model (U-Net with a MobileNetV2 backbone from the segmentation-models package) causes the same error, "Internal compiler error. Aborting!": https://easyupload.io/bzon5z

Namburger commented 4 years ago

@WalidVision Not necessarily. I know we're working on something that could possibly fix this even with our current TPU, but I can't disclose details at the moment. I can confirm that the error I'm seeing right now is related to size; however, I believe the model should compile even with our current setup, so it must be a compiler bug. I opened an internal bug for a similar issue, and I'll also submit these models for fixes.

On another note, I do see some ops in your code that are not yet supported by the compiler (although this shouldn't crash the compiler). Also, are you using keras or tf.keras? I believe the ops may be a little different.

WalidVision commented 4 years ago

@Namburger For the bigger net (first file) I'm only using the following Keras functions: Activation, BatchNormalization, Conv2D, UpSampling2D, MaxPooling2D.

I'm glad that you are working on this problem. Please let us know when any news or updates are available. Thank you very much!

Namburger commented 4 years ago

@WalidVision Thanks, I'll give you updates as I get them. The compiler seems a little flaky on this issue, especially since I've started noticing more cases like this where it would pass for a bigger model... Hopefully we'll get this fix out by the next release.

Namburger commented 4 years ago

@WalidVision Just to update you on this issue: we were able to detect and fix the bug in our code base. This model will work with the next release! We can't share details on the next release yet, but please follow our news page for updates!

WalidVision commented 4 years ago

@Namburger Thanks for the info. That is great news. Can you please tell me whether both* models work with your updated compiler?

*The big standard U-Net and the smaller U-Net with a MobileNetV2 backbone (see above for both files).

JiatianWu commented 4 years ago

@Namburger Since several issues have reported this compiler bug and it has been bothering us for a long time, could the Coral team publish the updated compiler on another branch of the GitHub repo? That would be very helpful.

Namburger commented 4 years ago

@JiatianWu Unfortunately the compiler is not open source currently but we are working on it :) FYI:

edgetpu_compiler -s model2.tflite 
Edge TPU Compiler version 2.1.302470888

Model compiled successfully in 12264 ms.

Input model: model2.tflite
Input size: 8.11MiB
Output model: model2_edgetpu.tflite
Output size: 9.47MiB
On-chip memory used for caching model parameters: 6.86MiB
On-chip memory remaining for caching model parameters: 7.75KiB
Off-chip memory used for streaming uncached model parameters: 1.22MiB
Number of Edge TPU subgraphs: 1
Total number of operations: 93
Operation log: model2_edgetpu.log

Model successfully compiled but not all operations are supported by the Edge TPU. A percentage of the model will instead run on the CPU, which is slower. If possible, consider updating your model to use only operations supported by the Edge TPU. For details, visit g.co/coral/model-reqs.
Number of operations that will run on Edge TPU: 83
Number of operations that will run on CPU: 10

Operator                       Count      Status

LOGISTIC                       1          More than one subgraph is not supported
ADD                            10         Mapped to Edge TPU
CONCATENATION                  1          More than one subgraph is not supported
CONCATENATION                  3          Mapped to Edge TPU
QUANTIZE                       4          Mapped to Edge TPU
QUANTIZE                       1          Operation is otherwise supported, but not mapped due to some unspecified limitation
PAD                            5          Mapped to Edge TPU
CONV_2D                        41         Mapped to Edge TPU
CONV_2D                        5          More than one subgraph is not supported
DEPTHWISE_CONV_2D              17         Mapped to Edge TPU
RESIZE_BILINEAR                3          Mapped to Edge TPU
RESIZE_BILINEAR                2          Operation is otherwise supported, but not mapped due to some unspecified limitation
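
The operator table above can be tallied mechanically to see how much of the model actually landed on the Edge TPU. The following is a minimal sketch, assuming the log keeps the three-column layout shown here (operator, count, status) and that only rows whose status is exactly "Mapped to Edge TPU" run on the accelerator. The function name tally_ops and the embedded sample are illustrative and not part of any edgetpu_compiler tooling.

```python
import re
from collections import Counter

# Rows copied from the operator table in the compiler output above.
SAMPLE_LOG = """\
LOGISTIC                       1          More than one subgraph is not supported
ADD                            10         Mapped to Edge TPU
CONCATENATION                  1          More than one subgraph is not supported
CONCATENATION                  3          Mapped to Edge TPU
QUANTIZE                       4          Mapped to Edge TPU
QUANTIZE                       1          Operation is otherwise supported, but not mapped due to some unspecified limitation
PAD                            5          Mapped to Edge TPU
CONV_2D                        41         Mapped to Edge TPU
CONV_2D                        5          More than one subgraph is not supported
DEPTHWISE_CONV_2D              17         Mapped to Edge TPU
RESIZE_BILINEAR                3          Mapped to Edge TPU
RESIZE_BILINEAR                2          Operation is otherwise supported, but not mapped due to some unspecified limitation
"""

# Operator name, integer count, then free-form status text.
ROW = re.compile(r"^(\S+)\s+(\d+)\s+(.+)$")


def tally_ops(log_text):
    """Hypothetical helper: sum op counts per target ('tpu' or 'cpu')."""
    totals = Counter()
    for line in log_text.splitlines():
        match = ROW.match(line.strip())
        if not match:
            continue  # skip headers, blank lines, and anything not row-shaped
        _, count, status = match.groups()
        target = "tpu" if status.strip() == "Mapped to Edge TPU" else "cpu"
        totals[target] += int(count)
    return totals


totals = tally_ops(SAMPLE_LOG)
print(f"Edge TPU: {totals['tpu']}, CPU: {totals['cpu']}")
```

For the table above this yields 83 operations on the Edge TPU and 10 on the CPU, matching the "Number of operations" lines the compiler printed. Half of the CPU ops are the five CONV_2D rows that fall outside the single Edge TPU subgraph.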