Errors while training model

jyha1717 commented 3 years ago

Hello there, I came across your code while searching for MobilenetV3 SSDLite in Keras so I downloaded it and have been making use of it. I'm having some issues with the train.py file - when the program calls the fit_generator step at the end, it throws out an error "Invalid argument: Incompatible shapes: [16,2434,87] vs. [16,2434,91]" when calculating the ssd_loss. I've tried modifying some settings like n_classes to no avail - do you have any advice? Thank you!

Lucky2593 commented 3 years ago

@jyha1717 hello, did you find the solution for this problem?

jyha1717 commented 3 years ago

@Lucky2593 Not 100%, but I've made a few observations:

The ssd_input_encoder that encodes the labels to obtain y_true produces a final dimension of 1+n_classes+12, while the model in training mode outputs a tensor (i.e. y_pred) with final dimension 1+n_classes+8. This difference of 4 is probably what causes the shape mismatch (87 vs. 91) in the error above.

The 12 additional terms are anchor box offsets (4), anchor box coordinates (4) and variances (4); y_pred is missing the variances. The loss function seems to use only the anchor box offsets and not the other 8, so I tried padding y_pred with 4 zeros in the last dimension. However, after "training" the predicted boxes are all wrong, so there's probably more at play here.
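The shape arithmetic described above can be sketched with NumPy. This is only an illustration under stated assumptions: `n_classes = 78` is a value chosen because it reproduces the 87 vs. 91 widths from the error message, `last_dim` is a hypothetical helper, and the zero arrays stand in for the real tensors. It does not call the repository's encoder or model.

```python
import numpy as np

batch, n_boxes = 16, 2434   # taken from the error message
n_classes = 78              # assumption: a value that reproduces 87 vs. 91


def last_dim(n_classes, n_box_terms):
    """Width of the last axis: background class + object classes + box terms."""
    return 1 + n_classes + n_box_terms


# Stand-ins for the model output (8 box terms) and the encoded labels (12 box terms).
y_pred = np.zeros((batch, n_boxes, last_dim(n_classes, 8)), dtype=np.float32)
y_true = np.zeros((batch, n_boxes, last_dim(n_classes, 12)), dtype=np.float32)

print(y_pred.shape, y_true.shape)

# The workaround tried above: pad y_pred with 4 zeros on the last axis only.
y_pred_padded = np.pad(y_pred, ((0, 0), (0, 0), (0, 4)), mode="constant")
print(y_pred_padded.shape == y_true.shape)
```

Padding only makes the two shapes agree so the loss can be evaluated; as noted above, it did not lead to sensible predictions.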

I've noticed two other things:

In train.py, the data generator creates image tensors in the range [0,255] rather than [-1,1] as used in MobileNet, whereas the inference code preprocesses the tensor values to the range [-1,1].
The loss function also seems to be missing a softmax step (see p. 5 of the SSD paper), so the loss is minimized by setting every probability to 1.
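Both observations can be illustrated in isolation with NumPy. The sketch below is not the repository's code: `scale_to_unit_range`, `softmax`, and `log_loss` are hypothetical helpers, and the scaling `x / 127.5 - 1` is a common way to map [0,255] to [-1,1], assumed here to correspond to what the inference code does.

```python
import numpy as np


def scale_to_unit_range(images):
    """Map pixel values from [0, 255] to [-1, 1]."""
    return images.astype(np.float32) / 127.5 - 1.0


def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=-1, keepdims=True)


def log_loss(y_true, y_prob, eps=1e-7):
    """Cross-entropy over the last axis; y_prob is clipped to avoid log(0)."""
    y_prob = np.clip(y_prob, eps, 1.0)
    return -np.sum(y_true * np.log(y_prob), axis=-1)


# Training and inference inputs should go through the same scaling.
pixels = np.array([0, 255], dtype=np.uint8)
print(scale_to_unit_range(pixels))

# Without softmax, scores of 1 for every class give zero loss for any label.
y_true = np.array([0.0, 1.0, 0.0])
raw_scores = np.array([1.0, 1.0, 1.0])
print(log_loss(y_true, raw_scores))

# With softmax the classes compete, so the same scores give a loss of log(3).
print(log_loss(y_true, softmax(raw_scores)))
```

With the softmax in place, pushing one class probability up necessarily pushes the others down, which removes the trivial all-ones solution.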

I have also experimented with these two but haven't gotten any good results. Maybe this will help you discover something, though. Let me know if you find anything!

cash-lo commented 3 years ago

@jyha1717 @Lucky2593 I'm hitting this error too, and I still have no idea how to solve it. Did you find a solution?

jyha1717 commented 3 years ago

@cash-lo nothing more than my last comment. All the best.

Rsndmmm commented 3 years ago

@jyha1717 Have you solved the problem yet? I think the problem might be the image_size or something else.

jyha1717 commented 3 years ago

@Rsndmmm no sorry, my team and I tried very much to resolve the issues but we were unable to, so we've moved on to other models.

XiaoyuHuang96 / MobilenetV3SSDLite-tfkeras

Errors while training model #2