Incompatible shapes of loaded weights and Model Layer #26 (named "rpn_out_class") for VGG while testing

Avani1994 commented 4 years ago

Hi, I was trying to use trained weights from mode_path, but I am getting the following error when I try to load the weights:

ValueError: Layer #26 (named "rpn_out_class"), weight <tf.Variable 'rpn_out_class_6/kernel:0' shape=(1, 1, 512, 1) dtype=float32> has shape (1, 1, 512, 1), but the saved weight has shape (9, 512, 1, 1).

Can you please help debug the issue? I am stuck at this point!

My parameters are as follows:

anchor_box_scales = [64, 128, 256] or [128, 256, 512]
anchor_box_ratios = [[1, 1], [1./math.sqrt(2), 2./math.sqrt(2)], [2./math.sqrt(2), 1./math.sqrt(2)]]
num_rois = 256
im_size = 300
num_anchors = len(anchor_box_scales) * len(anchor_box_ratios)
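
For reference, a minimal sketch of how these parameters relate to the shapes in the error message. It assumes that rpn_out_class is a 1x1 convolution over 512 feature channels with one filter per anchor; that is inferred from the shapes in the error, not read from the repository code. The helper conv_kernel_shape is hypothetical and only illustrates the channels-last kernel layout (height, width, input channels, filters).

```python
import math

# Values copied from the parameters listed above.
anchor_box_scales = [64, 128, 256]
anchor_box_ratios = [
    [1, 1],
    [1.0 / math.sqrt(2), 2.0 / math.sqrt(2)],
    [2.0 / math.sqrt(2), 1.0 / math.sqrt(2)],
]
num_anchors = len(anchor_box_scales) * len(anchor_box_ratios)


def conv_kernel_shape(kernel_size, in_channels, filters):
    """Kernel shape of a channels-last 2D convolution: (h, w, in, out)."""
    return (kernel_size[0], kernel_size[1], in_channels, filters)


# Assumption: rpn_out_class is a 1x1 conv over 512 channels, one filter per anchor.
expected_shape = conv_kernel_shape((1, 1), 512, num_anchors)
print(num_anchors, expected_shape)
```

With these values num_anchors is 9, so the layer would be expected to have 9 filters. The model in the error message was built with a single filter, (1, 1, 512, 1), while the saved file holds 9, which suggests the configuration used at test time may differ from the one used for training.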

Avani1994 commented 4 years ago

I was able to get testing running by using the default parameters; however, it seems like the network did not learn anything and is giving random, wrong results. I am trying to detect emojis in WhatsApp chat images. Do you think FRCNN won't work for small object detection, or should I change something in training? I reduced the image size by half but forgot to reduce the anchor scales. Do you think that might be the cause? Should I try retraining with the original image size = 600 and the default anchor scales? If not, can you please suggest something else that might be a good fit?

eleow commented 4 years ago

Yes, each trained model is specific to the parameters that were used in training. Hmm... I would think that, with emoticons in chat images, you could preprocess the images using color or edge detection algorithms and other rules to form a mask. If the results are good, you might not need FRCNN at all. Otherwise, the preprocessed image could then be fed to FRCNN. What are your classes like? Are you trying to detect and classify the different emoticons, or just detect the presence of emoticons? How are you labelling the emoticons currently?

Avani1994 commented 4 years ago

Hey, thanks @eleow. Do you mean using just image processing rather than deep learning to detect emojis? I think that would help with detection, but it would be really hard to classify the emoticons. I am trying to both detect and classify them. Currently I have around 88 classes; I am using the Unicode characters of the emojis as class names, and the classes are as follows:

Total classes: 88

[('bluetick', 6891), ('🤔', 767), ('😭', 748), ('😠', 733), ('😋', 720), ('😚', 717), ('😃', 712), ('😇', 701), ('👍', 415), ('😒', 411), ('\U0001f92a', 409), ('😆', 408), ('😄', 404), ('👎', 404), ('😅', 396), ('😦', 393), ('😓', 393), ('😎', 392), ('\U0001f97a', 390), ('😩', 390), ('😙', 387), ('😁', 385), ('🤗', 384), ('😕', 383), ('😔', 383), ('\U0001f91f', 382), ('😉', 381), ('😢', 379), ('😮', 378), ('😖', 378), ('😴', 377), ('😌', 377), ('😣', 376), ('😑', 376), ('🤕', 374), ('😧', 373), ('👌', 373), ('🤢', 372), ('🤓', 372), ('😂', 372), ('🤞', 370), ('😨', 370), ('✌️', 370), ('😰', 369), ('😗', 369), ('\U0001f92d', 368), ('😱', 368), ('😞', 366), ('🤥', 364), ('☺️', 364), ('😐', 363), ('😟', 361), ('🙁', 360), ('😊', 360), ('🤒', 358), ('😀', 358), ('\U0001f928', 357), ('😶', 353), ('😷', 352), ('☹', 351), ('🤤', 350), ('😲', 350), ('😫', 350), ('\U0001f929', 348), ('😥', 348), ('🤧', 346), ('😝', 346), ('👊', 346), ('😡', 345), ('\U0001f973', 344), ('😛', 344), ('😏', 344), ('\U0001f92e', 343), ('🙂', 343), ('😬', 343), ('🤣', 341), ('🙄', 341), ('\U0001f975', 340), ('\U0001f970', 340), ('\U0001f92b', 339), ('😜', 337), ('🤜', 328), ('😪', 328), ('🙃', 327), ('😍', 325), ('🤑', 323), ('😘', 323), ('😯', 320)]

I have not included all emojis in the classes. In future I plan to extend this list and also include Facebook/Instagram emoticons, but I need at least a base to move forward.

I am generating the training data myself; currently I have 1300 training images and 200 test images, and I can generate more if needed. The only special aspect is that all objects are the same size, (24, 24), except the class 'bluetick', which is (24, 15), so they are all very small. After successfully training the model on this synthetic dataset, I expect the model to be accurate on real chat images. I don't know what the best approach is for this use case. Do you think deep learning won't give good results?
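
To make the anchor-scale question concrete, here is a rough sketch that computes the best-case IoU between a 24x24 object and square anchors at the scales mentioned earlier in this thread. It assumes the 24x24 size is measured in the image that the network actually sees and that the anchor is perfectly centered on the object; both are assumptions for illustration only.

```python
def iou(box_a, box_b):
    """Intersection over union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter)


def centered_box(cx, cy, w, h):
    """Box of size w x h centered at (cx, cy), as (x1, y1, x2, y2)."""
    return (cx - w / 2, cy - h / 2, cx + w / 2, cy + h / 2)


emoji_box = centered_box(100, 100, 24, 24)  # assumed object size in network input
for scale in [64, 128, 256]:
    anchor = centered_box(100, 100, scale, scale)  # best case: same center
    print(scale, round(iou(emoji_box, anchor), 3))
```

Even in this best case the smallest anchor overlaps the object with an IoU of roughly 0.14, far below the overlap levels (around 0.7 in many Faster R-CNN setups, an assumed value here) commonly used to mark an anchor as positive, so anchor scales closer to the object size may be worth trying.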

Let me attach a sample image and its annotations:

Image0 is one of the generated images without the bounding boxes drawn, and is used for training.
Image1 is the same kind of image annotated with bounding boxes for your reference.

I am not able to upload the annotations here because the CSV format is not supported; only images are. But you can get the idea: the coordinates are those of the rectangles drawn around each emoticon. For the blue tick I am just giving the class name 'bluetick'.

image0: image1:

Your suggestions will be really helpful, as this problem seems open-ended to me and I have not been able to narrow down the approaches I could take.

eleow commented 4 years ago

Well, in my opinion, deep learning might not be the best approach. You see, in your training set, for each class, e.g. ('😭', 748), all images would basically be the same, right? If you could guarantee that the emoticon size will be constant, then you might as well perform some form of pixel matching or similarity-vector comparison, using boxes of 24x24 pixels, and search the image. To be more efficient, I would get bounding boxes for the message content areas (white rectangles and green rectangles) and search within those areas only.
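
A minimal sketch of the pixel-matching idea, assuming grayscale images stored as NumPy arrays and a fixed 24x24 template. The function name match_template_ssd and the synthetic data are hypothetical; this is a brute-force sum-of-squared-differences search, not code from this repository, and restricting the search to the message areas is left out.

```python
import numpy as np


def match_template_ssd(image, template):
    """Return (row, col, score) of the window with the smallest
    sum of squared differences against the template."""
    ih, iw = image.shape
    th, tw = template.shape
    best = (0, 0, float("inf"))
    for r in range(ih - th + 1):
        for c in range(iw - tw + 1):
            window = image[r:r + th, c:c + tw]
            score = float(np.sum((window - template) ** 2))
            if score < best[2]:
                best = (r, c, score)
    return best


# Synthetic example: paste a random 24x24 "emoji" into a random image.
rng = np.random.default_rng(0)
template = rng.random((24, 24))
image = rng.random((60, 80))
image[10:34, 30:54] = template

row, col, score = match_template_ssd(image, template)
print(row, col, score)
```

On real screenshots the match will not be exact, so a score threshold would be needed, and a library routine such as OpenCV's matchTemplate would likely be much faster than this double loop.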

Alternatively, if you have to use deep learning, then just treat all emoticons as a single class for detection. Then, for each detected emoticon, classify it using pixel matching, a similarity vector, etc.
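
The second stage of that two-step idea could look roughly like the sketch below: each detected crop is compared against one reference image per class, and the most similar class wins. It assumes the crop has already been resized to the template size; classify_crop, the class names, and the random templates are hypothetical placeholders.

```python
import numpy as np


def classify_crop(crop, templates):
    """Return (label, similarity) of the template with the highest
    cosine similarity to the crop. Crop and templates share one shape."""
    v = crop.astype(float).ravel()
    best_label, best_sim = None, float("-inf")
    for label, tmpl in templates.items():
        t = tmpl.astype(float).ravel()
        denom = np.linalg.norm(v) * np.linalg.norm(t) + 1e-12  # avoid divide by zero
        sim = float(v @ t / denom)
        if sim > best_sim:
            best_label, best_sim = label, sim
    return best_label, best_sim


# Synthetic example: two random 24x24 templates and a slightly noisy crop.
rng = np.random.default_rng(1)
templates = {
    "thumbs_up": rng.random((24, 24)),
    "smile": rng.random((24, 24)),
}
crop = templates["smile"] + 0.01 * rng.random((24, 24))

label, sim = classify_crop(crop, templates)
print(label, round(sim, 4))
```

Cosine similarity on raw pixels is only one possible choice; if emoji sizes vary in real chats, the crop would need resizing first, and a more robust feature vector may work better.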

Avani1994 commented 4 years ago

Hmm, yeah, that makes sense, but the emoji size might not be constant in real chat images. It might be a little bigger; as you can see in the chat images, it won't be too big, but it won't be constant either!

eleow / tfKerasFRCNN

Incompatible shapes of loaded weights and Model Layer #26 (named "rpn_out_class") for VGG while testing #7