experiencor / keras-yolo2

Easy training on custom datasets. Various backends (MobileNet and SqueezeNet) are supported. A YOLO demo that detects raccoons, running entirely in the browser, is accessible at https://git.io/vF7vI (not on Windows).
MIT License
1.73k stars, 784 forks

Difficulty in training multiple classes #306

Closed: Dhagash4 closed this issue 6 years ago

Dhagash4 commented 6 years ago

ValueError: Cannot feed value of shape (30,) for Tensor u'Placeholder_41:0', which has shape '(35,)'

I get this error when I try to train two classes starting from the pretrained weights tiny_yolo_raccoon.h5. Training one class works fine, but with two classes I get the error above. I want to use this for traffic sign detection and classification with more than 10 classes; is that possible? Also, if anyone has weights trained for multiple classes, please share them; that would be very helpful.

Can anyone help. Thanks in advance.

Dhagash4 commented 6 years ago

I also get: Cannot feed value of shape (30, 1024, 1, 1) for Tensor u'Placeholder_40:0', which has shape '(1, 1, 1024, 35)'. Is something wrong with the code?

rodrigo2019 commented 6 years ago

You are trying to load a model with fewer classes than expected: the model expects 2 classes, but the weights you are loading come from a model trained for 1 class.

Dhagash4 commented 6 years ago

@rodrigo2019 I am trying to train it for two classes only. With one class I had no problem, but with two classes it gives this error. Here is the config file I am using; is there an error in it?

{
    "model": {
        "backend":              "Tiny Yolo",
        "input_size":           416,
        "anchors":              [0.57273, 0.677385, 1.87446, 2.06253, 3.33843, 5.47434, 7.88282, 3.52778, 9.77052, 9.16828],
        "max_box_per_image":    10,
        "labels":               ["stop", "speedLimit"]
    },

    "train": {
        "train_image_folder":   "train_image\\",
        "train_annot_folder":   "train_annot_folder\\",

        "train_times":          8,
        "pretrained_weights":   "tiny_yolo_raccoon.h5",
        "batch_size":           1,
        "learning_rate":        1e-4,
        "nb_epochs":            1,
        "warmup_epochs":        3,

        "object_scale":         5.0,
        "no_object_scale":      1.0,
        "coord_scale":          1.0,
        "class_scale":          1.0,

        "saved_weights_name":   "tiny_yolo_traffic.h5",
        "debug":                true
    },

    "valid": {
        "valid_image_folder":   "",
        "valid_annot_folder":   "",

        "valid_times":          1
    }
}

Are you saying that we can train either one class or 30 classes, but no number in between?

Please help; I have been stuck on this for about 15 days. Thanks in advance.

rodrigo2019 commented 6 years ago

Sorry, I forgot to divide the number by 5. Where do these numbers come from? 5 is the number of anchors, and 30 is the size of the output vector holding the box coordinates, objectness and class scores for all anchors: 30/5 = 6 means {x, y, w, h, objectness, 1 class} for each anchor, and 35/5 = 7 means {x, y, w, h, objectness, 2 classes} for each anchor.
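The arithmetic above can be written as a pair of small helpers, which also tells you how many classes a checkpoint was trained for from the shapes in the error message. This is a sketch assuming the layout described in the comment (5 anchors, each predicting 4 box coordinates, 1 objectness score and one score per class); `detection_filters` and `classes_from_filters` are hypothetical names, not part of keras-yolo2.

```python
def detection_filters(nb_anchors: int, nb_classes: int) -> int:
    """Number of output channels of the final (detection) conv layer.

    Each anchor predicts x, y, w, h (4 values), one objectness score,
    and one score per class.
    """
    return nb_anchors * (4 + 1 + nb_classes)


def classes_from_filters(nb_anchors: int, nb_filters: int) -> int:
    """Invert detection_filters: how many classes a layer was built for."""
    per_anchor, remainder = divmod(nb_filters, nb_anchors)
    if remainder != 0 or per_anchor < 5:
        raise ValueError(f"{nb_filters} filters do not fit {nb_anchors} anchors")
    return per_anchor - 5


# Shapes from the error message: checkpoint bias (30,), new model bias (35,)
print(classes_from_filters(5, 30))  # 1 (tiny_yolo_raccoon.h5)
print(classes_from_filters(5, 35))  # 2 (the new config)
print(detection_filters(5, 10))     # 75 filters for 10 classes
```

The same formula explains why a checkpoint's detection layer only fits configs with the same number of labels.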

You can't use tiny_yolo_raccoon.h5 as your pretrained model. That file holds the whole model, including the last conv layer (the detection layer), which was trained for just 1 class. You must start a new model from scratch and load only the backend weights. The backend weights are here; to load them, just leave these files in the root directory of this repo and the code will load them automatically. Leave "pretrained_weights" empty.
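A minimal sketch of that change, assuming the config is loaded as a dict with the same keys as the config.json posted in this thread; `scratch_training_config` is a hypothetical helper, not something the repo provides. It only blanks `"pretrained_weights"`, so the full-model checkpoint (whose detection layer is tied to one class) is not loaded.

```python
import copy
import json


def scratch_training_config(config: dict) -> dict:
    """Return a copy of the config that builds a fresh model.

    Clearing "pretrained_weights" means no full-model checkpoint is loaded;
    per the comment above, the backend weights are picked up from the repo
    root instead.
    """
    new_config = copy.deepcopy(config)
    new_config["train"]["pretrained_weights"] = ""
    return new_config


config = {
    "model": {"backend": "Tiny Yolo", "labels": ["stop", "speedLimit"]},
    "train": {"pretrained_weights": "tiny_yolo_raccoon.h5"},
}
fresh = scratch_training_config(config)
print(json.dumps(fresh["train"]))  # {"pretrained_weights": ""}
```

To use it on a real file, read it with `json.load`, apply the helper and write the result back with `json.dump`.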

Dhagash4 commented 6 years ago

@rodrigo2019 This works. Once I train, it will generate weights; can I use those as pretrained weights to extend the model to further classes, or should I keep the field blank? I need 20+ classes. How many epochs give the best detection? Also, how do I set the threshold when predicting?

rodrigo2019 commented 6 years ago
  1. Yes, you can use it for further trainings, but those trainings must have the same number of classes.
  2. You can set a high number of epochs; training will stop when the loss stops improving.
  3. I got good results using the default threshold.
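On point 3: the threshold is applied to each decoded box's best class score, and raising it trades recall for precision. A sketch of the idea, assuming each decoded box carries one score per label and the kept label is the arg-max; the `Box` type, `filter_boxes` and the 0.3 default are illustrative assumptions, not keras-yolo2's exact code (in that repo the thresholds appear to be arguments of `decode_netout` in utils.py; check your copy).

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Box:
    """A decoded detection: corner coordinates plus per-class scores."""
    xmin: float
    ymin: float
    xmax: float
    ymax: float
    classes: List[float]  # one score per label

    def label(self) -> int:
        return max(range(len(self.classes)), key=self.classes.__getitem__)

    def score(self) -> float:
        return self.classes[self.label()]


def filter_boxes(boxes: List[Box], obj_threshold: float = 0.3) -> List[Box]:
    """Keep only boxes whose best class score reaches the threshold."""
    return [b for b in boxes if b.score() >= obj_threshold]


labels = ["stop", "speedLimit"]
boxes = [
    Box(0.1, 0.1, 0.3, 0.3, [0.82, 0.05]),
    Box(0.5, 0.5, 0.7, 0.7, [0.10, 0.25]),
]
for box in filter_boxes(boxes, obj_threshold=0.3):
    print(labels[box.label()], round(box.score(), 2))  # stop 0.82
```

Lowering `obj_threshold` to 0.2 would also keep the second, weaker speedLimit box.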
Dhagash4 commented 6 years ago

@rodrigo2019
Epoch 00005: val_loss did not improve from 0.50837
Epoch 00005: early stopping
/content/drive/FLUX/keras-yolo2/utils.py:198: RuntimeWarning: overflow encountered in exp
  return 1. / (1. + np.exp(-x))
stop 0.4486
speedLimit 0.0000
mAP: 0.2243

I got this result. I am using the Tiny Yolo backend, but it seems the second class is not being trained at all. How many images are needed? I have 900 images of stop and 400 images of speedLimit; is that enough? I am also not using validation images; do I need to? Please help.
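Two things in that log can be checked by hand. The mAP is the unweighted mean of the per-class APs, so (0.4486 + 0.0) / 2 = 0.2243: the untrained speedLimit class halves the score. The RuntimeWarning comes from `np.exp(-x)` on a large negative `x`; NumPy returns `inf` and the sigmoid evaluates to 0, so the result is usually still correct. The sketch below (scalar, standard library only; the function names are hypothetical, not the repo's) shows both; for NumPy arrays, `scipy.special.expit` computes the same logistic function.

```python
import math
from typing import Dict


def mean_average_precision(ap_per_class: Dict[str, float]) -> float:
    """mAP as the unweighted mean of per-class average precisions."""
    return sum(ap_per_class.values()) / len(ap_per_class)


def stable_sigmoid(x: float) -> float:
    """Logistic function that never calls exp() on a large positive number."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)  # x < 0, so 0 <= z < 1: no overflow
    return z / (1.0 + z)


print(mean_average_precision({"stop": 0.4486, "speedLimit": 0.0}))  # 0.2243
print(stable_sigmoid(-1000.0))  # 0.0, without an overflow
```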

rodrigo2019 commented 6 years ago
  1. Try learning_rate = 1e-5.
  2. Try using my fork and disable early stopping.
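Why the run above stopped at epoch 5 can be seen from patience-based early stopping, which Keras' `EarlyStopping` callback implements with its `monitor`, `min_delta` and `patience` arguments. This is a simplified stand-alone sketch of that logic, not the callback's code; the patience value the repo uses is an assumption, and all losses except 0.50837 (from the log) are hypothetical.

```python
from typing import List, Optional


def early_stopping_epoch(val_losses: List[float], patience: int,
                         min_delta: float = 0.0) -> Optional[int]:
    """Return the 1-based epoch at which early stopping fires, or None.

    An epoch counts as an improvement only if it beats the best loss so far
    by more than min_delta; training stops after `patience` consecutive
    epochs without improvement.
    """
    best = float("inf")
    wait = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best - min_delta:
            best, wait = loss, 0
        else:
            wait += 1
            if wait >= patience:
                return epoch
    return None


# Hypothetical losses: the best value comes at epoch 2, then stalls.
losses = [0.71, 0.50837, 0.52, 0.51, 0.55]
print(early_stopping_epoch(losses, patience=3))   # 5
print(early_stopping_epoch(losses, patience=15))  # None
```

With a small patience, a class that only starts improving after many epochs never gets the chance, which is why disabling early stopping is worth trying here.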
Dhagash4 commented 6 years ago

@rodrigo2019 I am using your repo, but I get this error, and it also runs quite slowly for me compared to the default one:
Epoch 00001: val_loss improved from inf to 10.33280, saving model to traffic_all.h5
libpng error: Read Error

rodrigo2019 commented 6 years ago

@Dhagash4 Yes, it is slower because it runs mAP validation every epoch. But I don't know why libpng errors occur when you are saving the model; I never got this error.

Dhagash4 commented 6 years ago

@rodrigo2019 It only happened today; I think it is because of a slow network connection, since I am running it in Colab.

rodrigo2019 commented 6 years ago

Okay, I see. Did you find out why that error occurs?

Dhagash4 commented 6 years ago

No, I didn't find out why the error occurs, but after epoch 1 training gets stuck and then this error appears. I will try again and update you.

(Sent by email in reply to Rodrigo Meira de Andrade's comment of Wed, Jun 20, 2018, 4:53 PM.)

Dhagash4 commented 6 years ago

@rodrigo2019 The above problem was solved by using Python 2, but the second class is still not training. I ran about 50 epochs and a bestMap weights file was created, but the result was the same as above: the mAP for the second class was 0.00. I am not providing validation images, and there are only 405 images of the second class. What should I do next?

rodrigo2019 commented 6 years ago
  1. I have always used Python 3 and it worked fine; I think you should review your dependencies in Python 3.
  2. If you do not provide validation images, the code will take part of your data and use it as validation data.
  3. Are you training from scratch, i.e. without any kind of pretrained weights?
  4. Are you using a custom backend?
  5. The best loss value doesn't mean that your model has the best mAP, so I decided to save the weights whenever training reaches a better mAP. This is why you got two weight files from your training.
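Point 2 can be sketched as a shuffled hold-out split. The 80/20 ratio, the fixed seed and the flat (image, label) pairs are assumptions for illustration (the repo's samples hold whole annotations per image); check train.py in the repo you use for its actual behaviour. Counting labels in the held-out part shows how many speedLimit images end up used only for validation.

```python
import random
from collections import Counter
from typing import List, Tuple


def train_valid_split(samples: List[Tuple[str, str]], train_fraction: float = 0.8,
                      seed: int = 0) -> Tuple[list, list]:
    """Shuffle (image, label) pairs and hold out the tail for validation."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical dataset with the class sizes mentioned in this thread.
samples = [(f"stop_{i}.png", "stop") for i in range(900)]
samples += [(f"speed_{i}.png", "speedLimit") for i in range(405)]
train, valid = train_valid_split(samples)
print(len(train), len(valid))  # 1044 261
print(Counter(label for _, label in valid))
```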

Look at these trainings (image): in one I started to get mAP values after 2 hours of training, and in the other only after 8 hours. It is not usual, but it can happen. Both trainings were started from scratch. Many variables can cause this, such as the size of your samples, the size of your input image, the design of your network, and so on. If you share your data, or part of it, I will take a look, but it would be easier for me if you share your annotations in CSV format, as described in readme.md.

Dhagash4 commented 6 years ago

@rodrigo2019

This is the link to my training data. Please share the results and guide me on how to proceed: https://drive.google.com/open?id=1iRAZkey7qVPkQEdJfAODPHeteFZ3DoSL

rodrigo2019 commented 6 years ago

@Dhagash4 You gave me just the image ROIs; I can't start a training with just ROIs, I need the full image samples. But looking at your CSV files, it seems your images are large and your objects are small.

  1. What is your mAP for the stop sign?
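The concern about large images with small objects is easiest to see with numbers. A sketch assuming the whole image is resized to the 416 x 416 input from the config posted above and a stride-32 output grid (13 x 13 cells); the image and sign sizes below are hypothetical, not measured from the VIVA/LISA data, and the helper names are made up.

```python
from typing import Tuple


def box_size_at_input(box_w: float, box_h: float, img_w: int, img_h: int,
                      input_size: int = 416) -> Tuple[float, float]:
    """Size of a box after the whole image is resized to input_size x input_size."""
    return box_w * input_size / img_w, box_h * input_size / img_h


def grid_cells_covered(size_px: float, stride: int = 32) -> float:
    """How many output-grid cells a length spans at the given stride."""
    return size_px / stride


# Hypothetical example: a 40 x 40 px sign in a 1280 x 960 photo.
w, h = box_size_at_input(40, 40, 1280, 960)
print(w)                       # 13.0 px wide at the network input
print(grid_cells_covered(w))   # well under one grid cell
```

An object much smaller than one grid cell leaves the network very few pixels to work with, so a larger `input_size` or cropping/tiling the images are the usual levers.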
Dhagash4 commented 6 years ago

I don't know the mAP for the stop sign. I also trained speedLimit today; it shows good results as a single class with pretrained weights, but I don't know why training from scratch gives no results. I am using the dataset provided on the VIVA site; I will share it, but its CSV file covers all of their classes, and they provide a Python program to crop just the stop signs out of it. That is what I did, so I don't know the image sizes. Should I share the original dataset with you? Otherwise I have these images and their annotations in XML.

And thanks for taking so much trouble for me.

(Sent by email in reply to Rodrigo Meira de Andrade's comment of Sun, Jun 24, 2018, 1:27 AM.)

rodrigo2019 commented 6 years ago

Please provide the link and I will take a look. Let's figure out what is going on :D

Dhagash4 commented 6 years ago

@rodrigo2019 http://cvrr.ucsd.edu/vivachallenge/index.php/signs/sign-detection/

This is the link I used; I downloaded the LISA TS Extension (1.5GB).

(Sent by email in reply to Rodrigo Meira de Andrade's comment of Sun, Jun 24, 2018, 1:50 AM.)

Dhagash4 commented 6 years ago

@rodrigo2019 I trained each class on its own, as a single class with pretrained weights, and the results were good: both single-class models detect their class. But I don't know what the problem is when the two classes are combined. Do you have pretrained weights for more classes, like 20 or so? Then it would do a nice job. By the way, have you tried it on my dataset?

Dhagash4 commented 6 years ago

@rodrigo2019 I have solved that part, but now I want to read a video and show the results at the same time. I have made the following changes in the predict.py code:

predict.txt

But I am getting this error:

Traceback (most recent call last):
  File "D:\1.keras-yolo2\predict.py", line 101, in <module>
    main()
  File "D:\1.keras-yolo2\predict.py", line 56, in main
    yolo.load_weights(weights_path)
  File "D:\1.keras-yolo2\frontend.py", line 243, in load_weights
    self.model.load_weights(weight_path)
  File "C:\Users\Vineeth\AppData\Local\Programs\Python\Python35\lib\site-packages\keras\engine\network.py", line 1180, in load_weights
    f, self.layers, reshape=reshape)
  File "C:\Users\Vineeth\AppData\Local\Programs\Python\Python35\lib\site-packages\keras\engine\saving.py", line 929, in load_weights_from_hdf5_group
    K.batch_set_value(weight_value_tuples)
  File "C:\Users\Vineeth\AppData\Local\Programs\Python\Python35\lib\site-packages\keras\backend\tensorflow_backend.py", line 2435, in batch_set_value
    get_session().run(assign_ops, feed_dict=feed_dict)
  File "C:\Users\Vineeth\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 900, in run
    run_metadata_ptr)
  File "C:\Users\Vineeth\AppData\Local\Programs\Python\Python35\lib\site-packages\tensorflow\python\client\session.py", line 1111, in _run
    str(subfeed_t.get_shape())))
ValueError: Cannot feed value of shape (64,) for Tensor 'Placeholder_17:0', which has shape '(128,)'

Can you help?
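Unlike the earlier 30 vs 35 error, 64 and 128 are not of the form 5 × (4 + 1 + classes), so this mismatch is probably in an intermediate layer rather than the detection layer. A plausible cause (an assumption; the thread does not confirm it) is that predict.py builds a different network from the one that saved the weights, for example another backend or a weights file produced by a different repository or fork. A quick check is to diff the `"model"` sections of the training and prediction configs; `model_config_diff` is a hypothetical helper and the configs below are made up.

```python
from typing import Dict, List


def model_config_diff(train_cfg: Dict, predict_cfg: Dict) -> List[str]:
    """List the keys of the "model" section that differ between two configs.

    Weights can only be loaded into a network built with the same
    architecture settings they were trained with.
    """
    a, b = train_cfg["model"], predict_cfg["model"]
    return sorted(k for k in set(a) | set(b) if a.get(k) != b.get(k))


# Hypothetical configs: weights trained with one backend, loaded with another.
train_cfg = {"model": {"backend": "Full Yolo", "input_size": 416,
                       "labels": ["stop", "speedLimit"]}}
predict_cfg = {"model": {"backend": "Tiny Yolo", "input_size": 416,
                         "labels": ["stop", "speedLimit"]}}
print(model_config_diff(train_cfg, predict_cfg))  # ['backend']
```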

rodrigo2019 commented 6 years ago

Which repository are you using?

tanakaSyn commented 5 years ago

@rodrigo2019 Sorry to interrupt, but I want to ask about your repository. I think there are some big changes, like:
・anchor size https://github.com/rodrigo2019/keras-yolo2/commit/e22ad67e4af1a8a8ea68103093743486f36df012#diff-286f950f70d14a44704a9822364d80f4
・squared box_wh in the loss function https://github.com/rodrigo2019/keras-yolo2/commit/69c05236ed23d5dc98fa1270b1e52109dd2d00a7 https://github.com/rodrigo2019/keras-yolo2/commit/dc8cd2981ee304c85f0f7ffb44b2b2221cea1120
・fixNMS https://github.com/rodrigo2019/keras-yolo2/commit/f3508aa953858c7798dc82a981e41673538e027a

Could you give me a brief explanation of these changes? Thanks for a good repo.

rodrigo2019 commented 5 years ago

@tanakaSyn

・anchor size rodrigo2019@e22ad67#diff-286f950f70d14a44704a9822364d80f4

I changed this because the anchors are based on the ratio between the input and output sizes; with custom backends you can get ratios other than just 32.
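In keras-yolo2-style configs the anchors are expressed in output-grid units, so the same anchor means a different pixel size whenever the input/output ratio (the stride) changes. A sketch assuming grid-unit anchors; the helper names are hypothetical, and stride 16 stands for a hypothetical custom backend with a finer output grid.

```python
from typing import List


def anchors_to_pixels(anchors: List[float], stride: int) -> List[float]:
    """Convert anchors expressed in grid cells to pixels at the network input."""
    return [a * stride for a in anchors]


def anchors_to_grid(anchors_px: List[float], stride: int) -> List[float]:
    """Convert pixel anchors back to grid units for a given stride."""
    return [a / stride for a in anchors_px]


# First anchor (w, h) from the config posted in this thread, in grid units.
anchor = [0.57273, 0.677385]
px = anchors_to_pixels(anchor, stride=32)   # 416 input -> 13 x 13 grid
print(px)
print(anchors_to_grid(px, stride=16))       # same pixel size on a 26 x 26 grid
```

Keeping anchors in pixels and dividing by the backend's actual stride avoids hard-coding 32.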

・squared box_wh in loss function rodrigo2019@69c0523 rodrigo2019@dc8cd29

Some users reported that these changes improved the mAP; I tested it myself and confirmed that it is true. In my repo there is also a new loss function in a separate branch; this new formula improves YOLO's predictions.

・ fixNMS rodrigo2019@f3508aa

In the old version, two bounding boxes were sometimes predicted one inside the other; I fixed this bug.
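Nested boxes are a known weak spot of IoU-based NMS: a small box fully inside a large one can have a low IoU, so both survive suppression. Below is a sketch of greedy NMS with a pluggable overlap measure; intersection over the smaller box's area is one possible remedy and is not necessarily what commit f3508aa implements. The 0.3 threshold and all names are assumptions for illustration.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (xmin, ymin, xmax, ymax)


def area(b: Box) -> float:
    return max(0.0, b[2] - b[0]) * max(0.0, b[3] - b[1])


def intersection(a: Box, b: Box) -> float:
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0.0, w) * max(0.0, h)


def iou(a: Box, b: Box) -> float:
    inter = intersection(a, b)
    union = area(a) + area(b) - inter
    return inter / union if union > 0 else 0.0


def overlap_min(a: Box, b: Box) -> float:
    """Intersection over the smaller box: 1.0 when one box contains the other."""
    smaller = min(area(a), area(b))
    return intersection(a, b) / smaller if smaller > 0 else 0.0


def nms(boxes: List[Box], scores: List[float], threshold: float = 0.3,
        overlap=iou) -> List[int]:
    """Greedy NMS: visit boxes by score, drop any that overlap a kept box too much."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        if all(overlap(boxes[i], boxes[k]) <= threshold for k in keep):
            keep.append(i)
    return keep


outer, inner = (0, 0, 10, 10), (2, 2, 6, 6)
print(iou(outer, inner))                                     # 0.16
print(nms([outer, inner], [0.9, 0.6]))                       # [0, 1]
print(nms([outer, inner], [0.9, 0.6], overlap=overlap_min))  # [0]
```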

tanakaSyn commented 5 years ago

@rodrigo2019 Thank you very much

I'd like to ask a bit more.

What does the 10 in "loss_class + 10," mean? https://github.com/rodrigo2019/keras-yolo2/blob/69c05236ed23d5dc98fa1270b1e52109dd2d00a7/frontend.py#L218

Also, you seem to drop "(nb_coord_box + 1e-6)" in your new loss function: https://github.com/rodrigo2019/keras-yolo2/blob/69c05236ed23d5dc98fa1270b1e52109dd2d00a7/frontend.py#L211 Does that give better results than normalizing by the number of objects?
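For reference, this is what dividing by (nb_coord_box + 1e-6) does, sketched in plain Python with lists standing in for the loss tensors: the masked sum becomes an average over the cells that contain objects, so the loss scale no longer grows with the number of objects in an image, and the 1e-6 keeps the division finite when an image has none. The function names and numbers are hypothetical.

```python
from typing import List


def summed_coord_loss(sq_errors: List[float], coord_mask: List[int]) -> float:
    """Masked sum of squared errors: grows with the number of object cells."""
    return sum(e * m for e, m in zip(sq_errors, coord_mask))


def normalized_coord_loss(sq_errors: List[float], coord_mask: List[int]) -> float:
    """Masked sum divided by the number of object cells (plus epsilon)."""
    nb_coord_box = sum(1 for m in coord_mask if m > 0)
    return summed_coord_loss(sq_errors, coord_mask) / (nb_coord_box + 1e-6)


# Hypothetical per-cell squared errors; 1 in the mask marks a cell with an object.
errors = [0.2, 0.4, 0.1, 0.3]
one_object = [1, 0, 0, 0]
four_objects = [1, 1, 1, 1]
for mask in (one_object, four_objects):
    print(summed_coord_loss(errors, mask), normalized_coord_loss(errors, mask))
```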

dr-askar commented 5 years ago

@Dhagash4 I am trying to train on 2 and 4 classes with no success: the mAP is always 0 and training stops early, even when I set the early-stopping patience to 15, although single-object training gives very good results. How did you solve this problem?