georgesung / ssd_tensorflow_traffic_sign_detection

Implementation of Single Shot MultiBox Detector in TensorFlow, to detect and classify traffic signs
MIT License
530 stars · 222 forks

ZeroDivisionError: division by zero #23

Open tony9378 opened 6 years ago

tony9378 commented 6 years ago

I followed your instructions and trained the model again, but when I run the demo, it always shows this error.

tony9378 commented 6 years ago

```
Traceback (most recent call last):
  File "inference.py", line 192, in <module>
    generate_output(input_files, mode)
  File "inference.py", line 162, in generate_output
    image = run_inference(image_orig, model, sess, mode, sign_map)
  File "inference.py", line 77, in run_inference
    boxes = nms(y_pred_conf, y_pred_loc, prob)
  File "E:\Masterarbeit\SSD_Project\model.py", line 248, in nms
    iou = calc_iou(box[:4], other_box[:4])
  File "E:\Masterarbeit\SSD_Project\data_prep.py", line 28, in calc_iou
    iou = intersection / union
ZeroDivisionError: division by zero
```
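For reference, the traceback shows `calc_iou` dividing by a zero union area, which happens when both boxes are degenerate (zero width or height). A defensive sketch of the function, assuming the usual `[x1, y1, x2, y2]` box format implied by the traceback; the zero-union guard is an illustrative workaround, not the repo's actual code:

```python
def calc_iou(box_a, box_b):
    """Intersection-over-union of two boxes given as [x1, y1, x2, y2].

    Returns 0.0 instead of raising ZeroDivisionError when both boxes
    have zero area, e.g. when an undertrained model emits empty boxes.
    """
    # Overlap along each axis, clamped to zero when the boxes are disjoint.
    x_overlap = max(0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    y_overlap = max(0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = x_overlap * y_overlap

    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection

    if union == 0:
        return 0.0  # degenerate boxes: define IoU as 0 instead of crashing
    return intersection / union
```

This only masks the symptom; as discussed below, the underlying cause is usually a badly built data pickle or an undertrained model.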

sudonto commented 6 years ago

@tony9378 ,

Have you figured out this issue? I face the same problem. Every time I finish training my network and load the new model.ckpt, this error arises.

YashBansod commented 6 years ago

I think I have figured out why the error might be happening. Can someone tell me the size of their data_prep_400x260.p? Is it 514 MB, or something less?

sudonto commented 6 years ago

@YashBansod , My data_prep_400x260.p is 514MB (514,001,478 bytes to be precise)

YashBansod commented 6 years ago

The div-by-zero occurs for one of two reasons (at least the ones I have found):

  1. Your data_prep_400x260.p was not made properly. (Verify that it is around 514 MB in size.)
  2. Your model was not trained for a sufficient number of epochs (i.e. your model must have converged to a certain extent; maybe the value of some variable was very close to 0 when you ended your training).

To avoid the first problem, just redo the data prep process properly.

To avoid the second, change the learning rate to at least 0.01 (it's set to 0.001 in the code) and train for at least 20 epochs. Your model won't converge much at this point, but at least you won't get the div-by-zero error.
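A quick sanity check for the first point is to compare the pickle's on-disk size against the ~514 MB figure quoted in this thread. A minimal sketch (the exact byte count is taken from sudonto's report above; the tolerance is an arbitrary choice):

```python
import os

EXPECTED_BYTES = 514001478  # size reported by sudonto above
TOLERANCE = 0.05            # allow ~5% variation, an arbitrary margin

def check_pickle_size(path="data_prep_400x260.p",
                      expected=EXPECTED_BYTES, tol=TOLERANCE):
    """Return True if the prepared pickle is roughly the expected size."""
    size = os.path.getsize(path)
    ok = abs(size - expected) <= expected * tol
    print("%s: %d bytes (%s)" % (path, size, "OK" if ok else "unexpected size"))
    return ok
```

If this reports an unexpected size (e.g. the 2.1 GB case later in this thread), redo the data prep before retraining.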

@tony9378 @sudonto can you confirm if this solves your problem?

DRosemei commented 6 years ago

@sudonto @YashBansod My data_prep_400x260.p is 2.1 GB, and the result after running inference.py is not good. I followed your steps to run data_prep.py. Could you tell me why?

YashBansod commented 6 years ago

@DRosemei something is going wrong before you execute data_prep.py. Please follow the pre-processing instructions (all the steps before executing python data_prep.py), and your data_prep_400x260.p should be around 514 MB.

Also, can you try plotting the cost function and check that it was at its lowest when you ended your training? For me, in one experiment it started to rise after a certain number of epochs.

Anyway, it has been a long time since I worked on this, and the code is really not suitable if you plan to use it for any sort of benchmarking; try the original implementations for that. My purpose was rather to understand an implementation of SSD in TF, and this one seemed more readable than the others. The inference may not be optimal, as the model overfits the data.

Jasonsun1993 commented 6 years ago

@YashBansod My data_raw_400x260.p is only 1.4 MB, but there was no error message. What's wrong with my setup? PS: I downloaded the full dataset and processed it.

DRosemei commented 6 years ago

@YashBansod Sorry for the late reply, and thank you for yours. I have gone through the pre-processing instructions carefully several times, but the size doesn't change. By the way, my data_raw_400x260.p is 921.7 KB. Is that right? Besides, when I ran inference.py restoring the model the author provided, the result was not good; only a few boxes were right. As for plotting the cost function, I'm sorry, I have just started learning DL, so I can't help there. If possible, would you mind sending me a copy of your code? I have wasted a lot of time on this. My email address is 13261197616mrh@gmail.com. Thank you again.

sudonto commented 6 years ago

Hi @DRosemei, @Jasonsun1993, the size of data_raw_400x260.p should be around 514 MB. Please read this. In particular, please pay attention to the answer from YashBansod dated 26 Feb. @DRosemei, in my case the model can correctly detect all the signs in the sample images.

DRosemei commented 6 years ago

@sudonto Thanks for your reply. I have already read #29; it saved me a lot of time. After running create_pickle.py, I got data_raw_400x260.p (921.7 KB) and resized_images_400x260 (2,600 items, totalling 139.0 MB). After running data_prep.py, I got data_prep_400x260.p (2.1 GB). Could you please tell me where I made a mistake? Besides, my detection on sample images looks like this (screenshots attached: pedestrian_1323896918 avi_image9, stop_1323804419 avi_image31). Thank you again.

sudonto commented 6 years ago

@DRosemei, have you checked the contents of your mergedAnnotations.csv? Can you confirm that only stop and pedestrianCrossing signs are in that file?

DRosemei commented 6 years ago

@sudonto Thanks for your reply. My mergedAnnotations.csv is the same as allAnnotations.csv because, as you said in #29, there is a line of code that filters out annotation tags other than the desired signs. But I found it in create_pickle.py, not in data_prep.py. Here is the code:

```python
sign_name = fields[1]
if sign_name != 'stop' and sign_name != 'pedestrianCrossing':
    continue  # ignore signs that are neither stop nor pedestrianCrossing signs
```

Besides, could you tell me how to make a mergedAnnotations.csv that only contains stop and pedestrian signs?

sudonto commented 6 years ago

Ah, if those files are the same, that is what causes data_prep_400x260.p to be over 514 MB in size. You could create a CSV file with only the desired signs by deleting rows manually in Excel (sort by the tag column first, then delete the rows), although the Python script that comes with the dataset can do that too. Yes, I mistyped the filename :)
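The manual-Excel step can also be scripted. A minimal sketch, assuming the annotation file is semicolon-delimited with the sign class in the second column (that layout is an assumption matching the `fields[1]` check quoted from create_pickle.py above):

```python
import csv

KEEP = {"stop", "pedestrianCrossing"}

def filter_annotations(src, dst):
    """Copy only rows whose sign class is in KEEP into a new CSV.

    Assumes a semicolon-delimited file whose second column holds the
    annotation tag; returns the number of data rows kept.
    """
    kept = 0
    with open(src, newline="") as fin, open(dst, "w", newline="") as fout:
        reader = csv.reader(fin, delimiter=";")
        writer = csv.writer(fout, delimiter=";")
        writer.writerow(next(reader))  # copy the header row unchanged
        for row in reader:
            if row[1] in KEEP:
                writer.writerow(row)
                kept += 1
    return kept
```

For example, `filter_annotations("allAnnotations.csv", "mergedAnnotations.csv")` would produce a merged file containing only the two desired sign classes.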

DRosemei commented 6 years ago

@sudonto Thanks. Could you tell me what you get after running create_pickle.py? I got data_raw_400x260.p (921.7 KB) and resized_images_400x260 (2,600 items, totalling 139.0 MB), so I can check whether I need to recreate the CSV file.
I also noticed that there are stop or pedestrian signs in other pictures, like the keepRight ones, but that may not be so important. The problem that puzzles me most is that I can't get good results using the restored model the author provided, as I mentioned above.

sudonto commented 6 years ago

I will re-run the project and tell you the result later. It is strange that the original model cannot predict the signs accurately.

DRosemei commented 6 years ago

@sudonto I have found the cause. I use Python 2.7, where "/" on integers behaves differently than in Python 3.5. Thanks for your help, and I'm waiting for your results :)
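For anyone else hitting this: on two integers, Python 2's `/` performs floor division, while Python 3's `/` always returns a float, so ratio computations like the IoU in data_prep.py can silently collapse to 0 under Python 2. A minimal illustration (runs under Python 3):

```python
# Python 3: "true" division always yields a float.
assert 1 / 2 == 0.5

# Python 3's // is what Python 2's / does on two ints: floor division.
assert 1 // 2 == 0

# Why it matters here: an IoU-style ratio computed with Python 2
# integer division collapses to 0 whenever intersection < union.
intersection, union = 3, 7
assert intersection / union > 0    # Python 3: ~0.43
assert intersection // union == 0  # Python 2's integer "/" behaviour
```

This is why running the project under the documented Python version matters even when the code appears to run without errors.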

YashBansod commented 6 years ago

The dependencies of the project https://github.com/georgesung/ssd_tensorflow_traffic_sign_detection#dependencies clearly state Python 3.5+.

Jasonsun1993 commented 6 years ago

@sudonto did you train on the whole dataset or extend this code to more traffic sign classes? If you did, please tell me about the results. Thanks!

DRosemei commented 6 years ago

@YashBansod Thanks. I had noticed that before, so I made some changes to the code. Now I am going to install Python 3.5 to solve the problem.

DRosemei commented 6 years ago

@sudonto My data_prep_400x260.p is 514 MB now. Thanks for your help. :)

youthM commented 5 years ago

@DRosemei How did you get data_prep_400x260.p to be 514 MB? I followed #29, but failed.

DRosemei commented 5 years ago

@youthM I don't know exactly where you failed. First, make sure your environment is the same as the author's. I guess you may be having trouble with create_pickle.py; you may find answers in #21.