B1D1ng opened this issue 4 years ago
TLDR; image [A] is correct. Fix [B] and [C] to be marked up like image [A] and your network will function much better.
These are awesome. Here are some thoughts:
- In image [A] I like how you marked up both the fire and the smoke.
- In image [B] the fire is marked up correctly, but I would have done 2 things differently with the smoke:
  - I would have deleted the smaller smoke region.
  - I would have extended the larger smoke region to cover the same area covered by the deleted smoke region, even if it overlaps the fire (see the annotation sketch after this list).
- In image [C] I think you are trying too hard to simply get 100% coverage, which isn't how training works. I can imagine this would simply confuse the neural network.
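To make that concrete for image [B], here is a minimal sketch of what its corrected annotation file might look like. I'm assuming the usual darknet YOLO text format -- one object per row as "class x_center y_center width height", values normalized to [0,1] -- with class 0 = fire and class 1 = smoke; the coordinates are made up purely for illustration (darknet annotation files contain only numbers, so no comments in the file itself):

```
0 0.52 0.72 0.28 0.22
1 0.50 0.42 0.78 0.52
```

The first row is one fire box; the second is one large smoke box that deliberately overlaps it. Overlapping annotations are fine, which is exactly what image [A] does.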
That last image needs to be explained. When the neural network runs, it needs to detect (for example) where there are flames. In the way you marked it up, there are 6 distinct areas with flames. But the truth is, there is 1 big flame that covers a big part of the image. The neural network should never see that image and think "I need to figure out how to break this out into 6 different fires." That is what you need to mark up: a single flame in picture [C].
Same with the smoke. In that last image, you have 6 smoke areas. I think that is wrong. But with the smoke, it gets subtle and may be difficult. There are a few ways to think about it -- let's go back to image [A].
You want to detect smoke. But do you want to detect all the smoke? Or the boundary where the smoke and clear sky meet? I suspect it will be marginally easier to find the edges of the smoke than the smoke itself. For example, in image [C], if you look at the branches in the top-right of the image...is there smoke? Is it a cloudy day? The top-left definitely has smoke. If I were doing this project and these were the only 3 images I had from which to decide how to mark up the dataset, I would mark up the "black" smoke of image [C] and not mark up the air around the branches in the top-right corner.
Something important I had written above but cut-and-pasted here instead: [...] hopefully not get confused by cute puffy white clouds or angry storm clouds in the sky. So I hope you also have some images without fires and smoke to train with. This will be critically important, so it doesn't accidentally train on "trees" and start thinking that some trees represent "fire". Here is another example: a beautiful orange sunset over the mountain trees. If you don't include images without fire, you'll end up training on things you don't expect.
Thank you for answering so carefully! I originally thought that with the overlapping labels of image A, the trained network would not be able to distinguish whether the overlapping region was fire or smoke, so I chose labeling methods B and C. But after reading your suggestion, I will re-annotate all the dataset images according to method A. In addition, I have collected some pictures as negative samples for training, such as orange objects, sunrises, sunsets, etc., and I plan to collect the same number of negative samples as positive samples.
Thanks again for your answers!
In case it is helpful, I wrote this a few months ago to help people understand how to mark up their images: https://www.ccoderun.ca/darkmark/ImageMarkup.html
I'm sorry to disturb you again. After taking your suggestion yesterday, I re-annotated the entire dataset in the style of image A; there are 3711 images in total. After training yolov4-tiny today, the accuracy is not very satisfactory: the mAP is only about 50%. Before training I set the resolution to 608x608, set random to 1, and set the total number of steps to 4500.
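For reference, those settings correspond to cfg lines like these (only the keys I changed are shown; everything else follows the stock yolov4-tiny cfg, and random=1 goes in each [yolo] section):

```
[net]
width=608
height=608
# total training iterations
max_batches = 4500

[yolo]
# multi-scale training
random=1
```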
The loss and mAP curves are shown in the figure below.

This time I did not add negative samples such as sunrises and sunsets, because yesterday I tried training with 2k positive samples + 2k negative samples and the accuracy was less than 40%; the results were very poor.
What should I do to get better training results?
Looking forward to your reply, thank you!
Note you won't get extremely high mAP values for your images. They are not repetitive, nearly-identical images of bolts, lights, or text. Since you are dealing with flames and smoke from forest fires, I'm guessing your images are quite diverse, and especially when it comes to the smoke, it isn't always obvious exactly where it starts and stops.
The values I see in that chart you included actually seem quite nice. You are at 50% mAP and loss is 1.2. I'd be willing to bet the neural network you trained with these results must be pretty good at identifying both fire and smoke in similar forest fire images.
Next, you need to add a bunch of forest images, sunsets, tree-covered mountains, clouds-in-the-sky, etc. The labelling for those images should indicate there is nothing to find. E.g., the .txt files will be completely empty.
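If it helps, one way to create those empty annotation files in bulk (a sketch, assuming the negative images live in a hypothetical negatives/ folder and use the .jpg extension):

```sh
# create an empty annotation .txt beside every negative image
for f in negatives/*.jpg; do
    : > "${f%.jpg}.txt"
done
```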
The other thing I forgot to mention is I would increase max_batches. Forget the "2000 * number_of_classes" rule; that is just a minimum value to get people started. If you have lots of images and you're training to recognize difficult objects, I'd try with at least max_batches=20000. Take a look here to see what increasing max_batches can do for a non-trivial 2-class neural network: https://www.ccoderun.ca/programming/2020-01-04_neural_network_training/
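In the cfg, that would be a change like this (a sketch; the steps line follows the usual guidance of roughly 80% and 90% of max_batches):

```
[net]
max_batches = 20000
# learning-rate drop points, ~80% and ~90% of max_batches
steps=16000,18000
```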
Thank you very much for answering so carefully. I will try your suggestions for tuning right away. If there are good improvements afterwards, I will give you feedback. Thank you!
After a day of training, I got a very good model. My dataset has 7800 images, of which 7200 are positive samples and 600 are negative samples (mainly sunrises and sunsets). I set max_batches=40000. Training is not finished yet; after only 27,000 iterations it has already achieved good results: the AP of the flame is 86%, the AP of the smoke is 71%, and the mAP is 79%. In testing I found that flame recognition works very well, but smoke recognition is not particularly ideal. Here are some test results:
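(Side note for readers: with two classes, mAP is just the mean of the two per-class APs, and the numbers check out: (86% + 71%) / 2 = 78.5%, which rounds to the reported 79%.)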
I have a question for you. If I want to continue training, can I add more images to the dataset? Or can the dataset not be changed during a complete training run?
Looking forward to your reply, thank you!
Those look like promising results. Yes, you can continue to add images if you want, then use the -clear flag to continue training with the same weights file.
For example, say the project files are called "fire.cfg" and "fire_best.weights", and I then mark up a bunch of new images. If I want to restart training without starting from zero, re-using the existing weights file instead, I might use something like this:
darknet detector -map -dont_show train fire.data fire.cfg fire_best.weights -clear 1
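(As I understand it, -clear keeps the learned weights but resets the internal iteration counter, so the learning-rate schedule starts over as though training had just begun.)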
Thanks for the reply! Should the newly added images be combined with the original dataset to continue training, or do only the newly added images need to be trained?
I'm actually not sure. Personally, I've always added the new images to the existing ones, and trained with the full set of old+new images. I don't know what happens if you train with only the new ones.
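Mechanically, combining them just means extending the training list (a sketch, assuming your fire.data points at a train.txt file listing image paths, and the new images are in a hypothetical new_images/ folder):

```sh
# append the new image paths to the existing training list
find new_images/ -name '*.jpg' >> train.txt
```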
OK! I will try it. Thank you very much!
Hi, after adopting your suggestion, I set the number of iterations to 20000, 30000, and 40000, and finally the mAP reached about 85%. However, I found that the model may have overfitted after so many iterations. The test results on images included in the training dataset are very good, but when a new image from outside the dataset is tested, the target may not be detected at all, even if it is very obvious. To avoid overfitting, I stopped training when the avg loss would no longer decrease (around 0.5). At that point the number of iterations was about 9000, and the mAP was about 68%. Is there any way to avoid overfitting and improve mAP?
@B1D1ng Hello, I am also doing research on smoke detection. May I ask why I get output like this during training:

v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 139 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000001, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.000007, iou_loss = 0.000000, total_loss = 0.000007
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 150 Avg (IOU: 0.000000, GIOU: 0.000000), Class: 0.000000, Obj: 0.000000, No Obj: 0.000099, .5R: 0.000000, .75R: 0.000000, count: 1, class_loss = 0.042966, iou_loss = 0.000000, total_loss = 0.042966
v3 (iou loss, Normalizer: (iou: 0.07, cls: 1.00) Region 161 Avg (IOU: 0.648353, GIOU: 0.603125), Class: 0.999898, Obj: 0.822277, No Obj: 0.002713, .5R: 1.000000, .75R: 0.000000, count: 3, class_loss = 0.128908, iou_loss = 0.374276, total_loss = 0.503185
total_bbox = 38096, rewritten_bbox = 0.000000 %

Region 139 and Region 150 are all zeros or very small numbers, and the first computed mAP is also 0. Is it because there is a problem with my annotations? When annotating the smoke, I sometimes used one large box that enclosed both the smoke and the background, and sometimes used several small boxes to box out the smoke. I would also like to ask how I can see the loss curve.
@B1D1ng Could you share your annotated data with me? I have the same detection scenario. Thanks a lot!
I am a beginner and am currently using yolov4-tiny for training; I want to recognize flames and smoke. If I train on 2000 pictures where only the flame is marked, the mAP can reach 75%, but if I train on flame and smoke at the same time, the mAP is less than 50%. The combined flame and smoke dataset has 4000 pictures, which I think is already a lot of images. Maybe there is a problem with the way I label the images? There are two labeling methods below; which one is appropriate?
A: [image] B: [image]
Because the smoke region is relatively large, can it overlap the flame label when labeling, as in A? Or is it better to avoid overlapping labels, as in B?
Looking forward to your reply, thank you!