Open mathieuorhan opened 5 years ago
@mathieuorhan Hi,
My idea is to modify the loss to take into account only pedestrian predictions, and use a trained network to improve its performance on this class. I don't know if it's a good idea, nor how to do it precisely (I expect some unlearning). What do you think of it?
So do you want to continue training on only the Caltech Pedestrian dataset and comment out this line? https://github.com/AlexeyAB/darknet/blob/57e878b4f9512cf9995ff6b5cd6e0d7dc1da9eaf/src/yolo_layer.c#L218
I think it is a bad idea, since:
if you comment out this line - it will be trained to detect persons, but it won't be trained to avoid detecting background
if you don't comment out this line - it will be trained to detect persons, but it won't be trained to detect (car, truck, pedestrian, bicycle, motor)
The best practice is to use the merged datasets BDD100K + Cityscape + Caltech Pedestrian, in which all required objects (car, truck, pedestrian, bicycle, motor, person) must be labeled.
First, thank you for your help.
Well, Caltech contains only pedestrian labels, but there are unlabeled cars/trucks/... inside the dataset, therefore I can't merge it with BDD100K and Cityscape. Caltech is huge and I can get a much higher mAP for pedestrians on it than on the other datasets.
After thinking a bit about it, the problem lies in the objectness score. It is the key to improving detection, and I can't see a way to compute its loss without unlearning the other classes. I don't see how to train with only a partially labeled dataset; maybe this is not possible within the YOLO framework at all.
Maybe I could use a trained YOLO to detect cars and the other non-pedestrian classes in Caltech, and use those detections as ground truth for training.
After thinking a bit about it, the problem lies in the objectness score. It is the key to improving detection, and I can't see a way to compute its loss without unlearning the other classes. I don't see how to train with only a partially labeled dataset; maybe this is not possible within the YOLO framework at all.
I think this is a characteristic of all neural networks, not just yolo.
Maybe I could use a trained YOLO to detect cars and the other non-pedestrian classes in Caltech, and use those detections as ground truth for training.
Yes. There is a Pseudo Labeling feature especially for this: https://github.com/AlexeyAB/darknet/blob/ca43bbdaaede5c9cbf82a8a0aa5e2d0a4bdcabc0/src/detector.c#L1154-L1178
To merge the existing labels with the labels newly detected by Yolo, you should change this line: https://github.com/AlexeyAB/darknet/blob/ca43bbdaaede5c9cbf82a8a0aa5e2d0a4bdcabc0/src/detector.c#L1160
to this line:
FILE* fw = fopen(labelpath, "ab");
To use Pseudo Labeling, just use a command like this:
./darknet detector test data/obj.data yolo_obj.cfg yolo_obj.weights -dont_show < caltech/train.txt -save_labels
The detections of (car, truck, pedestrian, bicycle, motor) will be appended to the end of each existing txt label file (which holds the Pedestrian labels in yolo-format), in the correct format for training.
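Because the detector also predicts pedestrians, the appended detections may repeat boxes that are already annotated by hand. A minimal Python sketch of one way to filter such repeats is shown here; it is not part of darknet. It assumes the manual and pseudo labels are available as separate texts in yolo-format (`class x_center y_center width height`, normalized to 0..1). The function names and the 0.5 IoU threshold are hypothetical choices for illustration.

```python
def iou(a, b):
    """IoU of two boxes given as (x_center, y_center, width, height)."""
    ax1, ay1 = a[0] - a[2] / 2, a[1] - a[3] / 2
    ax2, ay2 = a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1 = b[0] - b[2] / 2, b[1] - b[3] / 2
    bx2, by2 = b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def parse_labels(text):
    """Parse yolo-format lines into (class_id, box) tuples; skip malformed lines."""
    rows = []
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 5:
            rows.append((int(parts[0]), tuple(float(p) for p in parts[1:])))
    return rows


def merge_labels(manual, pseudo, iou_threshold=0.5):
    """Keep every manual box; add a pseudo box only if no manual box
    of the same class overlaps it with IoU >= iou_threshold."""
    merged = list(manual)
    for cls, box in pseudo:
        duplicate = any(
            cls == m_cls and iou(box, m_box) >= iou_threshold
            for m_cls, m_box in manual
        )
        if not duplicate:
            merged.append((cls, box))
    return merged
```

For example, `merge_labels(parse_labels(manual_text), parse_labels(pseudo_text))` returns the manual boxes followed by the pseudo boxes that are not near-duplicates of them. The class IDs must already use the same numbering in both inputs.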
Then just train by using this caltech/train.txt as usual, or merge the train.txt files of BDD100K + Cityscape + Caltech Pedestrian.
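Merging the train.txt files can be sketched in Python as below. This assumes each list file holds one image path per line; the function name is hypothetical, and the sketch only concatenates the lists and drops repeated paths. It does not check that class IDs agree across the datasets.

```python
def merge_train_lists(list_paths, out_path):
    """Concatenate image-list files into out_path, keeping the first
    occurrence of each image path and preserving order."""
    seen = set()
    images = []
    for path in list_paths:
        with open(path) as f:
            for line in f:
                item = line.strip()
                if item and item not in seen:
                    seen.add(item)
                    images.append(item)
    with open(out_path, "w") as f:
        for item in images:
            f.write(item + "\n")
    return images
```

A call such as `merge_train_lists(["bdd100k/train.txt", "cityscape/train.txt", "caltech/train.txt"], "merged/train.txt")` would produce the combined list; only `caltech/train.txt` comes from the command above, the other paths are placeholders.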
I think this is a characteristic of all neural networks, not just yolo
Mostly, yes, but I think I've seen ways to do that kind of thing in semantic segmentation, with partially annotated datasets. I can't find the papers though.
Yes. Especially for this there is Pseudo Labeling feature
That's a very good idea. I'll try that. Thank you. It should be better to generate the labels with YOLOv3 instead of Tiny.
Hi @AlexeyAB,
I'm working on an object detector for autonomous driving (car, truck, pedestrian, bicycle, motor). So far I have trained Tiny YOLO v3 on BDD100K and Cityscape. I want to improve performance for some classes, such as pedestrians. I would like to leverage the Caltech Pedestrian dataset, but it is only annotated with pedestrians. My idea is to modify the loss to take into account only pedestrian predictions, and use a trained network to improve its performance on this class. I don't know if it's a good idea, nor how to do it precisely (I expect some unlearning). What do you think of it?