Car-Pedestrian multi class object detection DetectNet mAP and inference

eweill commented 7 years ago

I have been working on multi-class detection for cars and pedestrians using DetectNet, as mentioned in MCOD. I can successfully train a single-class DetectNet model for cars and for pedestrians (separately); however, I am unable to train the 2-class network correctly. Note that when I train the pedestrian network on KITTI, I achieve great classification, but the mAP value does not reflect that (it hovers around 10, while I can get upwards of 55 for cars).

With that said, when I use the 2-class DetectNet with a dataset created from KITTI using dontcare,car,pedestrian, it only seems to detect cars and no pedestrians at all (i.e. the mAP for class 1 remains 0 for the entire training process). I would like to perform inference with this model; however, with an mAP of 0, no pedestrians are detected at inference time.

Can anyone point me in the right direction when training the 2-class network?

varunvv commented 7 years ago

@eweill

You need to change the .prototxt file to detect multiple classes. Refer to the links below:

https://github.com/NVIDIA/caffe/blob/caffe-0.15/examples/kitti/detectnet_network.prototxt
https://github.com/NVIDIA/caffe/blob/caffe-0.15/examples/kitti/detectnet_network-2classes.prototxt

gheinrich commented 7 years ago

It's notoriously difficult to train multi-class DetectNet. However, in case this helps, I found it easier to first train a single-class network (cars) and then fine-tune on the 2-class network (cars + pedestrians).

eweill commented 7 years ago

@gheinrich Since I have trained both a single-class car network and a single-class pedestrian network, I would like to try fine-tuning on a 2-class network. However, I am unable to use the weights directly due to the dimensionality of the classifier layer.

I saved my 1-class car trained network as a "pre-trained model" in DIGITS. I then selected the 1-class model, and customized it by pasting in the 2-class prototxt. After selecting all my other desired parameters, I started the training and received the following error:

ERROR: Cannot copy param 0 weights from layer 'cvg/classifier'; shape mismatch. Source param shape is 1 1024 1 1 (1024); target param shape is 2 1024 1 1 (2048). To learn this layer's parameters from scratch rather than copying from a saved net, rename the layer.

Is there an easy way to convert either the model file or the weights file to "fine-tune" in the manner that you were suggesting or am I thinking about it entirely incorrectly? Thanks.

gheinrich commented 7 years ago

Why don't you rename cvg/classifier as suggested in the error message?
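For anyone unsure what the rename involves: Caffe copies pretrained weights by layer name, so a classifier layer whose name does not exist in the 1-class snapshot is initialised from scratch while the other layers are still copied. Below is a minimal sketch of that edit as a text substitution in Python. The new name `cvg/classifier-2classes` and the prototxt excerpt are made up for illustration; only the `name:` field is rewritten, and the `top:`/`bottom:` blob names are left alone so the rest of the network still connects.

```python
import re


def rename_layer(prototxt, old, new):
    """Rename a layer by rewriting only its `name:` field.

    `top:` and `bottom:` entries are blob names, not layer names, so they
    are left untouched and downstream layers keep working.
    """
    pattern = re.compile(r'(\bname:\s*")' + re.escape(old) + r'(")')
    return pattern.sub(lambda m: m.group(1) + new + m.group(2), prototxt)


# Hypothetical excerpt; the real DetectNet prototxt is much longer.
excerpt = '''layer {
  name: "cvg/classifier"
  type: "Convolution"
  bottom: "pool5/drop_s1"
  top: "cvg/classifier"
  convolution_param { num_output: 2 kernel_size: 1 }
}'''

# The new layer name is an arbitrary choice for this sketch.
renamed = rename_layer(excerpt, "cvg/classifier", "cvg/classifier-2classes")
print(renamed)
```

The same change can of course be made by hand in the DIGITS custom network box; the point is only that the layer name, not the blob name, is what has to differ from the snapshot.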

eweill commented 7 years ago

@gheinrich Thanks. I realized that as soon as I posted the error message, and got that fixed. I am still unable to train the 2-class network. I simply use a single-class network that is trained on cars (with an mAP of 62, so it performs quite well) and try to "fine-tune" it on pedestrians as the second class.

I replaced the custom network with a 2-class DetectNet, performed mean subtraction, and set the other hyperparameters (learning rate, learning rate policy, batch size/accumulation, etc.). However I set them, and however many iterations I train for, class 2 remains at 0 mAP. Am I missing an important piece to make the network train on the second class as well? Below are some of the current results I am getting.

ShervinAr commented 7 years ago

@eweill Hi there, did you manage to resolve the problem you had mentioned? I am facing the same problem...

eweill commented 7 years ago

@ShervinAr I haven't resolved the problem yet. I have tried many different approaches, and no matter how I create the LMDB dataset or train the model (different hyperparameters), I can't get the second class to be detected (mAP stays at 0 for 60 or 120 epochs).

ShervinAr commented 7 years ago

@eweill My guess is that the problem might have its roots in:
1- different learning rates required for learning cars vs. pedestrians
2- insufficient training/val images for pedestrian detection: not all images in the training set include pedestrians, and I am not sure how the network would react to this...
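One way to check the second guess is to count how many label files mention each class at all. The following is a rough sketch, assuming KITTI-format label files in which the first whitespace-separated field of each line is the object type; the function name and the sample labels are invented for illustration.

```python
from collections import Counter


def count_images_per_class(label_files):
    """Count, per class, how many label files mention it at least once.

    `label_files` maps a file name to the text of a KITTI-format label
    file (object type is the first field on each line).
    """
    counts = Counter()
    for text in label_files.values():
        classes = {
            line.split()[0].lower()
            for line in text.splitlines()
            if line.strip()
        }
        counts.update(classes)
    return counts


# Made-up labels for three images, only to show the shape of the result.
sample = {
    "000000.txt": "Car 0.00 0 -1.57 100 150 200 220 1.5 1.6 3.9 1.0 1.7 20.0 -1.5\n",
    "000001.txt": (
        "Car 0.00 0 -1.57 300 160 380 230 1.5 1.6 3.9 4.0 1.7 25.0 -1.5\n"
        "Pedestrian 0.00 0 0.20 500 150 530 240 1.8 0.6 0.8 6.0 1.6 15.0 0.3\n"
    ),
    "000002.txt": (
        "Car 0.00 1 1.50 50 170 120 230 1.4 1.6 3.7 -5.0 1.7 18.0 1.4\n"
        "DontCare -1 -1 -10 600 160 640 190 -1 -1 -1 -1000 -1000 -1000 -10\n"
    ),
}

counts = count_images_per_class(sample)
print(counts)
```

If the pedestrian count turns out to be a small fraction of the car count, that would at least be consistent with the imbalance suspected above, though it does not prove it is the cause.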

hadign20 commented 7 years ago

@eweill Hello. I have a similar problem and a basic question. For multi-class detection, do I have to change the dataset first? I mean, if the dataset label files have 10 class types (car, pedestrian, van, pickup, ...), do I have to delete the lines for van, pickup, ..., or is setting dontcare,car in the dataset creation step in DIGITS enough?

deagarwa commented 7 years ago

Hello All,

I used the "detectnet_network-2classes.prototxt" file to train my network on KITTI data for cars and pedestrians. But I am getting the following error while training:

error code -11
Train net output #0: loss_bbox = 1.12773 (* 2 = 2.25547 loss)
Train net output #1: loss_coverage = 4.84529 (* 1 = 4.84529 loss)
Iteration 1264, lr = 0.0001
Snapshotting to binary proto file snapshot_iter_1276.caffemodel
Snapshotting solver state to binary proto file snapshot_iter_1276.solverstate
Iteration 1276, Testing net (#0)
Ignoring source layer train_data
Ignoring source layer train_label
Ignoring source layer train_transform

Any suggestions?

Thanks,
Deepika

lbin commented 7 years ago

Training a multi-class network based on DetectNet is not very easy, but recently I got some better results. I will share some tips on how to train this type of network.

RomanSteinberg commented 7 years ago

@lbin have you published your tips?

shinaushin commented 7 years ago

^I am also wondering the same. I am running into similar problems, with the other class having 0 mAP throughout the whole training procedure.

antoniodourado commented 7 years ago

@varunvv @lbin @gheinrich

I noticed that in the 2-classes prototxt there are these lines:

object_class: { src: 1 dst: 0} # cars -> 0
object_class: { src: 8 dst: 1} # pedestrians -> 1

1- Could you explain the "src" field? The "dst" field seems to be the class index, as the comment implies.
2- Since we identify classes by index (0 and 1, for example), how does it match the label names in KITTI format (cars as 0, pedestrian as 1)?

Ty so much.

gheinrich commented 7 years ago

Hello, the src field needs to be set to the index of your class in the class mappings (see https://github.com/NVIDIA/DIGITS/blob/digits-5.0/digits/extensions/data/objectDetection/README.md#custom-class-mappings).
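To make the `src`/`dst` relationship concrete, here is a small Python sketch. It assumes, as the linked README describes, that a class ID is the position of the class name in the comma-separated custom class list entered when the dataset is created; the helper name `object_class_lines` is hypothetical and not part of DIGITS or Caffe.

```python
def object_class_lines(custom_classes, wanted):
    """Build `object_class` entries for the classes in `wanted`.

    `custom_classes` is the comma-separated class list used at dataset
    creation (assumption: a class ID is its position in this list).
    `dst` is simply the order in which the network should output classes.
    """
    names = [name.strip().lower() for name in custom_classes.split(",")]
    lines = []
    for dst, name in enumerate(wanted):
        src = names.index(name)  # raises ValueError if the class is missing
        lines.append(
            "object_class: { src: %d dst: %d }  # %s -> %d" % (src, dst, name, dst)
        )
    return lines


lines = object_class_lines("dontcare,car,pedestrian", ["car", "pedestrian"])
for line in lines:
    print(line)
```

Under that assumption, a dataset created with `dontcare,car,pedestrian` would need `src: 2` for pedestrians rather than the `src: 8` found in the example 2-class prototxt, so it may be worth comparing the prototxt against your own dataset's class mapping.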

linuzlover commented 7 years ago

Hello,

has anyone successfully trained a 2-class detectnet? Can anyone share some tips regarding this topic?

Thank you. AP

sulth commented 7 years ago

I am getting only a single class detected. Could someone please help me understand why the 2-class detection is not working, even when the above network is used with the KITTI dataset?

aprentis commented 7 years ago

@lbin Could you share your results? I have the same problem, with only one class learning correctly.

sulth commented 7 years ago

@lbin Please share the tips on 2-class training. I tried different approaches but it failed to learn pedestrians. It is learning cars only.

sulth commented 7 years ago

@lbin Could someone please help me make 2-class detection work? I have tried many approaches without success.

elaith9 commented 6 years ago

Anyone?

Maxfashko commented 6 years ago

@elaith9 I managed to train two classes on the MS COCO dataset. This requires a large amount of time and pre-trained weights. After 300 epochs, one of the classes did not rise above 1 mAP, and the second class reached 5 mAP. It looks like tiny-yolo will work much more reliably. SSD for TensorRT has not been implemented yet, so there is a chance of succeeding with networks such as MobileNet or SqueezeNet.

adithya-p commented 6 years ago

@eweill How did you fix this ERROR: Cannot copy param 0 weights from layer 'cvg/classifier'; shape mismatch. Source param shape is 1 1024 1 1 (1024); target param shape is 2 1024 1 1 (2048)?

elaith9 commented 6 years ago

@adithya-p @sulth @linuzlover @shinaushin OK, I see a lot of you are struggling with the same problem as I did. I've decided to write two blog posts about it and explain in detail how to do custom multi-class object detection with DIGITS. There you go:

https://labs.coria.com/blog/computer-vision/PreparingDataForCustomObjectDetectionUsingNvidiaDigits?sc_camp=33AFA8630062426190B5760C8FDF17CF
https://labs.coria.com/en/blog/computer-vision/TrainingACustomMulticlassObjectDetectionModel?sc_camp=33AFA8630062426190B5760C8FDF17CF

If you have any questions, feel free to ask.

dqthebt24 commented 6 years ago

@elaith9 Thank you so much, but the links are dead. Can someone share with us how to train a multi-class DetectNet in DIGITS? (Maybe with the KITTI dataset or another dataset.)

elaith9 commented 6 years ago

@dqthebt24 links should be working now. Sorry.

dqthebt24 commented 6 years ago

Thank you so much @elaith9

Adithyak1998 commented 6 years ago

Can DetectNet be used to train models for detecting more than 2 classes? If yes, are the changes to be made similar to those in the links you posted, @elaith9?

adithya-p commented 6 years ago

@Adithyak1998 Yes, DetectNet can be used to train models for detecting more than 2 classes. The changes are similar to those in the links posted above. Use diffchecker to get an insight into what's happening.

mindmad commented 5 years ago

@elaith9 If you don't mind, can you update your links again? They don't work for me.

MarcoGonnelli74 commented 4 years ago

I wrote a small article on how to create a dataset and train a two-class DetectNet on DIGITS. You can read my post here:

https://www.deeplearning-blog.com/2020/01/31/how-to-train-a-two-class-detectnet-neural-network-on-digits/

Hope it helps.
Cheers,
Marco

NVIDIA / DIGITS

Car-Pedestrian multi class object detection DetectNet mAP and inference #1359