dusty-nv / jetson-inference

Hello AI World guide to deploying deep-learning inference networks and deep vision primitives with TensorRT and NVIDIA Jetson.
https://developer.nvidia.com/embedded/twodaystoademo
MIT License

Issue with custom object detection #738

Closed · silent-code closed this issue 3 years ago

silent-code commented 4 years ago

I have gone through the steps of collecting and converting my custom dataset to PASCAL VOC format and re-training the mobilenet-v1-ssd-mp-0_675.pth base network. Now it accurately detects Class 1 objects with the Class 1 label, but when there is no Class 1 object in the image, the detector draws a box around the entire image, as if it were detecting the background but labeling it as Class 1.

Here is my labels.txt list:

BACKGROUND
Class 1

I thought maybe something was getting messed up in the re-training pipeline, so I tested simply converting the base network, mobilenet-v1-ssd-mp-0_675.pth, to ONNX without doing any re-training. It appears to work correctly, classifying multiple classes with the expected accuracy, so the ONNX conversion step of the pipeline appears to work.

I currently have 100 custom images containing Class 1. Do I need more so that it doesn't show detections when Class 1 is not in the image?

Thanks for your help.

dusty-nv commented 4 years ago

Hi @silent-code , when it draws a box around the entire image, does it label it as BACKGROUND, or as Class 1?

In your images of Class 1, does the object occupy nearly the full frame? Or is there some background space in those images too?

silent-code commented 4 years ago

when it draws a box around the entire image, does it label it as BACKGROUND, or as Class 1? As Class 1 only.

does the object occupy nearly the full frame? No, typically less than half with plenty of background.

dusty-nv commented 4 years ago

Hmm. When you are running the inference with detectnet, are you using the labels file that gets saved to your models folder by pytorch?

Is your models folder different from your folder with the dataset?

The labels.txt in the dataset folder should not have BACKGROUND. The labels.txt in the models folder should have BACKGROUND.
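For illustration (using the single custom class from this thread and the models/my-models-voc path from the commands further down), the two files would look roughly like this:

labels.txt in the dataset folder, passed to generate_vocdata.py / train_ssd.py:

```
Class 1
```

labels.txt saved to the models folder during training, used at inference time with detectnet:

```
BACKGROUND
Class 1
```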

silent-code commented 4 years ago

"The labels.txt in the dataset folder should not have BACKGROUND."

The command I used to create the voc dataset is

python vision/datasets/generate_vocdata.py ./labels.txt

You are saying that the labels.txt file referred to in the generate_vocdata.py command above is the one that should not contain the BACKGROUND label, is that correct?

dusty-nv commented 4 years ago

I haven't personally used generate_vocdata.py (I think that was used by the upstream author for pre-training), so I'm not entirely sure if background should be in that labels.txt or not - but I think not.

I take it that you are trying to extract all of the "person" class from the VOC dataset or similar?

silent-code commented 3 years ago

Hmm, when using a custom dataset, I assumed we had to use generate_vocdata.py to convert the data into a usable format after labeling with the labelImg tool?

dusty-nv commented 3 years ago

I think you can save from labelImg directly in PASCAL VOC format.

That generate_vocdata.py appears to be for culling the VOC dataset to a subset of VOC classes.

silent-code commented 3 years ago

By itself, the labelImg tool does not appear to produce the data in a format train_ssd.py can use.

The script train_ssd.py calls voc_dataset.py, which looks for ImageSets/Main/trainval.txt, etc. (around line 21) - these split files are what generate_vocdata.py creates.

So I thought it was necessary to use generate_vocdata.py. Am I missing something?
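(For reference, the on-disk layout that voc_dataset.py expects, and that generate_vocdata.py fills in from the labelImg output, looks roughly like this. The directory names come from this thread; the exact set of split files is an assumption based on the standard VOC layout:)

```
<dataset root>/
├── Annotations/          # XML annotations written by labelImg
├── JPEGImages/           # the source images
├── ImageSets/
│   └── Main/
│       ├── trainval.txt  # image-ID split lists created by generate_vocdata.py
│       ├── train.txt
│       ├── val.txt
│       └── test.txt
└── labels.txt            # custom class names, without BACKGROUND
```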

silent-code commented 3 years ago

SOLUTION: the labelImg tool does not create the full file structure. You need to do the following:

1. Create a labels.txt file in the jetson-inference/python/training/detection/ssd directory (this labels.txt file should NOT have the BACKGROUND class listed, just the classes you want to train on, e.g., dog, cat, rhino).

2. Place the following sub-directories, populated by the labelImg tool outputs, in the same directory: Annotations, JPEGImages.

3. From the ssd directory, run: python vision/datasets/generate_vocdata.py ./labels.txt

4. Now you are ready to train with the mb1 pretrained network: python3 train_ssd.py --dataset-type=voc --model-dir=models/my-models-voc --data=./ --pretrained-ssd='models/mobilenet-v1-ssd-mp-0_675.pth' --batch-size=4 --num-epochs=50

5. Then convert your trained model to ONNX (you can delete the labels.txt file in the ssd directory, since the training step above creates a labels.txt for you in the model directory, models/my-models-voc): python3 onnx_export.py --input="./models/my-models-voc/name-of-model-you-want-to-convert.pth" --model-dir=models/my-models-voc
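Once exported, the model can be tested with detectnet (mentioned earlier in this thread). Here is a rough sketch of the command, assuming onnx_export.py wrote ssd-mobilenet.onnx into the model directory; the model/labels file names and the input/output URIs are placeholders to adjust for your setup:

```
detectnet --model=models/my-models-voc/ssd-mobilenet.onnx \
          --labels=models/my-models-voc/labels.txt \
          --input-blob=input_0 --output-cvg=scores --output-bbox=boxes \
          input.mp4 output.mp4
```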

Parham-khj commented 3 years ago

Hi @silent-code , when it draws a box around the entire image, does it label it as BACKGROUND, or as Class 1?

In your images of Class 1, does the object occupy nearly the full frame? Or is there some background space in those images too?

Hey @dusty-nv ,

I have just one object (Class 1), and obviously I've got two classes in labels.txt (BACKGROUND and Class 1) for using the ONNX model with TensorRT. My input file is a video, and the model detects and draws a bounding box occupying nearly the full frame, labeled as Class 1. When the desired object comes into the frame it is detected successfully, but the model also detects the entire frame as Class 1, so I end up with two detections: the correct object (Class 1) and the entire frame as Class 1. Do you reckon this error comes from re-training with the custom dataset, or from converting to the ONNX format? Thanks