NVIDIA / DIGITS

Deep Learning GPU Training System
https://developer.nvidia.com/digits
BSD 3-Clause "New" or "Revised" License

DetectNet, may it help for self-training? (now yes, hopefully) #1424

Open ontheway16 opened 7 years ago

ontheway16 commented 7 years ago

Hello. The custom dataset I am training on DetectNet has reached a mAP of 65, and I cannot make it score any better with the current number of images (about 1800, with 4000 labels). While the mAP score is as low as 65, the actual recognition rate seems much higher than this: in my case, nearly 90 percent of objects are always recognized correctly in inference. My understanding is that more photos are needed for even higher mAP scores, so I decided to write a macro for manual labeling and, if needed, for reviewing the annotations. The macro is now ready and soon I will start labeling new images.

Of course, it's quite time-consuming to label additional thousands of images. While thinking about the inference outputs, a question arose.

Why doesn't DetectNet produce its own "candidate" training set?

Once a high real-world recognition rate has been achieved through manual labeling efforts, the system could then start (kind of) feeding itself..

On the inference page, along with the BBox and Rawdata outputs, a button could let us record a DetectNet-formatted label text file for each tested image. All we would have to do is review the "candidate" labels and manually discard (dontcare?) irrelevant entries in a labeling tool, which would be a lot faster than doing all of it by hand, especially in situations with tens or hundreds of detections per image.
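
For illustration, a minimal sketch of what such an export could write (DetectNet trains on KITTI-style label lines, so only the class name and the 2D bbox matter; the helper below, the detection tuple format, and the class name are hypothetical):

```python
# Minimal sketch: dump detections as KITTI-style label lines, the format
# DetectNet trains on. Truncation/occlusion/alpha and the 3D fields are
# simply zeroed, since DetectNet ignores them.
# "detections" is a hypothetical list of (left, top, right, bottom) tuples.
def write_kitti_labels(label_path, detections, class_name="car"):
    with open(label_path, "w") as f:
        for left, top, right, bottom in detections:
            f.write("%s 0.0 0 0.0 %.2f %.2f %.2f %.2f "
                    "0.0 0.0 0.0 0.0 0.0 0.0 0.0\n"
                    % (class_name, left, top, right, bottom))

# e.g. write_kitti_labels("img0001.txt", [(120.0, 45.0, 180.0, 92.0)])
```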

If I'm not missing something, it seems not complicated to add and could help users a lot. I hope we can see such an enhancement in future releases of DIGITS.

Regards,

Alper

gheinrich commented 7 years ago

Using a network to generate its own ground truth doesn't sound like a good idea... data augmentation and regularization are more common ways of addressing the issue of "too little data".

That being said, the notion of mAP in DetectNet is rather confusing, as its calculation does not follow the typical definition; see https://github.com/NVIDIA/caffe/issues/234. You might want to pay more attention to precision and recall instead.
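
For reference, the standard definitions are precision = TP / (TP + FP) and recall = TP / (TP + FN). A rough sketch of computing both with greedy IoU matching, purely illustrative and not how DetectNet's internal metric works:

```python
# Sketch: precision/recall for one image via greedy IoU matching.
# A detection is a true positive if it overlaps an unmatched ground-truth
# box with IoU >= thresh. Boxes are (left, top, right, bottom) tuples.
def iou(a, b):
    ix = max(0.0, min(a[2], b[2]) - max(a[0], b[0]))
    iy = max(0.0, min(a[3], b[3]) - max(a[1], b[1]))
    inter = ix * iy
    union = ((a[2] - a[0]) * (a[3] - a[1])
             + (b[2] - b[0]) * (b[3] - b[1]) - inter)
    return inter / union if union > 0 else 0.0

def precision_recall(detections, ground_truth, thresh=0.5):
    matched = set()
    tp = 0
    for d in detections:
        # best still-unmatched ground-truth box for this detection
        candidates = [i for i in range(len(ground_truth)) if i not in matched]
        if candidates:
            best = max(candidates, key=lambda i: iou(d, ground_truth[i]))
            if iou(d, ground_truth[best]) >= thresh:
                matched.add(best)
                tp += 1
    fp = len(detections) - tp
    fn = len(ground_truth) - tp
    precision = tp / float(tp + fp) if detections else 0.0
    recall = tp / float(tp + fn) if ground_truth else 0.0
    return precision, recall
```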

ontheway16 commented 7 years ago

@gheinrich thank you for commenting, but I still have questions in mind. In my case we are not talking about "too little data": there are thousands of labels and images, and real-world accuracy is about 90% already. What I am hoping is to make it perform even better with additional thousands. If your point is "a trained model will only detect objects highly similar to what it was trained with, and therefore will not contribute to performance", then that's understandable. But if it is capable of detecting objects slightly dissimilar to the training-set ones, and these in turn contribute to the next training, then this could be considered a gain. I was hoping for the second.

RSly commented 7 years ago

As a first step, you can simply annotate your new data with the bbox output from DetectNet and use it to retrain DetectNet! It doesn't need to be all automatic ;)

And let us know if it helps!

ontheway16 commented 7 years ago

@RSly that's my goal too. I think most people assume it won't be a big problem, since there are usually only a few objects to label per photo, but that is not the case for me: I need to label about 40-50 tiny objects per 5000x1000 image. That's why I am looking for such a semi-automatic label creation. Even if it only labels objects very similar to the training-set ones, it still removes a lot of workload, since otherwise I would have to label those by hand too.

gheinrich commented 7 years ago

You can use DetectNet to generate candidates, and that might save a lot of time during the annotation process, but I think you'll need to manually curate the labels to make sure there aren't (too many) false positives/negatives. Or you can pay Amazon Mechanical Turk workers to do the work for you :-)

ontheway16 commented 7 years ago

@gheinrich Hey Greg, thanks for the Mechanical Turk idea :) But I think I'll stick with the Biological Turk (me), since it's free of charge grin You are right about false positives and negatives, but with my dataset at the current training level (mAP about 65.xx), I have encountered NO incorrect detections in several hundred inference tests (single class). Currently the only remaining problem is non-detected (missed) objects; some are touching each other (clustered) and some are singular. I think I have to find a way to parse the bounding box output on the inference screen. That said, I still think it would be a useful enhancement if implemented in DIGITS. Reviewing the accuracy of self-generated labels is not a problem; it can be done very quickly with an appropriate software utility.

ontheway16 commented 7 years ago

Life is too short to label manually...

listlm.sh

Enjoy!

// This bash script was tested under Ubuntu 14.04 and requires json2csv
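
For anyone who prefers Python to the bash/json2csv pipeline, a rough equivalent sketch; the "bbox-list" key and the [left, top, right, bottom, confidence] row layout are assumptions about the inference JSON and may differ between DIGITS versions:

```python
# Sketch: read the JSON that a DIGITS inference request returns and emit
# KITTI-style label lines. Assumes detections live under a "bbox-list" key
# as [left, top, right, bottom, confidence] rows, with all-zero rows used
# as padding; adjust the key for your DIGITS version.
import json
import sys

def json_to_labels(json_path, label_path, class_name="car", min_conf=0.0):
    with open(json_path) as f:
        data = json.load(f)
    boxes = data["outputs"]["bbox-list"]  # assumed key, may vary
    with open(label_path, "w") as out:
        for left, top, right, bottom, conf in boxes:
            if conf <= min_conf or right <= left or bottom <= top:
                continue  # skip padding rows and degenerate boxes
            out.write("%s 0.0 0 0.0 %.2f %.2f %.2f %.2f "
                      "0.0 0.0 0.0 0.0 0.0 0.0 0.0\n"
                      % (class_name, left, top, right, bottom))

if __name__ == "__main__":
    json_to_labels(sys.argv[1], sys.argv[2])
```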

RSly commented 7 years ago

Hi @ontheway16, so how did it go? Did using DetectNet results as new training sets work to improve the final results?

ontheway16 commented 7 years ago

Hi @RSly, actually I am waiting to scan a new set of images for additional data creation, in a few days. But after some initial tests on existing, manually labeled images, I realized it is able to detect and label objects that were skipped/not labelled during manual annotation, which seems like good news to me.

ontheway16 commented 7 years ago

OK, I have run into a different problem during tests. Here it is: since I use scans for training and inference, I decided to scan at a higher resolution than the training images, so I scanned a lot of material at 2000 DPI instead of 1200, hoping the extra resolution would help since my objects are small.

When I run inference, the number of detections is good, but the bounding box rectangles are smaller than the actual size of the objects, resembling the size of 1200 DPI bounding boxes. So it detects the objects successfully, but I cannot feed the output to the script for label re-creation, because the boxes are smaller than the objects. With scans at 1200 DPI there is no problem at all; bounding box dimensions are pixel-perfect. I really want to scan documents at 2000 DPI, but now I don't know what happens if I mix 1200 DPI and 2000 DPI scans during training.
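
One possible workaround, assuming the network sizes boxes at the 1200 DPI scale it was trained on: resample the 2000 DPI scans down by 1200/2000 before inference, then map the returned boxes back up. A sketch using Pillow (paths and helper names are hypothetical):

```python
# Sketch: run inference at the training scale, then rescale boxes back.
# Assumes the model was trained on 1200 DPI scans and new scans are 2000 DPI.
from PIL import Image

TRAIN_DPI, SCAN_DPI = 1200.0, 2000.0
SCALE = TRAIN_DPI / SCAN_DPI  # 0.6

def downscale_for_inference(src_path, dst_path):
    img = Image.open(src_path)
    w, h = img.size
    img.resize((int(w * SCALE), int(h * SCALE)), Image.LANCZOS).save(dst_path)

def box_to_scan_coords(left, top, right, bottom):
    # map a box detected on the downscaled image back to 2000 DPI pixels
    f = 1.0 / SCALE
    return left * f, top * f, right * f, bottom * f
```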

ontheway16 commented 7 years ago

I am currently using a GUI version of the listlm.sh bash script to build labels on new images, and in my case I am happy to report that it is very successful. Of course I have to check the labels, but false positives and negatives are certainly at quite tolerable numbers (around 1-2%). The issue I mentioned above (smaller bboxes around objects) continues and I am correcting them manually, but it is still easier than fully manual labeling.
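
That manual bbox correction could also be scripted. A sketch, under the stated assumption that the undersized boxes are off by roughly the 2000/1200 DPI ratio, grown around each box centre:

```python
# Sketch: grow an undersized predicted box around its centre, assuming the
# network drew it at 1200 DPI scale on a 2000 DPI scan.
def grow_box(left, top, right, bottom, ratio=2000.0 / 1200.0):
    cx, cy = (left + right) / 2.0, (top + bottom) / 2.0
    hw, hh = (right - left) / 2.0 * ratio, (bottom - top) / 2.0 * ratio
    return cx - hw, cy - hh, cx + hw, cy + hh
```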