This is Team 900's vision code for Recycle Rush.
The goal was to write code that would detect bins and report their locations. This was done using cascade classifiers. The final binary is cross-platform and is accelerated by an NVIDIA GPU whenever one is available.
To start you need:
Use imageclipper to extract samples of the object from the positive videos. Imageclipper's README is very good; note that we made a slight modification to the program: it highlights your selection in green when the aspect ratio is best for the classifier. Repeat this process until you have around 100 positives. Make sure to get pictures from different angles and in different conditions. Also make sure that there isn't much else in each sample besides the object.
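The green highlight is essentially an aspect-ratio check against the classifier's training window. A minimal sketch of that idea in Python; the target ratio and tolerance here are stand-ins, not imageclipper's actual values:

```python
def aspect_ratio_ok(width, height, target_ratio=1.0, tolerance=0.05):
    """Return True when a clipped region's aspect ratio is close enough to
    the classifier's training-window ratio. target_ratio is a placeholder;
    in practice it should match the -w/-h window used for training."""
    if height == 0:
        return False
    return abs(width / height - target_ratio) <= tolerance * target_ratio
```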
Run the framegrabber on a negative video. By default this saves 1% of all frames, to use as negative images for the initial classifier stage.
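Conceptually, keeping 1% of frames is just even sampling over the frame indices. A sketch of that selection logic (the function name and defaults are illustrative, not the framegrabber's actual interface):

```python
def frames_to_keep(total_frames, keep_fraction=0.01):
    """Indices of frames to save as negatives: one frame out of every
    1/keep_fraction, starting at frame 0."""
    step = max(1, round(1.0 / keep_fraction))
    return list(range(0, total_frames, step))
```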
Put the negatives into the negative_images directory and the positives into the positive_images directory. Both are within cascade_training.
Run prep.sh. This takes each positive image, creates a set of randomly rotated versions to use for training, and outputs a .vec file. There are some values you can tweak at this point; they are documented in prep.sh itself.
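prep.sh most likely wraps opencv_createsamples, which distorts each positive by a random rotation about three axes before packing the results into the .vec file. A sketch of that jitter, with limits modeled on opencv_createsamples' -maxxangle/-maxyangle/-maxzangle options (the defaults below are that tool's documented defaults, not necessarily prep.sh's settings):

```python
import random

def random_rotation(max_x=1.1, max_y=1.1, max_z=0.5, rng=random):
    """Pick one random rotation angle (radians) about each axis,
    uniform in [-max, +max], mimicking createsamples-style distortion."""
    return (rng.uniform(-max_x, max_x),
            rng.uniform(-max_y, max_y),
            rng.uniform(-max_z, max_z))
```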
READ the run_training.pl documentation. It covers important parameters to tweak before running the script. Here's a stripped-down version:
Run run_training.pl. This will open a command window and show you information about how the classifier is doing. Example output:
===== TRAINING 0-stage =====
<BEGIN
POS count : consumed 9000 : 9000
NEG count : acceptanceRatio 10000 : 1
Precalculation time: 37
+----+---------+---------+
| N | HR | FA |
+----+---------+---------+
| 1| 1| 1|
+----+---------+---------+
...stages
+----+---------+---------+
| 8| 0.999333| 0.8318|
+----+---------+---------+
... more stages
+----+---------+---------+
| 24| 0.999111| 0.4605|
+----+---------+---------+
Each stage is trained to accept 99.9% of real targets and reject 50% of the false alarms that reach it. With each additional stage, almost all positive images will still be successfully detected while another 50% of the remaining false alarms (false positives) are filtered out. With a sufficient number of stages (30-40?), the vast majority of positives will make it through the last stage while almost all false alarms will be filtered out.
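The per-stage rates compound multiplicatively, since a detection window must pass every stage. A quick sanity check of that arithmetic:

```python
def cascade_rates(stages, hit_rate=0.999, false_alarm_rate=0.5):
    """Overall detection and false-alarm rates of an N-stage cascade:
    a window must pass every stage, so the per-stage rates multiply."""
    return hit_rate ** stages, false_alarm_rate ** stages

# After 25 stages: roughly 97.5% of true targets still pass,
# while only about 3 in 100 million false alarms survive.
```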
After about 25 stages, stop training. The training code might also fail or run very slowly if it runs out of usable negative images.
Run create_cascade.sh. Then put the resulting classifier into the generate_negatives folder and generate a fresh set of negative images from assorted negative videos. This produces a set of images known as hard negatives: ones which are detected by the current classifier but shouldn't be.
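Hard-negative mining boils down to keeping exactly the frames the classifier gets wrong. A schematic sketch; the real code runs an OpenCV cascade over video frames, and `detects` here is a stand-in for that detection call:

```python
def mine_hard_negatives(frames, detects):
    """From frames known to contain no target, keep the ones the current
    classifier fires on anyway -- those are the hard negatives."""
    return [frame for frame in frames if detects(frame)]
```

In the real pipeline `detects` would wrap something like `CascadeClassifier.detectMultiScale` returning a non-empty result.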
Add these new negatives to the negative_images directory.
Continue training the current classifier by running run_training.pl. This will automatically include the new negatives in the training data.
After any stage, once create_cascade.sh has been run, the output can be used as a classifier for detection code (such as the code in the bindetection directory). Typically the first pass of this process will detect some images but also miss many. Using the first classifier as a guide, grab images of the target which aren't detected by the current classifier. Restart the process from scratch with these additional images included in the positive_images subdir. Remember to change the -data parameter in prep.sh to a new directory; this makes sure that training doesn't resume from the old classifier.
It will usually take several passes through the training process to get a usable classifier. Running the old classifier and watching for images which aren't detected will highlight what needs to be clipped and added to the positives for the next pass of training.