Open turbogeek opened 5 years ago
@turbogeek have you explored one shot object detection: https://apple.github.io/turicreate/docs/userguide/one_shot_object_detection/ ?
Hi, sorry for the very late reply. I just tried one shot, but I'm having a few issues. Part of the problem is that many of the generated images have a very tiny target, which is inappropriate for the task. Is there a way to control the relative size of the target within the image? FYI, otherwise this might work for me. My application is marked items like the attached image (easy to see here). My problem is a bit odd: imagine looking for a particular dalmatian in a gallery. So it is a dog, but a particular dog. I have many of these dogs (52, not 101), all slightly different but distinct.
Imagine this on a table, where the target takes up at most about 1/5th to 1/8th of the frame.
I have tried using a small crop of the image, comprising just the categorizable part, without success.
Just to add a little more: one shot needs precise controls. I am currently trying images that have an alpha channel, but a large number of the generated images have the target too small, distorted, or pixelated. I need control over scaling, rotation, tilt, shear, etc. It would be great to either replicate or just use the https://github.com/mdbloice/Augmentor capabilities, which let the user create a pipeline that writes the generated images and data files to disk.
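For reference, the kind of parametric control I mean (scale, rotation, shear) composes naturally as 2x3 affine matrices, which is roughly how augmentation libraries apply these transforms under the hood. A minimal stdlib sketch, assuming nothing about Turi Create's internals (all function names here are hypothetical, for illustration only):

```python
import math

def scale(sx, sy):
    # Hypothetical helper: 2x3 affine matrix scaling x by sx and y by sy.
    return [[sx, 0.0, 0.0], [0.0, sy, 0.0]]

def rotate(deg):
    # Counter-clockwise rotation about the origin by `deg` degrees.
    c, s = math.cos(math.radians(deg)), math.sin(math.radians(deg))
    return [[c, -s, 0.0], [s, c, 0.0]]

def shear_x(deg):
    # Horizontal shear: x shifts proportionally to y.
    t = math.tan(math.radians(deg))
    return [[1.0, t, 0.0], [0.0, 1.0, 0.0]]

def compose(a, b):
    # Matrix product a @ b on 2x3 affines (implicit bottom row [0, 0, 1]):
    # the composed transform applies b first, then a.
    return [
        [a[r][0] * b[0][c] + a[r][1] * b[1][c] + (a[r][2] if c == 2 else 0.0)
         for c in range(3)]
        for r in range(2)
    ]

def apply(m, pt):
    # Map a point (x, y) through a 2x3 affine matrix.
    x, y = pt
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

Exposing the individual parameters (scale range, rotation range, shear range) is what would let a user bound how small or distorted the target gets.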
It seems like the user needs to be able to tune the augmentation configuration so that a human can confirm the generated images are good enough for training (if I can't recognize it, the machine can't). I also need an easier way to add my own images by pointing to a directory.
I also need to know: what background sizes would work best for success?
One more thing, can we also augment the generated images?
Is there any built-in or easily added way to increase the number of training images for object detection? In particular, I need a way to multiply my examples, which are symbols on a flat plane, where each symbol is distorted by viewing angle, scale, focus blur, perspective (keystoning), lighting angle, etc. For example, think of trying to recognize the suit and value of a playing card using object detection. I have about a hundred examples from various distances, angles, and rotations, but the recognition rate is poor, probably because I need many more examples to fill in the gaps. The biggest issue is creating the training data with the ground-truth rectangles, which would be easier if I could tween (create images in between key images) the missing data.
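The tweening I have in mind for the ground-truth rectangles is just linear interpolation between two key frames. A minimal sketch (the `tween_boxes` helper and the `(x, y, w, h)` box format are my own assumptions, not anything from Turi Create):

```python
def tween_boxes(box_a, box_b, steps):
    """Linearly interpolate bounding boxes (x, y, w, h) between two key frames.

    Returns `steps` intermediate boxes, excluding the two endpoints.
    """
    frames = []
    for i in range(1, steps + 1):
        t = i / (steps + 1)  # interpolation fraction in (0, 1)
        frames.append(tuple(a + (b - a) * t for a, b in zip(box_a, box_b)))
    return frames
```

The corresponding in-between images would still need to be generated (warped or re-rendered) to match each interpolated box, which is the part the toolkit would have to provide.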