AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/

why are so many images required? #2888

Open vainaixr opened 5 years ago

vainaixr commented 5 years ago

Can you apply some of the concepts of Siamese networks, one-shot learning, few-shot learning, or matching networks to reduce the number of images required for training?

Children can learn new objects from one or two images.

Why do we require so many images for each class?

AlexeyAB commented 5 years ago

Can you provide any article that shows good accuracy with: siamese networks, one shot learning, few shot learning, matching networks, reinforcement learning, ...?

Children have already seen most objects and backgrounds from different sides/scales/angles/illumination... So children only have to learn the new object.

But even in this case, if you have never seen any device with wheels and now you see a car, should you call every device with wheels a "car"?

vainaixr commented 5 years ago

From what I understand,

1) If a child has never seen a giraffe before, is shown one for the first time, and is then given 10 new images of different objects, one of which contains a giraffe, the child can easily tell which image contains the giraffe.

2) If a child has never seen a giraffe before but is told that a giraffe is tall, lives on land, eats plants, and has certain other attributes, and is then given 10 images of different objects and asked which one contains a giraffe, the child will be able to answer without ever having seen a giraffe.

3) In no case do we give the child 2000 images of giraffes to learn from; one image from a book is enough. When the child looks at a new image, it compares it with the first giraffe image it saw, and if the difference is low (the similarity is high), it can detect the giraffe in the new image.

4) This is what Siamese networks do: they have two CNNs, each CNN is given a different input image, and the Euclidean distance between the two feature vectors is computed. If the distance is low, the images are considered the same.

5) There are other techniques as well, such as matching networks, prototypical networks, and model-agnostic meta-learning (MAML). Their aim is the same: to do image classification when only a few images are available.

6) These are some of the papers:

https://openreview.net/pdf?id=HkxLXnAcFQ https://arxiv.org/pdf/1703.03400.pdf

The accuracy is high on Omniglot, but not so much on miniImageNet.

https://www.youtube.com/watch?v=b8JlilRnhM4
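
As a rough illustration of the comparison described in point 4, here is a minimal Python sketch. Everything in it is an assumption made for illustration: `embed` is a toy one-layer stand-in for a CNN, the 4-pixel "images" and the weights are made up, and nothing here comes from darknet or from the linked papers. The same weights are applied to both inputs, which is the weight sharing that Siamese networks normally rely on.

```python
import math

def embed(image, weights):
    """Toy stand-in for a CNN: one linear layer followed by ReLU.
    The same weights are used for every image (shared branches)."""
    return [max(0.0, sum(w * x for w, x in zip(row, image))) for row in weights]

def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def one_shot_classify(query, support, weights):
    """Return the label of the support image closest to the query in embedding space."""
    q = embed(query, weights)
    return min(support,
               key=lambda label: euclidean_distance(q, embed(support[label], weights)))

# Hypothetical data: one labelled example per class (the "one shot").
weights = [[1.0, 0.0, 0.0, 0.0],
           [0.0, 1.0, 0.0, 0.0],
           [0.0, 0.0, 1.0, 1.0]]
support = {"giraffe": [0.9, 0.1, 0.8, 0.7],
           "car":     [0.1, 0.9, 0.2, 0.1]}
query = [0.8, 0.2, 0.7, 0.9]

print(one_shot_classify(query, support, weights))
```

In a real system the embedding would be trained (for example with a contrastive loss) so that images of the same class end up close together; the sketch only shows the comparison step, and it classifies whole images rather than detecting boxes.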

vainaixr commented 5 years ago

The YOLO architecture is used to predict the correct label and draw the bounding box.

This will be one episode.

So, one episode means we have some training images and one test image. We will have multiple such episodes.

And these episodes are our meta-training set.

What we are doing now with YOLO would be one episode, and we will have multiple such episodes in the meta-training set.

In each episode, our model learns its parameters, the same as what happens with the current YOLO.

And our meta learning algorithm learns to make correct predictions for an episode.

Our meta-test set has an episode with some training images and one test image.

And this meta-test set will perform detection on different classes, not the ones we used in the meta-training set.

So after using MAML, for new episodes we need only a few images in the training set, and we will get predictions for the test set.

For few-shot learning: MAML.

MAML with the YOLO architecture.

Another update, MAML++: https://arxiv.org/pdf/1810.09502.pdf
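
The episode structure described above can be sketched in a few lines of Python. This is a toy under loudly stated assumptions, not a YOLO or darknet implementation: each "task" is a hypothetical 1-D regression `y = slope * x` instead of object detection, the model has a single parameter `w`, and all function names and hyperparameters are made up for illustration. Because the model is so small, the MAML meta-gradient (the derivative of the query loss through the inner gradient step) can be written out by hand.

```python
import random

def mse(w, data):
    """Mean squared error of the model y = w * x."""
    return sum((w * x - y) ** 2 for x, y in data) / len(data)

def mse_grad(w, data):
    """Gradient of the mean squared error with respect to w."""
    return sum(2 * x * (w * x - y) for x, y in data) / len(data)

def sample_episode(rng, k_shot=5):
    """One episode = one task with a support (train) set and a query (test) set."""
    slope = rng.uniform(2.0, 4.0)  # hypothetical task distribution
    def draw():
        return [(x, slope * x) for x in (rng.uniform(-1.0, 1.0) for _ in range(k_shot))]
    return draw(), draw()

def adapt(w, support, inner_lr):
    """Inner loop: one gradient step on the episode's support set."""
    return w - inner_lr * mse_grad(w, support)

def maml_train(episodes=2000, inner_lr=0.1, outer_lr=0.1, seed=0):
    rng = random.Random(seed)
    w = 0.0  # the meta-learned initialisation
    for _ in range(episodes):
        support, query = sample_episode(rng)
        w_adapted = adapt(w, support, inner_lr)
        # Chain rule through the inner step:
        # d(w_adapted)/dw = 1 - inner_lr * (second derivative of the support loss)
        curvature = sum(2 * x * x for x, _ in support) / len(support)
        meta_grad = mse_grad(w_adapted, query) * (1 - inner_lr * curvature)
        w -= outer_lr * meta_grad  # outer loop: update the initialisation
    return w

w_init = maml_train()

# Meta-test: a new episode, adapted with a single step on its few support examples.
support, query = sample_episode(random.Random(123))
w_new = adapt(w_init, support, inner_lr=0.1)
print(mse(w_init, query), mse(w_new, query))
```

Applying the same idea to YOLO would mean replacing the scalar `w` with the network weights and the squared error with the detection loss, which makes the second-order term expensive; whether that trains well is exactly the open question in this thread.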

dselivanov commented 3 years ago

@vainaijr any luck training MAML init for YOLO?