experiencor / keras-yolo2

Easy training on a custom dataset. Various backends (MobileNet and SqueezeNet) are supported. A YOLO demo that detects raccoons and runs entirely in the browser is accessible at https://git.io/vF7vI (not on Windows).
MIT License
1.73k stars, 785 forks

QUESTION: Feature extraction, CNN input / output #346

Open IcyFrequency opened 6 years ago

IcyFrequency commented 6 years ago

Hi,

I'm wondering about a few things that I find hard to understand:

Before running predictions on an input image, the features of the image have to be extracted so the model can detect resemblances to the features of a certain class, right?

Since I'm kind of new to CNN;

Hope someone finds the time to answer these questions :))

And HUGE thanks in advance! Really appreciate the help here.

rodrigo2019 commented 6 years ago

Where in the code are the features for an image extracted, and are those features the class probabilities of the 1000 classes from the ImageNet dataset?

In backend.py, but you will not find the 1000 classes there, because the model heads are not used in this implementation.

How are the images inputted into the Neural Network for this implementation and how are the color channels manipulated with regard to this?

It depends on which backend you are using. You can check the manipulation in the normalize function of each backend model; these functions are also in the backend.py file.
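To make the idea of a per-backend normalize function concrete, here is a minimal NumPy sketch of three common input-normalization schemes. The function names, the channel means, and the assumption that the image arrives as an RGB array are all illustrative and not taken from backend.py; check the normalize function of the backend you actually use.

```python
import numpy as np

def normalize_unit(image):
    """Scale pixel values from [0, 255] to [0, 1]."""
    return image.astype(np.float32) / 255.0

def normalize_symmetric(image):
    """Scale pixel values from [0, 255] to [-1, 1]."""
    return image.astype(np.float32) / 127.5 - 1.0

def normalize_mean_subtract(image, channel_means=(103.939, 116.779, 123.68)):
    """Reverse the channel order (RGB -> BGR) and subtract a per-channel mean.

    The default means are the widely used ImageNet BGR means; they are an
    assumption here, not a value read from this repository.
    """
    bgr = image[..., ::-1].astype(np.float32)
    return bgr - np.asarray(channel_means, dtype=np.float32)

# Hypothetical input: a 416x416 RGB image with random pixel values.
image = np.random.randint(0, 256, size=(416, 416, 3), dtype=np.uint8)
for fn in (normalize_unit, normalize_symmetric, normalize_mean_subtract):
    out = fn(image)
    print(fn.__name__, out.shape, out.dtype)
```

Whichever scheme is used, the same function has to be applied at training time and at prediction time, otherwise the network sees inputs in a range it was not trained on.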

What is the raw output of the network and how are they converted to bounding boxes?

your answer is here
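As a rough illustration of the conversion, here is a NumPy sketch of how a YOLOv2-style raw output is usually decoded into boxes, following the formulas in the YOLOv2 paper. The tensor layout (x, y, w, h, objectness, then class scores), the function name decode_output, and the threshold are assumptions for this sketch and may differ from what this repository does.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def decode_output(raw, anchors, threshold=0.3):
    """Decode a raw YOLOv2-style output into boxes.

    raw:     array of shape (grid_h, grid_w, n_anchors, 5 + n_classes),
             assumed layout [tx, ty, tw, th, to, class scores...].
    anchors: list of (width, height) pairs in grid-cell units.
    Returns a list of (x_min, y_min, x_max, y_max, score, class_id),
    with coordinates expressed as fractions of the image size.
    """
    grid_h, grid_w, n_anchors, _ = raw.shape
    anchors = np.asarray(anchors, dtype=np.float32)

    # Offsets of each grid cell, broadcast over the anchor axis.
    col = np.arange(grid_w).reshape(1, grid_w, 1)
    row = np.arange(grid_h).reshape(grid_h, 1, 1)

    # Box center: sigmoid keeps it inside its cell, then add the cell offset.
    x = (sigmoid(raw[..., 0]) + col) / grid_w
    y = (sigmoid(raw[..., 1]) + row) / grid_h
    # Box size: the anchor scaled by exp of the raw prediction.
    w = anchors[:, 0] * np.exp(raw[..., 2]) / grid_w
    h = anchors[:, 1] * np.exp(raw[..., 3]) / grid_h

    # Final score = objectness * class probability.
    objectness = sigmoid(raw[..., 4])
    class_scores = objectness[..., None] * softmax(raw[..., 5:])
    class_id = class_scores.argmax(axis=-1)
    score = class_scores.max(axis=-1)

    keep = score > threshold
    boxes = np.stack([x - w / 2, y - h / 2, x + w / 2, y + h / 2], axis=-1)[keep]
    return [(*b, s, c) for b, s, c in
            zip(boxes.tolist(), score[keep].tolist(), class_id[keep].tolist())]

# Hypothetical usage with random numbers standing in for a network output.
raw = np.random.randn(13, 13, 5, 5 + 1)
anchors = [(1.0, 1.0), (2.0, 2.0), (3.0, 4.0), (5.0, 3.0), (8.0, 8.0)]
print(len(decode_output(raw, anchors)), "boxes above threshold")
```

In practice the surviving boxes are usually passed through non-maximum suppression afterwards to remove overlapping duplicates; that step is left out here.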

How is the annotated bounding box taken into consideration when training the neural network? Are those parts cropped out as entire images, or do the annotated boxes give class probabilities to the overlapping grid cells of the image?

The images are not cropped; the bounding boxes are calculated from their coordinates. It may be better to check the paper: it uses a special loss function that is computed based on "anchors".
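To illustrate how an annotated box can enter training without any cropping, here is a sketch of the usual YOLOv2-style target encoding: the box is assigned to the grid cell containing its center and to the anchor whose shape matches it best, and the loss then compares the network output with this target. The names best_anchor and encode_box, the target layout, and the example numbers are assumptions for this sketch, not code from this repository.

```python
import numpy as np

def best_anchor(box_w, box_h, anchors):
    """Index of the anchor with the highest IoU against a box of the given
    size, with both shapes centered on the same point."""
    best_idx, best_iou = -1, -1.0
    for i, (aw, ah) in enumerate(anchors):
        inter = min(box_w, aw) * min(box_h, ah)
        union = box_w * box_h + aw * ah - inter
        iou = inter / union
        if iou > best_iou:
            best_idx, best_iou = i, iou
    return best_idx

def encode_box(box, class_id, image_size, grid_size, anchors, n_classes):
    """Build a training target for a single annotated box.

    box:        (x_min, y_min, x_max, y_max) in pixels.
    image_size: (width, height) in pixels.
    grid_size:  (grid_w, grid_h) in cells.
    anchors:    list of (width, height) pairs in grid-cell units.
    Returns an array of shape (grid_h, grid_w, n_anchors, 5 + n_classes).
    """
    image_w, image_h = image_size
    grid_w, grid_h = grid_size
    x_min, y_min, x_max, y_max = box

    # Box center and size expressed in grid-cell units.
    cx = (x_min + x_max) / 2 * grid_w / image_w
    cy = (y_min + y_max) / 2 * grid_h / image_h
    w = (x_max - x_min) * grid_w / image_w
    h = (y_max - y_min) * grid_h / image_h

    # The cell containing the center is responsible for this object.
    col, row = int(cx), int(cy)
    a = best_anchor(w, h, anchors)

    target = np.zeros((grid_h, grid_w, len(anchors), 5 + n_classes), dtype=np.float32)
    target[row, col, a, 0:4] = [cx, cy, w, h]   # box coordinates
    target[row, col, a, 4] = 1.0                # objectness: an object is here
    target[row, col, a, 5 + class_id] = 1.0     # one-hot class label
    return target

# Hypothetical example: one box in a 416x416 image on a 13x13 grid.
anchors = [(1.0, 1.0), (2.0, 4.0), (5.0, 5.0)]
target = encode_box((100, 170, 164, 298), 0, (416, 416), (13, 13), anchors, 2)
print(np.argwhere(target[..., 4] == 1.0))  # (row, col, anchor) responsible for the box
```

Every other cell and anchor keeps a target of zero, which is how the loss learns to predict "no object" for them.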

IcyFrequency commented 6 years ago

@rodrigo2019

Thanks for your input. I'll leave this QUESTION open in case someone wants to go more in-depth, or until I explain it myself after a deeper dive.

Thanks