AlexeyAB / darknet

YOLOv4 / Scaled-YOLOv4 / YOLO - Neural Networks for Object Detection (Windows and Linux version of Darknet )
http://pjreddie.com/darknet/
Other
21.8k stars 7.97k forks source link

Change the shape of grids. #2977

Open sinhatushar opened 5 years ago

sinhatushar commented 5 years ago

I want to change the shape of grids to 2*B(For my case, I want to detect objects whose breadth is same as the breadth of the image), where B is the breadth of the image. What all changes will I have to make ? Do I need to change the architecture of the model as well ?

AlexeyAB commented 5 years ago

Do you want to use 2*B x 2*B network size? Then just set width= and height= in cfg-file (must be mutiple of 32).

If object_width=image_width, then why do you want to set network size 2*image_width ?

sinhatushar commented 5 years ago

I'm sorry I think I didn't describe it correctly. Basically I want grids of shape 2 x B where B is breath of image. For eg. if image is 416x416, there will be 208 grids each of size 2 x B. I think this will need changes shape of input layer and changes in shape of final output layer. Can you please guide me what all changes to make?

Problem statement: I am trying to train a model to find subject, predicate and object in a sentence using YOLO. First I use 300 dimensional Glove, Fasttext and wordtovec embedding to convert a sentence to 3d channel by using vector embedding. eg. : US : [300 dimensional vector] is : [300 dimensional vector] a : [300 dimensional vector] country : [300 dimensional vector] .

Now, using 3 vector embeddings, this sentence can be changed to a 4x 300 x 3 matrix . Here 4 is because of length of sentence, 300 because all embeddings are 300 dimensional and 3 because I am using 3 different word embeddings. Now I plan to zero pad these to make the shape 416 x 416 x 3 which is the input for the model . However, I can't change this matrix to a .jpg image because vector embeddings have negative values as well. So, I have to make changes to read this matrix as input for the model instead of reading an image. This problem will be a 3 class identification problem and for each sentence it will identify 3 classes in it.

I have a labelled training set where subject, predicate and object in a sentence are identified.
I have made the training set already but I don't understand how should I change the grid shape from 13x13 to 2xB, here B will be 416. I doubt this will work but I want to train the model once.