Derivation of bbox_label

fninsiima commented 7 years ago

Hey @Seanlinx. Just want to say that I found a link to this repo on this page

I'm interested in training a model on a custom dataset (not faces), but I'm not sure what you meant by: bbox_label are the offset of x1, y1, x2, y2, calculated by (xgt(ygt) - x(y)) / width(height)

I'm wondering what xgt and ygt represent here, and which width and height are meant (I'm not familiar with the WIDER FACE dataset).

What I have are the coordinates of the upper-left corner and the lower-right corner of each bbox rectangle.

Please help me make sense of xgt and ygt.

Also: I started off by installing mxnet 0.9.5 using pip, yet somewhere you ask us to "modify mxnet/src/regression_output-inl.h according to mxnet_diff.patch before using the code for training."

Does this training require me to use your cloned version of the mxnet repo?

Seanlinx commented 7 years ago

@fninsiima (x1, y1) and (x2, y2) are the coordinates of the upper-left and lower-right corners of the bbox, respectively. 'width' and 'height' are the width and height of the training sample, while (xgt, ygt) are the coordinates of the corresponding ground truth bbox. So the offset of x1 is computed by (x1_gt - x1) / (x2 - x1).
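For illustration, here is a minimal Python sketch of that formula applied to all four coordinates. The function name `bbox_offsets` and the example boxes are made up for this post, and it assumes width and height are simply x2 - x1 and y2 - y1 of the training sample, as in the formula above. It is not taken from the repo's data generation scripts.

```python
def bbox_offsets(sample_box, gt_box):
    """Offsets of a ground truth box relative to a training sample box.

    Both boxes are (x1, y1, x2, y2) in original-image pixel coordinates.
    Assumption: width/height are x2 - x1 and y2 - y1 of the sample.
    """
    x1, y1, x2, y2 = sample_box
    gx1, gy1, gx2, gy2 = gt_box
    width = float(x2 - x1)
    height = float(y2 - y1)
    return (
        (gx1 - x1) / width,   # offset of x1
        (gy1 - y1) / height,  # offset of y1
        (gx2 - x2) / width,   # offset of x2
        (gy2 - y2) / height,  # offset of y2
    )


# Hypothetical boxes: the sample is shifted relative to the ground truth.
sample = (110, 90, 210, 190)
ground_truth = (100, 100, 200, 180)
print(bbox_offsets(sample, ground_truth))
```

An offset is negative whenever the ground truth coordinate lies to the left of (or above) the matching sample coordinate, so negative values in a label are expected.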

You can clone the repo from https://github.com/dmlc/mxnet, reset to v0.9.3 then install it.

fninsiima commented 7 years ago

Okay. Please bear with me for a moment.

If I annotate my image, and one of the objects is in the bbox (x1,y1),(x2,y2), representing the upper-left and lower-right corners... I thought this was the definition of ground truth coordinates.

So xgt and ygt correspond to what exactly, the mid-point (center) of that bbox? Even if this was the case, this would not account for the negatives in 12/positive/28 1 -0.05 0.11 -0.05 -0.11.

Also, about the path example '12/positive/28 1 -0.05 0.11 -0.05 -0.11': is the image file extension simply left out, i.e. the file is actually '12/positive/28.jpg'?


-- Flavia Ninsiima Delmira, MSc Research Student, Mobile Crops Surveillance (MCROPS), AI Lab, School of Computing, Makerere University, air.ug/mcrops

Seanlinx commented 7 years ago

@fninsiima (x1,y1),(x2,y2) are the coordinates of the training sample in its original image. (xgt, ygt) stands for either the upper-left or the lower-right corner of the ground truth bbox. I just dropped the subscripts.

All the images are in JPEG format, so I didn't include the file extension. The extension is appended in the method image_path_from_index() in core/imdb.py.
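To make the relationship concrete, the sketch below inverts the offset formula: given a training sample box and its four offsets, it recovers the ground truth box. The function name `decode_offsets` and the sample box coordinates are hypothetical, chosen only so that the offsets match the example label from this thread. Width and height are again assumed to be x2 - x1 and y2 - y1 of the sample.

```python
def decode_offsets(sample_box, offsets):
    """Recover the ground truth box from a sample box and its offsets.

    sample_box is (x1, y1, x2, y2) of the training sample in the original
    image; offsets is (dx1, dy1, dx2, dy2) as stored in the label.
    """
    x1, y1, x2, y2 = sample_box
    dx1, dy1, dx2, dy2 = offsets
    width = float(x2 - x1)
    height = float(y2 - y1)
    return (
        x1 + dx1 * width,
        y1 + dy1 * height,
        x2 + dx2 * width,
        y2 + dy2 * height,
    )


# Hypothetical sample crop; offsets copied from the example label above.
sample_box = (105, 89, 205, 189)
offsets = (-0.05, 0.11, -0.05, -0.11)
print([round(v, 2) for v in decode_offsets(sample_box, offsets)])
```

With these made-up numbers the recovered ground truth box is approximately (100, 100, 200, 178), which shows how a sample that is slightly off from the annotated box produces small signed offsets.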
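As a rough sketch of how such an annotation line can be read, the snippet below splits a line into image path, class label, and the four offsets, appending the extension to the path. The helper `parse_annotation` is hypothetical and is not the repo's image_path_from_index(); the ".jpg" default is an assumption based on the statement that all images are JPEG.

```python
def parse_annotation(line, extension=".jpg"):
    """Split one annotation line into (image_path, label, offsets).

    Expected layout: '<path without extension> <label> <dx1> <dy1> <dx2> <dy2>'.
    The extension default is an assumption (all images stored as JPEG).
    """
    fields = line.strip().split()
    image_path = fields[0] + extension
    label = int(fields[1])
    # Lines without bbox targets would simply yield an empty list here.
    offsets = [float(v) for v in fields[2:6]]
    return image_path, label, offsets


path, label, offsets = parse_annotation("12/positive/28 1 -0.05 0.11 -0.05 -0.11")
print(path, label, offsets)
```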

Seanlinx / mtcnn

Derivation of bbox_label #36