Understanding cascading of sizes in mtcnn

Hi, I'm trying to follow the code and understand how MTCNN works. As I understand it, for each image and for each scale, detections are produced by each of the networks in turn. Right now I am asking specifically about PNet.

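# Code file: mtcnn_detector.py
from itertools import izip, repeat  # Python 2 (on Python 3, use the built-in zip)
# Run PNet once per scale in the current batch, in parallel via the worker pool
local_boxes = self.Pool.map(detect_first_stage_warpper,
                            izip(repeat(img),
                                 self.PNets[:len(batch)],
                                 [scales[i] for i in batch],
                                 repeat(self.threshold[0])))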

For each scale, the image is resized by the scale factor computed earlier, and the resized image (called input_buf) is fed into PNet.
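To make "the scales produced earlier" concrete, here is a minimal sketch of the usual MTCNN scale pyramid. It assumes minsize=20 and factor=0.709 (common MTCNN defaults, assumed here rather than read from this repo's config); with those values it reproduces the scale 0.107493555074 printed below for a 340x151 image, and rounding the resized size up with ceil gives the printed 37x17. The function names are hypothetical.

```python
import math

def compute_scales(height, width, minsize=20, factor=0.709, min_det_size=12):
    """Scale pyramid for PNet (sketch; minsize and factor are assumed defaults)."""
    # Map the smallest face we want to find (minsize) onto PNet's 12x12 window.
    m = min_det_size / float(minsize)
    minl = min(height, width) * m
    scales = []
    count = 0
    # Keep shrinking until the shorter side would drop to 12 px or below.
    while minl > min_det_size:
        scales.append(m * factor ** count)
        minl *= factor
        count += 1
    return scales

def rescaled_size(height, width, scale):
    # Round up, so a scaled side is never shorter than height*scale or width*scale.
    return int(math.ceil(height * scale)), int(math.ceil(width * scale))

scales = compute_scales(340, 151)
for s in scales:
    print("%.12f" % s, rescaled_size(340, 151, s))
```

For the 340x151 image this yields six scales; the last one is about 0.107494, which resizes the image to (37, 17).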

# Code file: helper.py
# Values printed for one image at one scale:
# ORIGINAL height:  340
# ORIGINAL width:   151
# SCALE used (computed earlier):  0.107493555074
# RESCALED height:  37
# RESCALED width:   17
output = net.predict(input_buf)

For reference, I printed the original and rescaled sizes above. Here net is PNet, and det1.prototxt (PNet's definition) declares an input of height 12 and width 12.
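One way to see how a 37x17 input can pass through a network declared as 12x12: if PNet's trunk is only convolutions and pooling (assumed layer list below: three 3x3 convolutions and one 2x2/stride-2 max pool, with 1x1 output heads that keep the spatial size), it accepts any input of at least 12x12 and returns a map of scores instead of a single score. This sketch computes that map's size, assuming Caffe-style ceil-mode pooling and no padding; the helper names are hypothetical.

```python
import math

# Assumed PNet trunk (det1): (kind, kernel, stride). The 1x1 output heads
# do not change spatial size, so they are omitted.
PNET_LAYERS = [
    ("conv", 3, 1),  # conv1
    ("pool", 2, 2),  # pool1, assumed ceil-mode as in Caffe
    ("conv", 3, 1),  # conv2
    ("conv", 3, 1),  # conv3
]

def out_len(n, kind, k, s):
    if kind == "pool":
        # Ceil-mode pooling: a partial window at the edge still yields an output.
        return int(math.ceil((n - k) / float(s))) + 1
    return (n - k) // s + 1  # 'valid' convolution, no padding

def pnet_output_size(h, w):
    for kind, k, s in PNET_LAYERS:
        h, w = out_len(h, kind, k, s), out_len(w, kind, k, s)
    return h, w

print(pnet_output_size(12, 12))  # (1, 1): the declared input gives one score
print(pnet_output_size(37, 17))  # (14, 4): the rescaled image from above
```

Under these assumptions, each of the 14x4 cells is PNet's verdict on one 12x12 window of input_buf, and neighbouring cells are 2 pixels apart because the only downsampling is the stride-2 pool.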

# Code file: det1.prototxt
input_dim: 1   # batch size
input_dim: 3   # channels
input_dim: 12  # height
input_dim: 12  # width

What I don't understand is: where does input_buf go from its rescaled size (37x17 in this example) down to 12x12?
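If PNet does produce a score map rather than a single score, the remaining step is mapping each high-scoring cell back to a box in the original image. This minimal sketch assumes a 12x12 receptive field per cell and an overall stride of 2 in the rescaled image, then divides by the scale to return to original-image pixels. cells_to_boxes is a hypothetical name; the repo's own helper may differ in details such as off-by-one offsets or box regression.

```python
import numpy as np

STRIDE = 2      # assumed: PNet's single 2x2/stride-2 pool
CELL_SIZE = 12  # assumed: PNet's receptive field, i.e. the declared 12x12 input

def cells_to_boxes(score_map, scale, threshold):
    """Map score-map cells above threshold to boxes in original-image pixels.

    score_map: 2-D array (H_out x W_out) of face probabilities for one scale.
    Returns an (N, 5) array of [x1, y1, x2, y2, score].
    """
    ys, xs = np.where(score_map > threshold)
    # Top-left corner of each 12x12 window in the rescaled image, then undo the scale.
    x1 = np.round(STRIDE * xs / scale)
    y1 = np.round(STRIDE * ys / scale)
    x2 = np.round((STRIDE * xs + CELL_SIZE) / scale)
    y2 = np.round((STRIDE * ys + CELL_SIZE) / scale)
    return np.stack([x1, y1, x2, y2, score_map[ys, xs]], axis=1)

# Toy 2x3 score map at scale 0.5 (made-up values for illustration)
score_map = np.array([[0.1, 0.9, 0.2],
                      [0.3, 0.4, 0.8]])
print(cells_to_boxes(score_map, scale=0.5, threshold=0.6))
```

In the toy map, cell (row 0, column 1) covers the 12x12 window starting at x=2, y=0 in the rescaled image, which becomes the box [4, 0, 28, 24] in the original image.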

(Issue #19 in YYuanAnyVision/mxnet_mtcnn_face_detection)