ghost opened this issue 5 years ago
The 300x300 model has only two output channels.
Does that mean the output should only be 2x300x300 and not 21x300x300? And in the following block of code from face_seg.py, are you simply taking the per-pixel argmax over the two channels?
# run net and take argmax for prediction
net.forward()
out = net.blobs['score'].data[0].argmax(axis=0)
The end result should be a binary mask, but all of the values in the 21x300x300 array I get are floats, many of them negative. I'm wondering whether the output of net.forward() differs between the actual Caffe library and OpenCV's readNetFromCaffe. I'm not sure, though...
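For what it's worth, negative floats are expected here: the raw per-channel scores are unnormalized logits, not probabilities. A minimal sketch (using made-up logit values) showing that softmax would turn them into probabilities but leaves the argmax, and hence the predicted class, unchanged:

```python
import numpy as np

# Two hypothetical per-pixel score vectors (logits) from a 2-channel output.
logits = np.array([[-1.3, 0.4],
                   [2.1, -0.7]], dtype=np.float32)

# Softmax turns logits into probabilities...
probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)

# ...but softmax is monotonic, so the argmax is identical either way.
# Taking argmax on the raw scores is therefore enough.
print(logits.argmax(axis=1))  # -> [1 0]
print(probs.argmax(axis=1))   # -> [1 0]
```

So the negative values don't indicate a broken model; they disappear once you take the argmax over channels.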
For reference, here is my code:
import numpy as np
import cv2
image = cv2.imread('Alison_Lohman_0001.jpg')
# Define prototxt and caffemodel paths, and create the model
caffeModel = "face_seg_fcn8s.caffemodel"
prototxtPath = "face_seg_fcn8s_deploy.prototxt"
net = cv2.dnn.readNetFromCaffe(prototxtPath, caffeModel)
# Resize to 300x300
image = cv2.resize(image,(300,300))
# Create a blob, subtracting the per-channel mean (104.00698793, 116.66876762, 122.67891434)
blob = cv2.dnn.blobFromImage(image, 1.0, (300, 300), (104.00698793, 116.66876762, 122.67891434))
# Passing blob through network
net.setInput(blob)
output = net.forward()
The resulting output is a 1x21x300x300 float32 array.
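For reference, the usual way to turn a 1x21x300x300 score tensor into a binary mask mirrors the argmax line from face_seg.py quoted at the top. A sketch, assuming the standard layout where channel 0 is background (random scores stand in for real network output):

```python
import numpy as np

# Hypothetical network output in the shape reported above: 1 x 21 x 300 x 300.
output = np.random.randn(1, 21, 300, 300).astype(np.float32)

scores = output[0]                    # drop the batch dim -> (21, 300, 300)
labels = scores.argmax(axis=0)        # per-pixel class index -> (300, 300)
mask = (labels > 0).astype(np.uint8)  # any non-background class -> face mask

print(mask.shape)
```

The mask can then be resized back to the original image dimensions with cv2.resize if needed.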
Ah I see what you mean about the 300x300 model. It produces a 1x2x300x300 output.
I'm still unsure what to do with the two resulting images. After normalization they look like heat maps (shown here for the Alison_Lohman_0001.jpg image).
Did you find an answer to this? I'm trying to use it in OpenCV as well and am hitting the same problems.
@kiralygomba no unfortunately not.
@kiralygomba looks like the only thing missing is the last operation:
mask = output[0].argmax(axis=0)
mask = 1 * (mask > 0)
Though I must say it doesn't quite work as I expected.
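In case it helps anyone, with the two-channel 300x300 model the same recipe reduces to a single argmax, since (under the assumption that channel 0 is background and channel 1 is face) the winning channel per pixel is already the binary label:

```python
import numpy as np

# Hypothetical 1 x 2 x 300 x 300 output from the 300x300 model
# (random scores stand in for real network output).
output = np.random.randn(1, 2, 300, 300).astype(np.float32)

# With only background (channel 0) and face (channel 1) channels,
# argmax over the channel axis is itself the 0/1 mask.
mask = output[0].argmax(axis=0).astype(np.uint8)

print(mask.shape, np.unique(mask))
```

No extra thresholding step is needed in the two-channel case; the mask > 0 line only matters when there are multiple foreground classes.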
I had a hard time getting Caffe installed, so I figured I'd try out your model using OpenCV's dnn.readNetFromCaffe() along with your .caffemodel and .prototxt.
The output of net.forward() when using this method is a 1x21x300x300 matrix, which can be squeezed to 21x300x300. Each of the 21 300x300 arrays, when normalized, seems to constitute a type of heat map. Some of these can be seen below.
My question is, how would I combine these to get the actual face segmentation? I tried to parse your code to see if I could figure it out but fell short of understanding. Thanks!
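A sketch of both steps discussed in this thread: min-max scaling a single channel for display (plain NumPy here; cv2.normalize with NORM_MINMAX would do the same), and combining all 21 channels into a face mask via argmax. Channel indices and the background-is-channel-0 convention are assumptions:

```python
import numpy as np

# Hypothetical 21 x 300 x 300 score tensor (batch dim already squeezed);
# random scores stand in for real network output.
scores = np.random.randn(21, 300, 300).astype(np.float32)

# Visualize one channel as a heat map: min-max scale it to [0, 255].
ch = scores[15]
rng = ch.max() - ch.min()
heat = (255 * (ch - ch.min()) / (rng + 1e-12)).astype(np.uint8)

# Combine the channels: argmax picks the winning class per pixel,
# and any non-background class (index > 0) counts as "face".
labels = scores.argmax(axis=0)
face_mask = (labels > 0).astype(np.uint8)

print(heat.dtype, face_mask.shape)
```

So the per-channel images are just class-score maps; the segmentation comes from comparing them per pixel, not from any single channel on its own.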