Open · Girish-03 opened this issue 3 years ago
Hello,
Thank you for your interest. It is hard to say what is wrong without seeing the code. Are the values of the input image in the range [0, 1]? Otherwise, I would advise you to have a look at the dataloader we provide for the AffectNet dataset, and in particular these lines: https://github.com/face-analysis/emonet/blob/master/emonet/data/affecnet.py#L122#L131 This is where we apply the transformations to the cropped images obtained from a face detector.
You can also look at these lines in the test.py file: https://github.com/face-analysis/emonet/blob/master/test.py#L35#L51 This is where the transformations are created and passed to the dataloaders.
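For illustration, the transform setup amounts to something like the following sketch (not the repository's exact code; the dataset argument name is an assumption):

from torchvision import transforms

# ToTensor() converts an HxWx3 uint8 RGB image into a 3xHxW float tensor
# with values scaled to [0, 1]; no extra mean/std normalization is applied.
transform_image = transforms.Compose([transforms.ToTensor()])

# The transform is then passed to the dataset that feeds the dataloader,
# roughly like (argument name is an assumption):
# dataset = AffectNet(..., transform_image=transform_image)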
Hope this helps
Hi,
The work is really amazing and the results seem astonishing.
I am a student trying to use this code for one of my research projects. I would like to know whether there is a specific preprocessing step to apply before feeding images to the network. For instance, I am detecting faces in video frames with OpenCV's Caffe DNN face detector, cropping them, resizing them to 256x256, and feeding them to the network. However, the valence and arousal values, along with the categorical emotion, do not match the expected results for many frames. I assume I may be missing some preprocessing of the input frames required by the EmoNet model. Also, is there a specific technique I should use for detecting and cropping the faces? I would appreciate your guidance here. I ran the estimation and visualization on the same video used in the paper so I could compare against your results, but they do not match. Below are links to the video with the original results (valence/arousal bars and categorical emotions) and the results from my preprocessing (as described above; a rough sketch of my pipeline follows the links).
(The green vertical and blue horizontal bars, with the emotion in red text, are my results.)
Using the 5-class model: https://drive.google.com/file/d/1--GW_J3XUDNbo59YOTbLJ-VPWS4-2oey/view?usp=sharing
Using the 8-class model: https://drive.google.com/file/d/1jJ9Ah7rcoN3aVkLYPq8cDajdRTnwsamU/view?usp=sharing
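For reference, a rough sketch of the detection and cropping step described above, assuming OpenCV's res10 SSD Caffe face detector (the model file names and the confidence threshold are placeholders):

import cv2

# Placeholder file names for OpenCV's Caffe SSD face detector.
detector = cv2.dnn.readNetFromCaffe("deploy.prototxt",
                                    "res10_300x300_ssd_iter_140000.caffemodel")

def crop_face(frame_bgr, conf_threshold=0.5):
    """Return a 256x256 crop of the most confident face in a BGR frame, or None."""
    h, w = frame_bgr.shape[:2]
    blob = cv2.dnn.blobFromImage(cv2.resize(frame_bgr, (300, 300)), 1.0,
                                 (300, 300), (104.0, 177.0, 123.0))
    detector.setInput(blob)
    detections = detector.forward()      # shape (1, 1, N, 7)
    best = detections[0, 0, 0]           # detections come sorted by confidence
    if best[2] < conf_threshold:
        return None
    x1, y1, x2, y2 = (best[3:7] * [w, h, w, h]).astype(int)
    crop = frame_bgr[max(y1, 0):y2, max(x1, 0):x2]
    # Note: the crop is still in BGR channel order at this point.
    return cv2.resize(crop, (256, 256))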
Hello @Girish-03. Like you, I got different results compared to the demo. Is your problem solved?
Hello, sorry for the delay in answering. We do not do any specific preprocessing apart from what is done in the DataAugmentor class: https://github.com/face-analysis/emonet/blob/master/emonet/data_augmentation.py#L47
One issue I can think of is that OpenCV loads images in BGR format, whereas our network was trained with the RGB format (we load images using skimage; see the AffectNet dataloader's get_item function: https://github.com/face-analysis/emonet/blob/master/emonet/data/affecnet.py#L120). Maybe this is the issue...
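If that is the case, the fix is a one-line channel swap before building the tensor (a sketch, assuming the frame comes from OpenCV as a BGR numpy array):

import cv2

def to_rgb(frame_bgr):
    # OpenCV decodes images as BGR; the network was trained on RGB images.
    return cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2RGB)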
Hope this helps!
Hello, I wonder what the "4-dimensional input" is exactly. I followed the DataAugmentor but only got a three-dimensional input. This is the error: "RuntimeError: Expected 4-dimensional input for 4-dimensional weight [64, 3, 7, 7], but got 3-dimensional input of size [3, 256, 256] instead"
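That error means the batch dimension is missing: the first convolution has weights of shape [64, 3, 7, 7] and therefore expects a 4-D input of shape [batch, 3, 256, 256], while a single transformed image is only [3, 256, 256]. A minimal sketch of the fix:

import torch

image = torch.rand(3, 256, 256)    # stands in for one transformed image
batch = image.unsqueeze(0)         # add the batch dimension -> [1, 3, 256, 256]
# output = model(batch)            # the network expects 4-D (NCHW) input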
I'm also having an issue validating the network's predictions on stock images. I suspect the problem lies in the normalization and data preparation part.
I've tried many variations, including:
- flipping the channels from RGB to BGR
- cropping the image to include only the face using an off-the-shelf face detector (I verified that the cropped image contains only my face)
- normalizing the input array
- always resizing the image to 256x256
None of these variations worked; the network still predicts the wrong emotion, valence, and arousal (the target is a happy expression, which should give high positive valence and positive arousal).
In the code in the repository there is no input normalization, only a resize transform.
Could you please point me to the correct data preparation steps?
Thanks
import numpy as np
import torch
from PIL import Image
from torchvision import transforms

# Resize to the 256x256 input size and scale pixel values to [0, 1].
image_transforms = transforms.Compose([
    transforms.Resize((256, 256)),
    transforms.ToTensor(),
])

def classify1(model, image_transforms, image_path):
    image = Image.open(image_path).convert('RGB')   # force RGB channel order
    image = image_transforms(image).unsqueeze(0)    # add the batch dimension
    image = image.cuda()
    with torch.no_grad():
        output = model(image)
    expression = output['expression'][0, :].tolist()
    print(image_path, ',', expression, ',', int(np.argmax(expression)), ',',
          output['arousal'].tolist(), ',', output['valence'].tolist())
With this, I got the result.
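For completeness, roughly how the snippet above can be driven end to end. The loading lines follow the pattern in test.py, but the import path, checkpoint path, and class count are assumptions that should be checked against the repository:

import torch
from emonet.models import EmoNet   # model class from this repository (assumed import path)

# Placeholder checkpoint path and number of expression classes.
state_dict = torch.load('pretrained/emonet_8.pth', map_location='cpu')
model = EmoNet(n_expression=8).cuda()
model.load_state_dict(state_dict, strict=False)
model.eval()

classify1(model, image_transforms, 'face_crop.jpg')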
nice!
Good, good!