av-savchenko / face-emotion-recognition

Efficient face emotion recognition in photos and videos
Apache License 2.0
654 stars, 124 forks

Preprocessing of images to run inference #16

Closed (isa-tr closed 1 year ago)

isa-tr commented 1 year ago

Hello, thank you very much for your work.

I am trying to preprocess a batch of images (I have my own dataset) the way you prepared your data. I'm following the notebook train_emotions.ipynb, since it is written in TensorFlow and that is the framework I'm using.

I have a question about the steps of the preprocessing, so I would like to ask you if you can tell me the correct steps. These are the steps I'm following, let me know if I'm right or if something is missing:

  1. I already have my images with the faces detected and cropped, i.e., I have a dataset full of faces like this one (frame9).

  2. img = cv2.imread(img_path)

  3. img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

  4. img = cv2.resize(img,(224,224))

  5. Then your notebook shows that you apply a normalization:

         def mobilenet_preprocess_input(x, **kwargs):
             x[..., 0] -= 103.939
             x[..., 1] -= 116.779
             x[..., 2] -= 123.68
             return x

         preprocessing_function=mobilenet_preprocess_input

Here I ran into an issue: the in-place subtraction cannot cast the float result back into the integer image array, so I changed it to:

    def mobilenet_preprocess_input(x, **kwargs):
        x[..., 0] = x[..., 0] - 103.939
        x[..., 1] = x[..., 1] - 116.779
        x[..., 2] = x[..., 2] - 123.68
        return x

    preprocessing_function=mobilenet_preprocess_input

So, let me know if the process I'm following is correct or if there's something missing.

Thank you!
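Editor's note: the error in step 5 most likely occurs because cv2.imread returns a uint8 array, and NumPy refuses to write a float result back into it in place. The rewritten version with `x[..., 0] = x[..., 0] - 103.939` no longer raises, but the assignment still casts the float result back to uint8, so the values are probably not what was intended. A minimal sketch of one way around this, assuming the input is an HxWx3 uint8 array, is to cast to float32 before subtracting. The synthetic `face` array below is only a stand-in for a cropped, resized face; it is not from the notebook.

```python
import numpy as np

def mobilenet_preprocess_input(x, **kwargs):
    # astype returns a float32 copy, so the caller's uint8 array is left untouched
    # and the in-place subtraction below no longer needs an integer cast.
    x = np.asarray(x).astype(np.float32)
    x[..., 0] -= 103.939
    x[..., 1] -= 116.779
    x[..., 2] -= 123.68
    return x

# Synthetic stand-in for steps 1-4 (a 224x224 face image with 3 channels, uint8).
face = np.full((224, 224, 3), 128, dtype=np.uint8)
out = mobilenet_preprocess_input(face)

# Add a leading batch axis, the shape a Keras model usually expects for one image.
batch = np.expand_dims(out, axis=0)
```

Reading, color conversion, and resizing with cv2 stay as in steps 2-4; only the dtype handling changes.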

av-savchenko commented 1 year ago

Thanks for your question! Your images look nice, I believe you could use the models from my repository. The preprocessing function is appropriate for my Tensorflow model (mobilenet_7.h5) only. If you want to use more accurate PyTorch models, the preprocessing is slightly different, something similar to https://github.com/HSE-asavchenko/face-emotion-recognition/blob/main/python-package/hsemotion/facial_emotions.py#L39

isa-tr commented 1 year ago

Thank you for your answer! Yep, I'm also checking the preprocessing needed to use the pre-trained PyTorch models. As far as I can see, you use these transformations:

    IMG_SIZE=224
    preprocess = transforms.Compose(
        [
            transforms.Resize((IMG_SIZE,IMG_SIZE)),
            transforms.ToTensor(),
            transforms.Normalize(mean=[0.485, 0.456, 0.406],
                                 std=[0.229, 0.224, 0.225])
        ]
    )

Am I right?

And thank you again for answering my question :D

av-savchenko commented 1 year ago

You're correct if you use the enet_b0 models. I personally recommend them because of their stable results on different datasets. You could also try the enet_b2 models, which have potentially higher accuracy, but for them it is better to increase the input image resolution by setting IMG_SIZE=260. BTW, all technical details for the PyTorch models are encapsulated in the hsemotion Python package, which you could use out of the box for the facial images from your dataset.
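Editor's note: to make the arithmetic behind the ToTensor and Normalize steps concrete, here is a rough NumPy illustration. It is not the repository's code: `to_tensor_and_normalize` is a hypothetical helper, the input is assumed to be an RGB uint8 array already resized to IMG_SIZE x IMG_SIZE, and the mean and std values are the ones quoted in the transform above.

```python
import numpy as np

# Per-channel mean and std quoted in the transforms.Normalize call above.
MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def to_tensor_and_normalize(img_rgb_uint8):
    """Hypothetical helper: same arithmetic as ToTensor followed by Normalize."""
    x = img_rgb_uint8.astype(np.float32) / 255.0  # scale pixel values to [0, 1]
    x = (x - MEAN) / STD                          # per-channel normalization
    return np.transpose(x, (2, 0, 1))             # HWC -> CHW layout

IMG_SIZE = 224  # the thread suggests 260 for enet_b2 models
face = np.full((IMG_SIZE, IMG_SIZE, 3), 255, dtype=np.uint8)  # synthetic stand-in
chw = to_tensor_and_normalize(face)
```

With real images, the torchvision pipeline quoted above (or the hsemotion package) remains the more direct route; this sketch only shows what values the model ends up seeing.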

isa-tr commented 1 year ago

Thanks!!