JLREx / PAtt-Lite

Official implementation for PAtt-Lite: Lightweight Patch and Attention MobileNet for Challenging Facial Expression Recognition
MIT License

Pre-trained Model Predicting All Images as Label 4 - Is This Correct? #6

Open HeeminYang opened 7 months ago

HeeminYang commented 7 months ago

Hello,

I've been experimenting with the pre-trained model you've uploaded, and I've noticed an issue. Regardless of the input, the model predicts label 4 for every image. This makes me wonder whether the correct model was uploaded.

Could you please confirm if this is the expected behavior of the model? If not, would it be possible to check the uploaded model file and ensure the correct version is available for download?

Thank you for your assistance.

ipa-anm-sy commented 6 months ago

I am having the same issue: the model predicts label 4 for every image. Could you please fix this?

M-U-X commented 5 months ago

How did you get it to work? What inference code did you use? @HeeminYang @ipa-anm-sy

ipa-anm-sy commented 5 months ago

I am not using this model for training or evaluation, only for getting predictions. I load the provided pre-trained model with `model = tf.keras.models.load_model(model_path, compile=False)`. Before that, I run a face detection algorithm, crop the face, resize the crop to (224, 224), and apply image transforms; I then extract the labels from the model's output.
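A minimal sketch of the prediction step described above, assuming the faces have already been detected and cropped into (H, W, 3) arrays. `resize_nearest` and `predict_labels` are hypothetical helpers: the nearest-neighbour resize stands in for whatever transforms were actually used, and no pixel normalisation is assumed (whatever the model saw during training should be applied at the marked line). The model is passed in as any callable returning class scores, such as the `predict` method of the loaded Keras model.

```python
import numpy as np


def resize_nearest(image, size=224):
    """Nearest-neighbour resize of an (H, W, C) array to (size, size, C)."""
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return image[rows[:, None], cols[None, :]]


def predict_labels(predict_fn, face_crops, size=224):
    """Resize cropped faces, run them as one batch, return the argmax label per face.

    predict_fn: any callable mapping a (B, size, size, 3) float32 array to
    (B, num_classes) scores, e.g. the .predict method of a Keras model.
    """
    # NOTE: apply the same normalisation used in training here; none is assumed.
    batch = np.stack([resize_nearest(f, size) for f in face_crops]).astype(np.float32)
    scores = np.asarray(predict_fn(batch))
    return scores.argmax(axis=-1)


# Hypothetical usage (model_path and face_crops are placeholders):
# import tensorflow as tf
# model = tf.keras.models.load_model(model_path, compile=False)
# labels = predict_labels(model.predict, face_crops)
```

All crops are sent through the model as a single batch here, which matters if the model's output for one image depends on the other images in the batch.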

Chen1fly commented 5 months ago

Hello, could you please tell me where you got the code?

MeneerAbel commented 5 months ago

Sorry, but that is just the standard code to run any Keras model. If you plan to work with pre-trained models, it is still good to do a bit of research yourself; one Google search could have given you this. If you are looking for a face detection algorithm, there are many out there, for example deepface: https://github.com/serengil/deepface/tree/master. Let's try to stay on topic. This issue is about the fact that the pre-trained model always predicts label 4, and I would still like a reply from the developers on that.

ipa-anm-sy commented 5 months ago

Can anyone reply to the main question in this issue? The model only predicts label 4.

JLREx commented 5 months ago

Hi, we have just uploaded the training notebook. The evaluation code is at the end of the notebook. Still, we recommend training on your own database before evaluating the model, for optimal performance. Based on our evaluation, the cross-database results are not great.

HeeminYang commented 5 months ago

@JLREx Thanks for the update. But this issue is about your model's convergence, and the notebook you uploaded does not include any results showing the performance that reached SOTA on FER+. The model updated on April 6 still predicts every label as 4. Your response does not address this issue.

JLREx commented 5 months ago

(Quoting @HeeminYang's comment above.)

Hi, to clarify your issues:

  1. As mentioned, cross-database results are not great. I'm not sure what data you were evaluating on previously, but this could be why the pre-trained model only predicted label 4. The cause is the same for FERPlus: a model trained on RAF-DB is not expected to reach SOTA on FERPlus. The model was uploaded as proof that we achieved the performance reported in the paper on the RAF-DB test set.
  2. The recently uploaded notebook has nothing to do with the REMOVED pre-trained model. It is for training on your own dataset (FER2013, FERPlus, or your own data) and evaluating the trained model afterward. Hence, if you want to reproduce the reported FERPlus results, you will have to download the dataset and train with the notebook (or the model architecture in the notebook) before evaluating on FERPlus.

Hope this helps.

HeeminYang commented 5 months ago

@JLREx I understand now. The model you uploaded is related to RAF-DB and has nothing to do with FER+, right? However, on the paperswithcode website, PAtt-Lite is listed as achieving SOTA on FER+ with an accuracy of 95.550. Because of this, many users seem to be raising questions about the model's low performance. A clear notice seems necessary. Do you have any plans to release the model that achieved the SOTA result listed on paperswithcode?

JLREx commented 5 months ago

@HeeminYang Yes, the model trained on RAF-DB has nothing to do with FERPlus. The results listed on paperswithcode come from a model trained and evaluated on FERPlus. For now, we do not plan to release that model, as it could cause further confusion. You may train your own model with the notebook shared in this repository. In our runs on Kaggle, training on FERPlus took around 2 hours or less.

MyungBeomHer commented 5 months ago

@JLREx The model passes a 2D tensor to `tf.keras.layers.Attention`. More specifically, the model uses many Conv2D layers whose output is a 4D tensor [B,C,H,W]. `tf.keras.layers.GlobalAveragePooling2D` takes a 4D tensor and outputs a 2D tensor, so the input to the attention layer is 2D. However, the attention layer expects a 3D tensor [B,N,E], where B = batch size, N = number of patches, and E = embedding dimension. Because this model passes a 2D tensor [B,E], the attention layer treats the whole batch as a single sample, effectively [1,B,E], and computes the attention (dot products) across it. So at inference time, each sample's output is not computed from that sample alone but from dot products with the other samples in the same batch. If a batch contains samples with different labels, accuracy drops sharply. (In the original validation set, samples with the same label are grouped together, e.g. label1, ..., label1, label2, ..., label2, ..., labeln, ..., labeln.) Therefore, the reported performance cannot be considered correct.
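The effect described above can be reproduced with a small NumPy sketch. It assumes the layer uses plain unscaled dot-product scores with key = value (the default `score_mode="dot"`, `use_scale=False` of `tf.keras.layers.Attention`) and that a 2D input is multiplied as-is; `dot_attention` is a stand-in written for illustration, not the Keras implementation. Image 0's features are identical in both batches, yet its output is computed from whichever other images share the batch.

```python
import numpy as np


def softmax(x, axis=-1):
    x = x - x.max(axis=axis, keepdims=True)   # subtract max for numerical stability
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def dot_attention(query, value):
    """Unscaled dot-product attention with key = value.

    For a (T, E) input the softmax runs over the T axis. If T is actually the
    batch axis (a (B, E) input), every sample attends to every other sample.
    """
    scores = query @ value.T          # (T, T) similarity between rows
    weights = softmax(scores, axis=-1)
    return weights @ value            # (T, E)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 8))           # (B, E): pooled + Dense features for 4 images

out_a = dot_attention(x, x)
x_b = x.copy()
x_b[1:] = rng.normal(size=(3, 8))     # keep image 0, replace the other 3 images
out_b = dot_attention(x_b, x_b)
print(np.allclose(out_a[0], out_b[0]))   # compares image 0's output across the two batches

alone = dot_attention(x[:1], x[:1])      # batch of one: the only weight is 1.0
print(np.allclose(alone, x[:1]))         # True: output equals the input
```

With a batch of one, this kind of attention simply returns its input, so single-image predictions and batched predictions of the same image can disagree.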

cyinen commented 5 months ago

@heomyeongboem Hello! Where did you download the pre-trained model? Could anyone share a link?

MyungBeomHer commented 5 months ago

@cyinen Hello. I don't have the pre-trained model. But looking at the PAtt-Lite code, it uses conv2d [B,C,H,W] -> GAP [B,C] -> linear [B,C'] -> attention [B,C'] -> linear for classification. That is why I think the code has a problem.

tf.keras.layers.GlobalAveragePooling2D: https://www.tensorflow.org/api_docs/python/tf/keras/layers/GlobalAveragePooling2D

tf.keras.layers.Attention: https://www.tensorflow.org/api_docs/python/tf/keras/layers/Attention
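The shape flow in the pipeline above can be traced with NumPy stand-ins. Two assumptions: the sizes are hypothetical, and the feature map is laid out channels-last as [B,H,W,C], which is the default `data_format` for Keras Conv2D and GlobalAveragePooling2D (the comment writes [B,C,H,W], but the argument is the same). The last line shows one possible way to keep a token axis and get a 3D [B,N,E] input; it is an illustration, not what PAtt-Lite does.

```python
import numpy as np

B, H, W, C, C_out = 2, 7, 7, 256, 32          # hypothetical sizes
rng = np.random.default_rng(0)
feature_map = rng.normal(size=(B, H, W, C))   # Conv2D output, channels-last

# GlobalAveragePooling2D: average over the spatial axes -> (B, C); no token axis left
pooled = feature_map.mean(axis=(1, 2))

# Dense layer: (B, C) @ (C, C_out) -> (B, C_out), still 2D
dense_w = rng.normal(size=(C, C_out))
embedded = pooled @ dense_w

# Alternative that keeps a token axis: flatten the H x W grid into N = H * W patches
patches = feature_map.reshape(B, H * W, C)    # (B, N, E) with N = 49, E = C

print(pooled.shape, embedded.shape, patches.shape)   # (2, 256) (2, 32) (2, 49, 256)
```

Averaging the 49 patch vectors recovers exactly the pooled features, so the pooling step is where the per-sample token axis that attention needs is discarded.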

LokmaneZ commented 3 months ago

Hello, I am looking for the pre-trained model. Does anyone have a link to it? Thank you in advance.

JLREx commented 2 months ago

(Quoting @MyungBeomHer's comment above about passing a 2D tensor to tf.keras.layers.Attention.)

Hi. Thanks a lot for pointing this out. We overlooked how TensorFlow interprets the shape of its input. We are working on it and will try to upload the new model and code here in the coming weeks.

userguazi commented 2 months ago

Hello, I am looking for the pre-trained model. Where can I find it?

JLREx commented 2 months ago

@cyinen @LokmaneZ @userguazi Hi, the pre-trained model has been removed because of the cross-database performance mentioned in earlier replies. Please train your own model with the provided notebook instead.

KrystianZielinski commented 4 days ago

Hey. I am trying to evaluate PAtt-Lite using 5-fold CV, so I trained the PAtt-Lite model on FER+ on my own folds (different train/val/test sets with 80/10/10% ratios). What seems odd is that in the first fold I got around 80% accuracy on the training set and around 72% on the validation set, but ~95% on the test set. When I trained the model again on the second fold, I got roughly the same training and validation accuracies, but this time test accuracy was only ~33%. Did you get similar results? What could make the test accuracy much higher than the train/val accuracy one time and much lower the next? Is the model just unstable?
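If the attention layer does mix samples within a batch, as described earlier in the thread, test accuracy would depend on how the test split happens to be ordered and batched; that is one possible explanation for the swings between folds, not a confirmed one. The hypothetical helper below is a sketch for checking it: it compares each sample's prediction on its own with its prediction inside shuffled batches. `predict_fn` is assumed to map a NumPy batch to class scores (for example a Keras model's `predict` method). Evaluating the test split once sorted by label and once shuffled is a similar check.

```python
import numpy as np


def batch_dependence(predict_fn, x, batch_size=32, seed=0):
    """Fraction of samples whose predicted label changes between
    one-at-a-time prediction and prediction in shuffled batches.

    A model that treats samples independently should return 0.0
    (up to floating-point near-ties).
    """
    n = len(x)
    # Reference: predict each sample on its own (batch of one).
    single = np.array([np.asarray(predict_fn(x[i:i + 1])).argmax(-1)[0] for i in range(n)])
    # Predict the same samples in shuffled batches of `batch_size`.
    order = np.random.default_rng(seed).permutation(n)
    batched = np.empty(n, dtype=int)
    for start in range(0, n, batch_size):
        idx = order[start:start + batch_size]
        batched[idx] = np.asarray(predict_fn(x[idx])).argmax(-1)
    return float((single != batched).mean())


# Hypothetical usage with a trained Keras model and a NumPy test array x_test:
# print(batch_dependence(model.predict, x_test, batch_size=32))
```

A result well above zero would indicate that the reported test accuracy depends on batch composition rather than only on the model's weights.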

KrystianZielinski commented 2 days ago

(Quoting @MyungBeomHer's comment above about passing a 2D tensor to tf.keras.layers.Attention.)

Hey. I have tried replacing the Keras Attention layer with the Attention layer from the maximal library. I am not 100% sure it fixes the issue you're describing, but I think maximal's Attention computes the dot product for each vector from the FC layer individually? Could you please take a look at the attention function in the maximal library? The code is available on GitHub: https://github.com/IvanBongiorni/maximal/blob/main/maximal/layers.py
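Whether a replacement layer keeps samples independent can be checked directly. The sketch below does not reproduce maximal's API; `batched_attention` is a hypothetical NumPy function that computes unscaled dot-product scores per sample over an explicit token axis, so scores have shape (B, N, N) instead of (B, B). It also illustrates a caveat: if each sample contributes only one vector (N = 1, e.g. a [B,E] input reshaped to [B,1,E]), the softmax over a single key is 1 and the layer just returns its input.

```python
import numpy as np


def batched_attention(q, v):
    """Unscaled dot-product attention with an explicit batch axis (key = value).

    q, v: (B, N, E). Scores are (B, N, N), so sample b only attends to its own N tokens.
    """
    scores = np.einsum("bqe,bke->bqk", q, v)
    scores -= scores.max(axis=-1, keepdims=True)   # numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)
    return np.einsum("bqk,bke->bqe", weights, v)


rng = np.random.default_rng(0)
x = rng.normal(size=(4, 6, 8))                 # (B, N, E): 4 samples, 6 tokens each
out_full = batched_attention(x, x)
out_alone = batched_attention(x[:1], x[:1])
print(np.allclose(out_full[0], out_alone[0]))  # True: sample 0 is unaffected by batch-mates

# With a single token per sample (N = 1), every weight is 1 and the output equals the input.
single = rng.normal(size=(4, 1, 8))
print(np.allclose(batched_attention(single, single), single))  # True
```

So per-sample computation alone removes the batch mixing, but attention only does useful work when each sample supplies several tokens (patches) to attend over.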