Kartik-3004 / facexformer

Official implementation of FaceXFormer: A Unified Transformer for Facial Analysis
https://kartik-3004.github.io/facexformer_web/
MIT License

Code (scripts, notebooks) for training, testing & verification #16

Open RobertCrash opened 4 months ago

RobertCrash commented 4 months ago

Hi,

I really like your approach and model (architecture, method). Thanks for sharing it, in particular the pre-trained weights.

Could you provide some code (scripts, notebooks, snippets) for training, testing & verification, especially including the parsing and preprocessing of the datasets?

My main goal is to extend the model for Facial Emotion Recognition (FER) / emotional attribute analysis, trained on FER2013 and FER+.
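For context, a minimal sketch of how such a FER2013 extension could load the data, assuming the usual Kaggle fer2013.csv layout ('emotion', 'pixels', 'Usage' columns, 48x48 grayscale images); the class name and paths are hypothetical:

import numpy as np
import pandas as pd
from torch.utils.data import Dataset

class FER2013Dataset(Dataset):
    # Hypothetical loader for the Kaggle fer2013.csv file.
    def __init__(self, csv_path, usage='Training', transform=None):
        df = pd.read_csv(csv_path)
        df = df[df['Usage'] == usage]  # 'Training', 'PublicTest' or 'PrivateTest'
        self.labels = df['emotion'].to_numpy()  # 7 emotion classes, 0..6
        # each row stores 48*48 = 2304 space-separated grayscale values
        self.images = [np.array(p.split(), dtype=np.uint8).reshape(48, 48)
                       for p in df['pixels']]
        self.transform = transform

    def __len__(self):
        return len(self.labels)

    def __getitem__(self, idx):
        img = self.images[idx]
        if self.transform is not None:
            img = self.transform(img)
        return img, int(self.labels[idx])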

Thanks in advance. Best regards, Robert

Kartik-3004 commented 4 months ago

Hi Robert, nice to know you liked our work. Our paper is currently under review, so we cannot make our training, testing, and verification scripts public yet. We plan to share them as soon as the work gets accepted. You can star the repo and stay tuned for updates!

karaposu commented 3 months ago

@RobertCrash I ended up creating another implementation; check it out: https://github.com/karaposu/facexformer-pipeline

I am also considering adding some features once the training code is shared. Maybe we can collaborate on that.

RobertCrash commented 3 months ago

@Kartik-3004 Hi. Thanks for letting me know. In my experience, it takes an average of 6 months for a paper to be submitted and reviewed. Maybe you could share some code snippets in advance (under the table)?

At least some details about the preprocessing that was actually applied: alignment (padding), cropping, etc.?

RobertCrash commented 3 months ago

@karaposu Hi. Sure, I would appreciate it if we could work together on this!

I already checked out your repository (facexformer-pipeline) a few weeks ago.

I think we'll have to wait a few more months before more code and details are released (see my previous post).

Were you able to reverse engineer the actual image preprocessing (alignment/padding, cropping, scaling, etc.)? In particular, have you tried keeping the aspect ratio and filling the remaining pixels with zeros/ones (black/white)?

:-) I saw you played around with it!

karaposu commented 3 months ago

@RobertCrash Yup. I tested it for the face parsing, landmarks, and head pose tasks. All padding, scaling, etc. is now handled internally.

RobertCrash commented 3 months ago

@karaposu OK, I admittedly checked/reviewed your code in too much of a hurry. But I mainly found margin/padding adjustments, and you also use the same 'transforms_image'.

That you add some features and do some code refactoring is nice, but "Now all padding scaling etc is internally handled" does not really answer my question.

Have you tried keeping the aspect ratio and filling the remaining pixels with zeros/ones (black/white), e.g. for 224x224?

karaposu commented 3 months ago

@RobertCrash You can check the notebook and see that it is now possible to upload a non-portrait image of a person and get landmarks on the original-size image (arbitrary size). This also works for face parsing. All padding, scaling, etc. is handled internally, and you don't need to worry about it. You just feed the model an image and get the outputs, with no further post-processing needed.

RobertCrash commented 3 months ago

@karaposu We're probably talking past each other. Using a non-portrait image of a person is also possible, to a certain extent, with this implementation (https://github.com/Kartik-3004/facexformer), because of "mtcnn = MTCNN(keep_all=True)".

And it goes without saying that a final, production-ready inference pipeline includes the right pre-processing (scaling, padding/margin, alignment, etc.), or, in your words, "Now all padding scaling etc is internally handled and you dont need to worry about it". The same applies to post-processing.

So again:

  1. Have you tried keeping the aspect ratio and filling the remaining pixels with zeros/ones (black/white), e.g. for 224x224?
  2. Is the accuracy of your implementation, or more precisely of the image pre-processing (scaling, padding/margin, alignment, etc.) you are using, as expected, i.e. as claimed by the authors of the FaceXFormer paper?

Regards

karaposu commented 3 months ago

MTCNN's keep_all parameter has nothing to do with scaling or cropping. And no, the original implementation does not remap to the original size. They take an image, transform it to 224x224, and feed it to the model. That's all. The remapping part is not included, AFAIK.
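For reference, a minimal sketch of that flow, assuming the facenet_pytorch MTCNN and the 'transforms_image' pipeline from the original repo (how exactly the detected box is cropped or padded before resizing is the unreleased part, so the plain crop below is an assumption):

from PIL import Image
from facenet_pytorch import MTCNN
import torchvision
from torchvision.transforms import InterpolationMode

mtcnn = MTCNN(keep_all=True)
image = Image.open('your_image.jpg').convert('RGB')

# detect faces; boxes are (x1, y1, x2, y2) in original-image coordinates
# (assumes at least one face is found, i.e. boxes is not None)
boxes, probs = mtcnn.detect(image)
face_crop = image.crop(tuple(boxes[0]))  # assumption: plain crop, no margins

# the released inference transform: a non-aspect-preserving resize to 224x224
transforms_image = torchvision.transforms.Compose([
    torchvision.transforms.Resize(size=(224, 224), interpolation=InterpolationMode.BICUBIC),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])
input_tensor = transforms_image(face_crop).unsqueeze(0)  # 1x3x224x224 model input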

I don't understand your question or your English at all. Instead of repeating it, you could elaborate on what you mean so we can have a better conversation.

RobertCrash commented 3 months ago

I am not a native English speaker, but I am sure my English is not that bad! And yet we still talk past each other, for sure!

Of course, and you are right: the remapping part is NOT part of the original implementation, and as I said, "you add some features", e.g. remapping.

-1: Using a non-portrait image of a person is possible, to a certain extent, with the original implementation, because face detection is performed first:

mtcnn = MTCNN(keep_all=True)
image = Image.open(args.image_path)
width, height = image.size
boxes, probs = mtcnn.detect(image)

-2: "They take an image, and transform it to 224x224 and feed it to the model." As already said: yes, you do some more processing, mainly margin/padding adjustments, but in the end you also transform it to 224x224 and feed it to the model! See 'transforms_image'!

"instead of repeating it you can elaborate on what you mean and we can have a better convo." Maybe! But I asked 2 specific questions that you haven't even addressed yet!

Cheers

RobertCrash commented 3 months ago

*Correction: "refer to 'transforms_image'" should read "in reference to 'transforms_image'".

Regarding your code:

def calculate_head_ROI(self, image, fd_coordinates):
    bottom_margin_ratio = 0.30
    top_margin_ratio = 0.30
    left_margin_ratio = 0.30
    right_margin_ratio = 0.30

    head_ROI_coordinates = self.find_head_ROI_coordinates(fd_coordinates,
                                                          bottom_margin_ratio,
                                                          top_margin_ratio,
                                                          left_margin_ratio,
                                                          right_margin_ratio)

    head_ROI = self.crop_rect_ROI_from_Img(image, head_ROI_coordinates)

    return head_ROI, head_ROI_coordinates

def crop_rect_ROI_from_Img(self, img, coordinates):
    ...

def find_head_ROI_coordinates(self,
                              face_coordinates,
                              chin_extension_ratio,
                              top_margin_ratio,
                              left_margin_ratio,
                              right_margin_ratio):
    ...

So the answer to my 1st question is NO? Because this code just varies the margins/paddings and crops the region of interest (ROI) from the image based on face-detection coordinates.
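For illustration, a hypothetical sketch of what I mean by "just varying margins" (not your actual elided code above, just my reading of it):

def find_head_ROI_coordinates(face_coordinates, bottom_margin_ratio,
                              top_margin_ratio, left_margin_ratio,
                              right_margin_ratio):
    # face_coordinates assumed to be (x1, y1, x2, y2) from the face detector
    x1, y1, x2, y2 = face_coordinates
    w, h = x2 - x1, y2 - y1
    # grow the detected box by a fraction of its own size on each side
    return (x1 - w * left_margin_ratio,
            y1 - h * top_margin_ratio,
            x2 + w * right_margin_ratio,
            y2 + h * bottom_margin_ratio)

This is still cropping; it is not aspect-ratio-preserving padding.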

Have fun!

karaposu commented 3 months ago

@RobertCrash "Because this code just varies the margins/paddings and crops the region of interest (ROI) from the image based on face-detection coordinates."

Yes, and then it remaps this ROI back to the original image.

karaposu commented 3 months ago

@RobertCrash "As already said: yes you are doing some more processing, mainly margin/padding adjustments. but at the end your also transform it to 224x224 and feed it to the model !!! refer to 'transforms_image' !!!"

yes of course, model is prepared to take input 224x224 and we must follow it. In the end i do remapping to the original values with such functions :

place_mask_in_original_image(self, original_image, head_mask, face_coords)
convert_local_to_global(self, local_landmarks, roi_start_point)
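Roughly, the idea is to scale the coordinates predicted in the 224x224 model space back to the ROI size and then shift them by the ROI's top-left corner. A hypothetical sketch (the roi_size argument is added here for the scaling step; the actual function may obtain the scale elsewhere):

def convert_local_to_global(local_landmarks, roi_start_point, roi_size, model_size=224):
    # local_landmarks: (x, y) pairs in 224x224 model space
    # roi_start_point: the ROI's top-left (x0, y0) in the original image
    # roi_size: the ROI's (width, height) in original-image pixels
    x0, y0 = roi_start_point
    sx = roi_size[0] / model_size  # horizontal scale back to ROI pixels
    sy = roi_size[1] / model_size  # vertical scale back to ROI pixels
    return [(x * sx + x0, y * sy + y0) for (x, y) in local_landmarks]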

Maybe you assumed I changed the model structure to allow it to take other sizes as input?

RobertCrash commented 3 months ago

"Maybe you assumed I changed the model structure to allow it to take other sizes as input?" NO.

At the moment I DON'T care about any post-processing features, like remapping, that you have added.

My initial question was: "Were you able to reverse engineer the actual image preprocessing?" Why? Because Kartik-3004 mentioned they have NOT yet released the actual (originally used) image preprocessing (alignment, padding/margin)!

The problem: without the actual (originally used) image preprocessing, the accuracy of the results is NOT as expected, i.e. as claimed by the authors of the FaceXFormer paper.

It follows that:

  1. The model is built, i.e. trained, to take 224x224 input: probably.
  2. Image preprocessing, especially in ML, can involve more than just varying the margins (and maybe the alignment), besides normalization, vectorization (here: scaling/resizing to 224x224), etc.

Example: you "just" change the margins and then apply this:

transforms_image = torchvision.transforms.Compose([
    torchvision.transforms.Resize(size=(224, 224), interpolation=InterpolationMode.BICUBIC),
    torchvision.transforms.ToTensor(),
    torchvision.transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

Another usual approach is to keep the aspect ratio, something like this (written on the fly):

from PIL import Image
from torchvision import transforms
from torchvision.transforms import InterpolationMode

# Define the target size
target_size = 224

# Load your image
img = Image.open('your_image.jpg').convert('RGB')

# Calculate the size to resize to while keeping the aspect ratio.
# Note: torchvision's Resize expects (height, width).
aspect_ratio = img.width / img.height
if aspect_ratio > 1:  # landscape: width becomes target_size
    new_h, new_w = int(target_size / aspect_ratio), target_size
else:                 # portrait or square: height becomes target_size
    new_h, new_w = target_size, int(target_size * aspect_ratio)

# Pad the shorter side symmetrically up to target_size.
# Pad takes (left, top, right, bottom); fill=0 pads black, fill=255 white.
pad_left = (target_size - new_w) // 2
pad_top = (target_size - new_h) // 2
pad_right = target_size - new_w - pad_left
pad_bottom = target_size - new_h - pad_top

transform = transforms.Compose([
    transforms.Resize((new_h, new_w), interpolation=InterpolationMode.BILINEAR),  # resize, aspect ratio kept
    transforms.Pad((pad_left, pad_top, pad_right, pad_bottom), fill=0),  # pad to target size
    transforms.ToTensor(),
    # same normalization constants as in 'transforms_image' above
    transforms.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

# Apply the transformation; result is a normalized 3x224x224 tensor
transformed_tensor = transform(img)

So again:

  1. Have you tried keeping the aspect ratio and filling the remaining pixels with zeros/ones (black/white), e.g. for 224x224?
  2. Is the accuracy of your implementation, or more precisely of the image pre-processing (scaling, padding/margin, alignment, etc.) you are using, as expected, i.e. as claimed by the authors of the FaceXFormer paper?

Regards