abhi1kumar / LUVLi

[CVPR 2020] Re-hosting of the LUVLi Face Alignment codebase. Please download the codebase from the original MERL website by agreeing to all terms and conditions. By using this code, you agree to MERL's research-only licensing terms.
https://github.com/merlresearch/LUVLi

Running Inference #3

Closed: waleedrazakhan92 closed this issue 2 years ago

waleedrazakhan92 commented 2 years ago

Hi, thanks for the great work. I've downloaded the zip file with the code and tried running it, but there is no clear indication of how to run inference, so could you please state the requirements for running it? I chose this code because I need the visibility values for each landmark. Any help with running inference would be highly appreciated.

abhi1kumar commented 2 years ago

Hi @waleedrazakhan92, I suppose you have downloaded the code from the MERL website. There is a README.md inside the downloaded zip which contains all the instructions to create the conda environment for inference. Specifically, the conda environment instructions from that README are:

conda env create --file conda_py27.yml
conda activate py27

The models are available in the second README.

Once you have setup the datasets and the models in the respective folders, please run

./scripts_evaluation.sh 

to reproduce all our results.

waleedrazakhan92 commented 2 years ago

Thank you @abhi1kumar for the reply. I'll follow the instructions.

waleedrazakhan92 commented 2 years ago

@abhi1kumar One question: Do you have a version that's compatible with python 3?

abhi1kumar commented 2 years ago

Do you have a version that's compatible with python 3?

Unfortunately, no. Our backbone is the DU-Net (CU-Net), whose implementation is based on Python 2.7. Therefore, our codebase also uses Python 2.7.

waleedrazakhan92 commented 2 years ago

Thank you @abhi1kumar for the answer. I've followed the README and set up the environment. The evaluation script runs fine and the results and plots are produced, but I'm wondering how to run inference on custom images, since the evaluation code is set up for specific datasets and not for custom images. Can you guide me on what I need besides the test images to run inference? What I want is the landmark locations along with their corresponding visibility values. Looking forward to your reply.

abhi1kumar commented 2 years ago

The evaluation script runs fine and the results and plots are produced,

I am indeed happy that you could reproduce our results on your end.

I'm wondering how to run inference on custom images, since the evaluation code is set up for specific datasets and not for custom images. Can you guide me on what I need besides the test images to run inference? What I want is the landmark locations along with their corresponding visibility values.

This would take a little extra effort on your end but is nonetheless possible. I did this nearly two years ago for the demo. We need to get the JSONs for running it over arbitrary images. Here are the steps:

  1. Get the coordinates of a rectangular box which contains the face in each of your images. The crop coordinates need not be precise; a rough idea of where the face is in the image works fine. We need the face detection since the input to the LUVLi model is the face and not the whole image (which might contain everything from trees to hats). LUVLi estimates the locations, uncertainties and visibilities of landmarks on the face after face detection is done. The face detection coordinates could be the first two lines of the files in dummy_landmark_pts.zip, described in the next step.

  2. Create a copy of the dummy labels for each image so that the dataloader face_bbx.py does not complain. The dummy_landmark_pts I used for creating the demo is attached here (a minimal sketch for generating such dummy .pts files is included at the end of this comment).

  3. The next step is to obtain the JSON by running the JSON creator script. The instructions are at the bottom of the README of the code you downloaded from the MERL website. We first need to create a splits_prep/config_custom_images.txt similar to splits_prep/config_aflw_ours_all.txt. In splits_prep/config_custom_images.txt,

    • Change the paths if required.
    • Make sure that val_datasets_names is aflw_ours (the old name of MERL-RAV) for each of your images.

Then, run splits_prep/get_jsons_from_config.py with the new config argument:

python splits_prep/get_jsons_from_config.py -i splits_prep/config_custom_images.txt

This should give you a JSON, say custom_images.json, inside the dataset directory.

  4. Once you have custom_images.json, run inference with the MERL-RAV (AFLW_ours) model on custom_images.json using the following command:
python validate_and_forward_pass.py --exp_dir abhinav_model_dir/ --exp_id run_5004_custom_eval \
--saved_wt_file abhinav_model_dir/run_5004/lr-0.00002-49.pth.tar \
--pp "relu" --laplacian --use_visibility --bs 12 --gpu_id 1 --val_json dataset/custom_images.json

This should produce the model predictions on your custom images. I believe you should get results similar to our demo on your end.
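
In case it helps, here is a minimal sketch (not part of the released code) for step 2: it writes one dummy .pts file per image from a rough face box (x_min, y_min, x_max, y_max), putting the box corners in the first two lines and the box centre in the remaining lines, as clarified later in this thread. It assumes the standard iBUG-style .pts header, so compare it against the files inside dummy_landmark_pts.zip before use; the image paths and boxes below are placeholders.

# write_dummy_pts.py -- hypothetical helper, verify against dummy_landmark_pts.zip
import os

def write_dummy_pts(pts_path, x_min, y_min, x_max, y_max, num_points=68):
    # first point = top-left of the face box, second point = bottom-right,
    # remaining points = box centre (anything inside the box works)
    x_mid, y_mid = 0.5 * (x_min + x_max), 0.5 * (y_min + y_max)
    lines = ["version: 1", "n_points: {}".format(num_points), "{"]
    lines.append("{} {}".format(x_min, y_min))
    lines.append("{} {}".format(x_max, y_max))
    lines += ["{} {}".format(x_mid, y_mid)] * (num_points - 2)
    lines.append("}")
    with open(pts_path, "w") as f:
        f.write("\n".join(lines) + "\n")

# placeholder boxes, e.g. from any off-the-shelf face detector
boxes = {"custom_images/img_0001.jpg": (100, 100, 500, 700)}
for image_path, (x_min, y_min, x_max, y_max) in boxes.items():
    write_dummy_pts(os.path.splitext(image_path)[0] + ".pts", x_min, y_min, x_max, y_max)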

waleedrazakhan92 commented 2 years ago

Thank you @abhi1kumar for the detailed reply. I'll follow these steps to run it on my custom images.

waleedrazakhan92 commented 2 years ago

Hi @abhi1kumar, I've managed to run inference on custom images, but the results are not what I expected. These are the steps that I followed (dataset = 2 images):

1) Created two .pts files for the two images using the format that you shared. What I didn't understand is the format of the points. As you mentioned, the model expects the cropped face, so the first two points are the rectangular coordinates of the crop. What format should they be in? Should the first point be x_min, y_min and the second point x_max, y_max, or is it some other format? Also, what values should the rest of the points be? For now, like the 'dummy_pts' file, I've set the rest to 500 700.
2) I then produced the JSON file for these two images using get_jsons_from_config.py, modifying it slightly so it only produces the val_json files since I'm only inferring.
3) Then I ran validate_and_forward_pass.py to get the inference on my data. I used face-layer-num-8-order-1-model-best.pth.tar as the saved_wt_file, because when I use the lr-0.00002-49.pth.tar checkpoint it gives me the error: File "validate_and_forward_pass.py", line 107, in main net.state_dict()[name].copy_(param) RuntimeError: invalid argument 2: sizes do not match at /pytorch/torch/lib/THC/generic/THCTensorCopy.c:101
4) Used show_dataset_images_overlaid_with_uncertainties.py to plot the landmarks.

The code runs fine and produces the results, but the results are very bad. To debug, I played with the .pts files for the images to see what the values should be, and copied the same image twice but set the first two points to different values. The results are indeed different for the same image with different .pts files. I'm attaching the images. Please guide me on what else I should look at to correct the results. (attached: image, image_2)

abhi1kumar commented 2 years ago

Created two .pts files for the two images using the format that you shared. What I didn't understand is the format of the points.

The format is described here

As you mentioned, the model expects the cropped face, so the first two points are the rectangular coordinates of the crop. What format should they be in? Should the first point be x_min, y_min and the second point x_max, y_max, or is it some other format? Also, what values should the rest of the points be? For now, like the 'dummy_pts' file, I've set the rest to 500 700.

Yes, the first point should be x_min, y_min and the second point should be x_max, y_max. The cropping code inside face_bbx.py takes the tightest box containing all points; therefore, the rest of the points can be anything between x_min and x_max for the x coordinate and between y_min and y_max for the y coordinate.

As an example, if you choose x_min = 100 and x_max = 500, the rest of the x coordinates can be 300. Similarly, if you choose y_min = 100 and y_max = 700, the rest of the y coordinates can be 400.
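
To make this concrete, a dummy .pts file for that example (box from 100 100 to 500 700) could look like the sketch below. This assumes the standard iBUG/300-W style .pts header; please check the files in dummy_landmark_pts.zip for the exact layout.

version: 1
n_points: 68
{
100 100
500 700
300 400
300 400
... (all remaining points set to 300 400)
}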

I then produced the JSON file for these two images using get_jsons_from_config.py, modifying it slightly so it only produces the val_json files since I'm only inferring.

That is perfectly fine.

Then I ran validate_and_forward_pass.py to get the inference on my data. I used face-layer-num-8-order-1-model-best.pth.tar as the saved_wt_file, because when I use the lr-0.00002-49.pth.tar checkpoint it gives me the error: File "validate_and_forward_pass.py", line 107, in main net.state_dict()[name].copy_(param) RuntimeError: invalid argument 2: sizes do not match at /pytorch/torch/lib/THC/generic/THCTensorCopy.c:101

face-layer-num-8-order-1-model-best.pth.tar is the DU-Net model trained on 300-W Split 1 and does not give uncertainties or visibilities. So, you should not use this model.

Next, I list the command we use for evaluating the AFLW_ours model on the AFLW_ours val splits.

python evaluate_face_all.py --exp_id run_5004 -s 5 --pp "relu" --bs 12 --laplacian

evaluate_face_all.py is a wrapper for multiple (all) datasets which calls validate_and_forward_pass.py. With evaluation over aflw_ours_all_val.json, it prints out the following command:

python validate_and_forward_pass.py --gpu_id 0 --exp_id run_5004_evaluate/aflw_ours_all \
--val_json dataset/aflw_ours_all_val.json \
--class_num 68 --save_image_heatmaps \
--saved_wt_file abhinav_model_dir/run_5004/lr-0.00002-49.pth.tar \
--layer_num 8 --pp relu --laplacian  --use_visibility  --hg_wt 0,0,0,0,0,0,0,1  --wt_gau 1.0 --wt_mse 0.0 \
--bs 12 --mlp_tot_layers 1 --mlp_hidden_units 4096

For your case, change the --val_json argument of this command to the new JSON. If you could reproduce our numbers for the AFLW_ours dataset, the following command should also run correctly.

python validate_and_forward_pass.py --gpu_id 0  --exp_dir abhinav_model_dir/ --exp_id run_5004_custom_eval \
--val_json dataset/custom_images.json \
--class_num 68 --save_image_heatmaps \
--saved_wt_file abhinav_model_dir/run_5004/lr-0.00002-49.pth.tar \
--layer_num 8 --pp "relu" --laplacian --use_visibility --hg_wt 0,0,0,0,0,0,0,1  --wt_gau 1.0 --wt_mse 0.0 \
--bs 12 --mlp_tot_layers 1 --mlp_hidden_units 4096

used the show_dataset_images_overlaid_with_uncertainties.py to plot the landmarks.

This is also correct. You should also see plot/make_video_by_images_overlay_with_uncertainties.py.

The code runs fine and produces the results, but the results are very bad. To debug, I played with the .pts files for the images to see what the values should be, and copied the same image twice but set the first two points to different values. The results are indeed different for the same image with different .pts files.

The results are output for the cropped images, which differ because you cropped them differently.

I'm attaching the images. Please guide me on what else I should look at to correct the results.

I assume that the green points refer to the boundary points (x_min, y_min) and (x_max, y_max) from your face detector. Based on your green points, I see that the boundary coordinates from your face detector are not correct. I am marking the correct boundary coordinates of your face detector with orange points and the corresponding boundary with an orange box. Make sure you use the arithmetic mean of these two boundary points for the other landmark points.

(attached: sample_edited)

waleedrazakhan92 commented 2 years ago

Hi @abhi1kumar, thank you for pointing out the mistake. I corrected the landmark points and the model now works fine, but there are a few remaining difficulties. These are the steps that I took:

1) Changed the .pts files for every image so that the x_min, y_min, x_max, y_max values are correct, took the mean of those values and set the rest of the points to that mean.
2) At first the .pts files only contained 68 points; I now put 98 landmark points in each .pts file and set the --class_num flag to 98 when running validate_and_forward_pass.py, so the lr-0.00002-49.pth.tar model now runs and outputs results.
3) Generated the plots, and the plots now look good.

What I'm confused about is that the results still don't show any occlusion and plot all the landmarks even if some of them are occluded. I tried debugging the plotting file CommonPlottingOperations.py and noticed that there is a threshold value set to 0.65 which tells the function to plot the landmarks for which vis_estimated is greater than 0.65, but for each of my images every landmark has a vis_estimated value greater than 0.9. That shouldn't be the case, since there is an image with glasses on and another image with a side-on view. I'm attaching the produced image files. Please let me know what I am missing here. I'm looking forward to your guidance. (attached: image, image_opaque, image_side, image_sunglasses)

abhi1kumar commented 2 years ago

At first the .pts files only contained 68 points; I now put 98 landmark points in each .pts file and set the --class_num flag to 98 when running validate_and_forward_pass.py, so the lr-0.00002-49.pth.tar model now runs and outputs results.

You are using the WFLW model (which outputs 98 points) for inference and not the MERL-RAV (AFLW_ours / run_5004) model, which outputs 68 points per face. The image also confirms this, since the landmark points for the profile faces are on the boundary.

(attached: image_side)

Please make sure you use the correct model. The models trained on datasets other than MERL-RAV will NOT output visibility.

What I'm confused about is that the results still don't show any occlusion and plot all the landmarks even if some of them are occluded. Please let me know what I am missing here. I'm looking forward to your guidance.

The models trained on datasets other than MERL-RAV will NOT output visibility.

waleedrazakhan92 commented 2 years ago

Thank you so much @abhi1kumar. That was indeed the case: the models had the same name, so I was using the one trained on WFLW data. I've now run inference with the MERL-RAV checkpoint and it outputs correctly. (attached: image, image_external, image_opaque, image_sunglasses, image_side)

abhi1kumar commented 2 years ago

Thank you so much @abhi1kumar. That was indeed the case: the models had the same name, so I was using the one trained on WFLW data.

Welcome. The checkpoint names are the same since all models are finetuned for 50 epochs. However, they are different models.

I've now run inference with the MERL-RAV checkpoint and it outputs correctly.

Perfect.

@waleedrazakhan92 If your issues are fixed, would you mind closing this issue and starring the LUVLi GitHub repo?

waleedrazakhan92 commented 2 years ago

Sure @abhi1kumar, thank you for guiding me through it.

waleedrazakhan92 commented 2 years ago

Hi @abhi1kumar, is there an efficient way to get the full image with the landmarks as output, and not just the cropped image? I've tried copying the crop back onto the original image, but I believe the dataloader first resizes the crop to 256 and then feeds it into the model, so the output crops with the landmarks are 256x256 in shape, while all my images with landmarks are being saved at 746x746 resolution. So pasting back using the first two coordinates of the .pts files isn't possible. Is there a workaround, as I need the whole image with the landmarks on it?

I've tried setting the scale_mul_factor value in face_bbx.py to 1, but now the images are being saved at 709x709 resolution and seem to have white borders added to them.

Image with scale 1.1 after plot saving: resolution 746x746 (attached: test-control)
Image with scale 1.0 after plot saving: resolution 709x746 (attached: test-control)

Also, the visibility values seem to be very high for most of the landmarks even when they're occluded, for example by glasses. I've set the landmark plotting threshold to 0.9, but still almost all of the landmarks get through. You can see the picture above also has an external object occluding the face, and in the face below there are glasses. Shouldn't the visibility values be lower than 0.99? (attached: test-fail (7))

Please let me know what I can do to resolve these problems. I'm looking forward to your reply.

abhi1kumar commented 2 years ago

Hi @abhi1kumar, is there an efficient way to get the full image with the landmarks as output, and not just the cropped image? I've tried copying the crop back onto the original image, but I believe the dataloader first resizes the crop to 256 and then feeds it into the model, so the output crops with the landmarks are 256x256 in shape.

I've tried setting the scale_mul_factor value in face_bbx.py to 1, but now the images are being saved at 709x709 resolution and seem to have white borders added to them.

Unfortunately, I did not try mapping the landmarks on the cropped images back to the original images. We kept the dataloaders and other processing the same as the original DU-Net authors'. You might have to look into the pylib folder to see whether the DU-Net authors provide a function to map points in the cropped image space back to the full image space. If there is no such function, then you might have to write one.
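
If you end up writing one, a rough sketch (untested, not part of the released code, and assuming the crop is simply the axis-aligned face box resized to 256x256; the actual transform in face_bbx.py / pylib may additionally apply a centre/scale and padding, so please verify) could look like this:

# hypothetical inverse mapping from crop coordinates back to full-image coordinates
def crop_to_original(landmarks_crop, box, crop_size=256.0):
    # box = (x_min, y_min, x_max, y_max) used to create the crop
    x_min, y_min, x_max, y_max = box
    scale_x = (x_max - x_min) / crop_size
    scale_y = (y_max - y_min) / crop_size
    return [(x * scale_x + x_min, y * scale_y + y_min) for (x, y) in landmarks_crop]

# example usage (placeholder box):
# full_landmarks = crop_to_original(predicted_points, (100, 100, 500, 700))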

Also, the visibility values seem to be very high for most of the landmarks even when they're occluded, for example by glasses. I've set the landmark plotting threshold to 0.9, but still almost all of the landmarks get through. You can see the picture above also has an external object occluding the face, and in the face below there are glasses. Shouldn't the visibility values be lower than 0.99?

I think you misunderstood visibility. As mentioned in Section 4 of our paper, there are two categories of visibility - invisible and visible - while in general landmarks belong to one of three categories - unoccluded, externally occluded (by glasses, hair) and self-occluded (because of a change in profile). For modelling with mixed random variables, invisible (v_j=0) is equivalent to the self-occluded landmarks, while visible (v_j=1) refers to both unoccluded and externally occluded landmarks.

The examples that you show are frontal faces, and therefore both unoccluded and externally occluded landmarks (such as those behind glasses) have visibility (v_j=1). Hence, the visibility for these landmarks should be a value close to 1 (which is what you are seeing). However, you should easily notice that the uncertainty ellipses are smaller for unoccluded landmarks (such as the nose and lips), while they are larger for externally occluded landmarks (such as the eyes behind glasses).
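
If you also want to filter out externally occluded landmarks, one option (not something the released code does; the visibility and covariance names below are placeholders for whatever your forward pass returns) is to threshold both the predicted visibility and the area of the uncertainty ellipse, for example:

import numpy as np

def keep_landmark(vis, cov, vis_thresh=0.65, max_area=200.0):
    # area of the 1-sigma ellipse of a 2x2 covariance = pi * sqrt(det(Sigma))
    area = np.pi * np.sqrt(max(np.linalg.det(cov), 0.0))
    return vis > vis_thresh and area < max_area

The max_area threshold is image- and resolution-dependent, so tune it on a few of your own examples.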

waleedrazakhan92 commented 2 years ago

I think you misunderstood visibility. As mentioned in Section 4 of our paper, there are two categories of visibility - invisible and visible - while in general landmarks belong to one of three categories - unoccluded, externally occluded (by glasses, hair) and self-occluded (because of a change in profile). For modelling with mixed random variables, invisible (v_j=0) is equivalent to the self-occluded landmarks, while visible (v_j=1) refers to both unoccluded and externally occluded landmarks.

Thank you @abhi1kumar for addressing and clearing up both my queries.

The examples that you show are frontal faces, and therefore both unoccluded and externally occluded landmarks (such as those behind glasses) have visibility (v_j=1). Hence, the visibility for these landmarks should be a value close to 1 (which is what you are seeing). However, you should easily notice that the uncertainty ellipses are smaller for unoccluded landmarks (such as the nose and lips), while they are larger for externally occluded landmarks (such as the eyes behind glasses).

I did notice that the ellipses are larger for the occluded parts. I'll put a check on the ellipse size to filter out any unwanted landmarks.

I'll look into the DU-Net repository to see if I can find an inverse mapping. Thank you again for clearing everything up.

Jourtics commented 1 year ago

Hey @waleedrazakhan92 @abhi1kumar, could you provide the code through which you were able to run the model on an arbitrary image?

abhi1kumar commented 1 year ago

Hi @Jourtics Thank you for your interest in LUVLi.

could you provide the code through which you were able to run the model for any arbitrary image?

I do not have a script with me. You could follow the steps above to generate predictions for arbitrary images. In case you prepare a script, feel free to raise a PR.