Hi @waleedrazakhan92 I suppose you have downloaded the code from the MERL website. There is a README.md inside the downloaded zip file which contains all the instructions to create the conda environment for inference. Specifically, the instructions for the conda environment as mentioned in this README are
conda env create --file conda_py27.yml
conda activate py27
The models are available in the second README.
Once you have set up the datasets and the models in the respective folders, please run
./scripts_evaluation.sh
to reproduce all our results.
Thank you @abhi1kumar for the reply. I'll follow the instructions.
@abhi1kumar One question: Do you have a version that's compatible with python 3?
Do you have a version that's compatible with python 3?
Unfortunately, no. Our backbone is the DU-Net (CU-Net), whose implementation is based on Python 2.7. Therefore, our codebase also uses Python 2.7.
Thank you @abhi1kumar for the answer. I've followed the README file and set up the environment. The evaluation script runs fine and the results and plots are produced, but I'm wondering how to run inference on custom images, because the evaluation code is set up for the different datasets and not for custom images. Can you guide me on what I need besides the test images to run the inference? What I want is to have the landmark locations along with their corresponding visibility values. Looking forward to your reply.
The evaluation script runs fine and the results and plots are produced,
I am indeed happy that you could reproduce our results on your end.
how do I run inference on custom images, because the evaluation code is set up for the different datasets and not for custom images. Can you guide me on what I need besides the test images to run the inference? What I want is to have the landmark locations along with their corresponding visibility values.
This would take a little extra effort on your end but is nonetheless possible. I did this nearly two years ago for the demo. We need to get the JSONs for running it over arbitrary images. Here are the steps:
- Get the coordinates of the rectangular box which contains the face in each of your images. The crop coordinates need not be exact; a rough idea of where the face is in the image works fine. We need the face detection since the input to the LUVLi model is the cropped face and not the whole image (which might contain everything from trees to hats). LUVLi estimates the locations, uncertainties and visibilities of landmarks on the face after face detection is done. The face detection coordinates are the first two lines of each file in dummy_landmark_pts.zip, described in the next step.
- Create copies of dummy labels for each image so that the dataloader face_bbx.py does not complain. The dummy_landmark_pts I used for creating the demo is attached here.
- The next step is to obtain the JSON by running the JSON creator script. The instructions are at the bottom of the README of the code you downloaded from the MERL website. We need to create another splits_prep/config_custom_images.txt first, similar to splits_prep/config_aflw_ours_all.txt. In splits_prep/config_custom_images.txt,
  - Change the paths if required.
  - Make sure that the val_datasets_names is aflw_ours (the old name of MERL-RAV) for each of your images.
  Then, run splits_prep/get_jsons_from_config.py with the new config argument:
  python splits_prep/get_jsons_from_config.py -i splits_prep/config_custom_images.txt
  This should give you a JSON, say custom_images.json, inside the dataset directory.
- Once you have custom_images.json, run inference with the MERL-RAV (AFLW_ours) model on custom_images.json using the following command:
  python validate_and_forward_pass.py --exp_dir abhinav_model_dir/ --exp_id run_5004_custom_eval \
      --saved_wt_file abhinav_model_dir/run_5004/lr-0.00002-49.pth.tar \
      --pp "relu" --laplacian --use_visibility --bs 12 --gpu_id 1 --val_json dataset/custom_images.json
This should produce the model predictions on your custom images. I believe you should get results similar to our demo on your end.
Thank you @abhi1kumar for the detailed reply. I'll follow these steps to make it run on my custom images.
Hi @abhi1kumar I've managed to run inference on custom images, but the results are not what I expected. These are the steps that I followed (dataset = 2 images):
1) Created two .pts files for the two images using the format that you shared. What I didn't understand is the format of the points. As you mentioned, the model expects the cropped face, so the first two points are the rectangular crop coordinates. What format should they be in? Should the first point be x_min, y_min and the second point x_max, y_max, or is it some other format? Also, what values should the rest of the points be? For now, like the 'dummy_pts' file, I've set the rest to 500 700.
2) I then produced the JSON file for these two images using the get_jsons_from_config.py file, modifying it slightly so it only produces the val_json files since I'm only inferring.
3) Then I ran the validate_and_forward_pass.py file to get the inference on my data. I used face-layer-num-8-order-1-model-best.pth.tar as the saved_wt_file because when I use the lr-0.00002-49.pth.tar checkpoint file it gives me the error:
File "validate_and_forward_pass.py", line 107, in main net.state_dict()[name].copy_(param) RuntimeError: invalid argument 2: sizes do not match at /pytorch/torch/lib/THC/generic/THCTensorCopy.c:101
4) Used show_dataset_images_overlaid_with_uncertainties.py to plot the landmarks.
The code runs fine and produces the results, but the results are very bad. To debug, I played with the .pts files for the images to see what the values should be: I copied the same image twice but set the first two points to different values, and the results are indeed different for the same image with different .pts files. I'm attaching the images. Please guide me on what else I should look at to correct the results.
Created two .pts files for the two images using the format that you shared. What I didn't understand is the format of the points.
The format is described here
As you mentioned, the model expects the cropped face, so the first two points are the rectangular crop coordinates. What format should they be in? Should the first point be x_min, y_min and the second point x_max, y_max, or is it some other format? Also, what values should the rest of the points be? For now, like the 'dummy_pts' file, I've set the rest to 500 700.
Yes, the first point should be x_min, y_min and the second point should be x_max, y_max. The cropping code inside face_bbx.py takes the tightest box containing all the points and therefore the rest of the points can be anything between x_min and x_max for the x-coordinate and between y_min and y_max for the y-coordinate.
As an example, if you choose x_min = 100 and x_max = 500, the x-coordinate of the rest of the points can be 300. Similarly, if you choose y_min = 100 and y_max = 700, the y-coordinate of the rest of the points can be 400.
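For concreteness, here is a minimal sketch of a script that writes such a dummy .pts file for one image. It assumes the standard .pts layout (version/n_points header and braces) that these dataloaders read; the function name, file name, and bounding box values are illustrative, and it simply fills the remaining points with the box centre, consistent with the explanation above.

```python
# write_dummy_pts.py -- hypothetical helper, not part of the LUVLi release.
# Writes a dummy .pts file whose first two points are the face-detector box
# corners (x_min, y_min) and (x_max, y_max); the remaining points are set to
# the box centre, so the tightest box taken by face_bbx.py equals the detector box.

def write_dummy_pts(path, x_min, y_min, x_max, y_max, n_points=68):
    cx = (x_min + x_max) / 2.0          # arithmetic mean of the two corners
    cy = (y_min + y_max) / 2.0
    lines = ["version: 1", "n_points: {}".format(n_points), "{"]
    lines.append("{} {}".format(x_min, y_min))            # point 1: top-left corner
    lines.append("{} {}".format(x_max, y_max))            # point 2: bottom-right corner
    lines += ["{} {}".format(cx, cy)] * (n_points - 2)    # dummy values for the rest
    lines.append("}")
    with open(path, "w") as f:
        f.write("\n".join(lines) + "\n")

if __name__ == "__main__":
    # Example: face roughly between (100, 100) and (500, 700) in the image.
    write_dummy_pts("image_0001.pts", 100, 100, 500, 700)
```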
I then produced the JSON file for these two images using the get_jsons_from_config.py file, modifying it slightly so it only produces the val_json files since I'm only inferring.
That is perfectly fine.
Then I ran the validate_and_forward_pass.py file to get the inference on my data. I used face-layer-num-8-order-1-model-best.pth.tar as the saved_wt_file because when I use the lr-0.00002-49.pth.tar checkpoint file it gives me the error:
File "validate_and_forward_pass.py", line 107, in main net.state_dict()[name].copy_(param) RuntimeError: invalid argument 2: sizes do not match at /pytorch/torch/lib/THC/generic/THCTensorCopy.c:101
face-layer-num-8-order-1-model-best.pth.tar is the DU-Net model trained on 300-W Split 1 and does not give uncertainties or visibilities. So, you should not use this model.
I next list the command we use for evaluating the AFLW_ours model on the AFLW_ours val split.
python evaluate_face_all.py --exp_id run_5004 -s 5 --pp "relu" --bs 12 --laplacian
evaluate_face_all.py is a wrapper for multiple (all) datasets which calls validate_and_forward_pass.py. With evaluation over aflw_ours_all_val.json, it prints out the following command:
python validate_and_forward_pass.py --gpu_id 0 --exp_id run_5004_evaluate/aflw_ours_all \
--val_json dataset/aflw_ours_all_val.json \
--class_num 68 --save_image_heatmaps \
--saved_wt_file abhinav_model_dir/run_5004/lr-0.00002-49.pth.tar \
--layer_num 8 --pp relu --laplacian --use_visibility --hg_wt 0,0,0,0,0,0,0,1 --wt_gau 1.0 --wt_mse 0.0 \
--bs 12 --mlp_tot_layers 1 --mlp_hidden_units 4096
I change the --val_json argument of this command to our new JSON. If you could reproduce our numbers for the AFLW_ours dataset, the following command should also run correctly:
python validate_and_forward_pass.py --gpu_id 0 --exp_dir abhinav_model_dir/ --exp_id run_5004_custom_eval \
--val_json dataset/custom_images.json \
--class_num 68 --save_image_heatmaps \
--saved_wt_file abhinav_model_dir/run_5004/lr-0.00002-49.pth.tar \
--layer_num 8 --pp "relu" --laplacian --use_visibility --hg_wt 0,0,0,0,0,0,0,1 --wt_gau 1.0 --wt_mse 0.0 \
--bs 12 --mlp_tot_layers 1 --mlp_hidden_units 4096
Used show_dataset_images_overlaid_with_uncertainties.py to plot the landmarks.
This is also correct. You should also see plot/make_video_by_images_overlay_with_uncertainties.py
The code runs fine and produces the results, but the results are very bad. To debug, I played with the .pts files for the images to see what the values should be: I copied the same image twice but set the first two points to different values, and the results are indeed different for the same image with different .pts files.
The results are output for the cropped images, which are different because you cropped them differently.
I'm attaching the images. Please guide me on what else I should look at to correct the results.
I assume that the green points refer to the boundary points (x_min, x_max, y_min, y_max) of your face detector. Based on your green points, I see that the boundary coordinates of your face detector are not correct. I am marking the correct boundary coordinates of your face detector with orange points and their boundary as an orange box. Make sure you use the arithmetic mean of these two boundary points as the other landmark points.
Hi @abhi1kumar thank you for pointing out the mistake. I corrected the landmark points and the model now works fine, but there are a few difficulties that I have.
These are the steps that I took:
1) Changed the .pts files for every image so the x_min, y_min, x_max, y_max values are correct. Took the mean of these values and set the rest of the points to the mean.
2) Also, at first the .pts files only contained 68 points; now I put 98 landmark points in each .pts file and set the --class_num flag to 98 when running the validate_and_forward_pass.py file, so now the model lr-0.00002-49.pth.tar file is working and outputting the results.
3) Generated the plots, so now the plots look good.
What I'm confused about is that the results still don't show any occlusion and print all the landmarks even if some of them are occluded. I tried debugging the plot file CommonPlottingOperations.py and noticed that there is a threshold value set to 0.65 which tells the function to plot the landmarks for which vis_estimated is greater than 0.65, but for each of my images every landmark has a vis_estimated value greater than 0.9, which shouldn't be the case since there is an image with glasses on and another image with a side-on view.
I'm attaching the produced image files. Please let me know what I am missing here. I'm looking forward to your guidance.
Also, at first the .pts files only contained 68 points; now I put 98 landmark points in each .pts file and set the --class_num flag to 98 when running the validate_and_forward_pass.py file, so now the model lr-0.00002-49.pth.tar file is working and outputting the results.
You are using the WFLW model (which outputs 98 points) for inference and not the MERL-RAV (AFLW_ours / run_5004) model, which outputs 68 points per face. The image also confirms this, since the landmark points for the profile faces are on the boundary.
Please make sure you use the correct model. The models trained on datasets other than MERL-RAV will NOT output visibility.
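If it helps to verify which checkpoint you are actually loading, a small sketch like the one below prints the parameter shapes stored in it, so a 68-point model is easy to tell apart from a 98-point one. It assumes the checkpoint is a standard PyTorch .pth.tar file saved with torch.save; the path is a placeholder.

```python
# inspect_checkpoint.py -- hypothetical debugging aid, not part of the LUVLi release.
# Prints the tensor shapes stored in a checkpoint so you can check whether the
# number of output channels matches --class_num (68 for MERL-RAV, 98 for WFLW).
import torch

ckpt = torch.load("abhinav_model_dir/run_5004/lr-0.00002-49.pth.tar",
                  map_location="cpu")
state = ckpt.get("state_dict", ckpt)   # some checkpoints wrap the weights in 'state_dict'
for name, param in state.items():
    if hasattr(param, "shape"):        # skip non-tensor entries such as 'epoch'
        print(name, tuple(param.shape))
```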
What I'm confused about is that the results still don't show any occlusion and print all the landmarks even if some of them are occluded. Please let me know what I am missing here. I'm looking forward to your guidance.
The models trained on datasets other than MERL-RAV will NOT output visibility.
Thank you @abhi1kumar so much. This was indeed the case: the models had the same name, so I was using the one trained on the WFLW data. I've run inference with the merl_rav checkpoint and it outputs correctly.
Thank you @abhi1kumar so much. This was indeed the case: the models had the same name, so I was using the one trained on the WFLW data.
You're welcome. The checkpoint names are the same since all models are finetuned for 50 epochs. However, they are different models.
I've run inference with the merl_rav checkpoint and it outputs correctly.
Perfect.
@waleedrazakhan92 If your issues are fixed, would you mind closing this issue and starring the LUVLi github repo?
Sure @abhi1kumar, thank you for guiding me through.
Hi @abhi1kumar is there an efficient way to get the full image with the landmarks as an output and not just the cropped image? I've tried recopying the crop onto the original image, but I believe the dataloader first resizes the crops to 256 and then inputs them into the model, so the output crops with the landmarks are 256x256 in shape, while all my images with landmarks are being saved at 746x746 resolution. So cropping back by using the initial two coordinates of the .pts files isn't possible. Is there a workaround, as I need the whole image with the landmarks on it?
I've tried setting the scale_mul_factor value in face_bbx.py to 1, but now the images are being saved at 709x709 resolution and seem to have white boundaries added to them.
Image with scale 1.1 after plot saving: resolution 746x746
Image with scale 1.0 after plot saving: resolution 709x746
Also, the visibility values seem to be very high for most of the landmarks even if they're occluded, for example by glasses. I've set the landmark plotting threshold to 0.9, but still almost all of the landmarks get through. You can see the above picture also has an external object occluding the face, and in the face below there are glasses. Shouldn't the visibility values be lower than 0.99?
Please let me know what I can do to resolve these problems. I'm looking forward to your reply.
Hi @abhi1kumar is there an efficient way to get the full image with the landmarks as an output and not just the cropped image? I've tried recopying the crop onto the original image, but I believe the dataloader first resizes the crops to 256 and then inputs them into the model, so the output crops with the landmarks are 256x256 in shape.
I've tried setting the scale_mul_factor value in face_bbx.py to 1, but now the images are being saved at 709x709 resolution and seem to have white boundaries added to them.
Unfortunately, I did not try mapping the landmarks on the cropped images back to the original images. We kept the dataloaders and other processing the same as the original DU-Net authors'. You might have to look into the pylib folder to see if the DU-Net authors provide a function to map points in the cropped image space to the full image space. If there is no such function, then you might have to write one.
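If you do end up writing such a function yourself, a minimal sketch is below. It assumes the simplest case: the face box taken from the first two .pts points is cropped axis-aligned and resized to 256x256. The actual dataloader in face_bbx.py may expand the box by scale_mul_factor and apply a center/scale transform with padding, so the box you pass in has to match whatever crop was really used; the function name and arguments are illustrative.

```python
# map_to_full_image.py -- hypothetical helper, not part of the LUVLi release.
# Maps landmark coordinates predicted in the 256x256 cropped-face space back to
# the original (full) image space, assuming a plain axis-aligned crop of the box
# (x_min, y_min, x_max, y_max) that was resized to crop_size x crop_size.

import numpy as np

def crop_to_full(landmarks_crop, x_min, y_min, x_max, y_max, crop_size=256):
    """landmarks_crop: (N, 2) array of (x, y) in the cropped/resized image."""
    landmarks_crop = np.asarray(landmarks_crop, dtype=np.float64)
    scale_x = (x_max - x_min) / float(crop_size)   # full-image pixels per crop pixel
    scale_y = (y_max - y_min) / float(crop_size)
    full = np.empty_like(landmarks_crop)
    full[:, 0] = landmarks_crop[:, 0] * scale_x + x_min
    full[:, 1] = landmarks_crop[:, 1] * scale_y + y_min
    return full

# Example: a landmark at (128, 128) in the 256x256 crop of box (100, 100)-(500, 700)
# maps back to roughly the centre of that box in the original image.
print(crop_to_full([[128, 128]], 100, 100, 500, 700))   # -> [[300. 400.]]
```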
Also, the visibility values seem to be very high for most of the landmarks even if they're occluded, for example by glasses. I've set the landmark plotting threshold to 0.9, but still almost all of the landmarks get through. You can see the above picture also has an external object occluding the face, and in the face below there are glasses. Shouldn't the visibility values be lower than 0.99?
I think you misunderstood visibility. As mentioned in Section 4 of our paper, there are two categories of visibility, invisible and visible, while in general landmarks belong to one of three categories: unoccluded, externally occluded (by glasses, hair) and self-occluded (because of the change in profile). For modelling with mixed random variables, invisible (v_j = 0) is equivalent to the self-occluded landmarks, while visible (v_j = 1) refers to both unoccluded and externally occluded landmarks.
The examples that you show are frontal faces and therefore both unoccluded and externally occluded landmarks (such as those behind glasses) have visibility v_j = 1. Hence, the visibility for these landmarks should be a value close to 1 (which is what you are seeing). However, you should easily notice that the uncertainty ellipses are smaller for unoccluded landmarks (such as the nose and lips), while they are larger for externally occluded landmarks (such as the eyes behind glasses).
I think you misunderstood visibility. As mentioned in Section 4 of our paper, there are two categories of visibility, invisible and visible, while in general landmarks belong to one of three categories: unoccluded, externally occluded (by glasses, hair) and self-occluded (because of the change in profile). For modelling with mixed random variables, invisible (v_j = 0) is equivalent to the self-occluded landmarks, while visible (v_j = 1) refers to both unoccluded and externally occluded landmarks.
Thank you @abhi1kumar for addressing and clearing up both my queries.
The examples that you show are frontal faces and therefore both unoccluded and externally occluded landmarks (such as those behind glasses) have visibility v_j = 1. Hence, the visibility for these landmarks should be a value close to 1 (which is what you are seeing). However, you should easily notice that the uncertainty ellipses are smaller for unoccluded landmarks (such as the nose and lips), while they are larger for externally occluded landmarks (such as the eyes behind glasses).
I did notice that the size of the ellipses is larger for the occluded parts. I'll put a check on the size to filter out any unwanted landmarks.
I'll look into the DU-Net repository to see if I can find an inverse method. Thank you again for clearing everything up.
Hey, @waleedrazakhan92 @abhi1kumar could you provide the code through which you were able to run the model for any arbitrary image?
Hi @Jourtics Thank you for your interest in LUVLi.
could you provide the code through which you were able to run the model for any arbitrary image?
I do not have a script with me. You could follow the steps above to generate predictions for arbitrary images. In case you prepare a script, feel free to raise a PR.
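For anyone attempting this, a rough outline is sketched below. It simply chains, from Python, the commands quoted earlier in this thread; the image paths, bounding boxes, and the config file are assumptions you will need to adapt to your own setup, and the checkpoint path is the run_5004 one mentioned above.

```python
# run_custom_inference.py -- hypothetical driver, not part of the LUVLi release.
# Chains the steps discussed above: write dummy .pts files, build the JSON,
# then run validate_and_forward_pass.py on it. All paths and boxes are placeholders.

import subprocess

# 1) One dummy .pts per image (see the write_dummy_pts sketch earlier in this thread).
images_and_boxes = {
    "custom_images/image_0001": (100, 100, 500, 700),  # (x_min, y_min, x_max, y_max)
}
for stem, (x_min, y_min, x_max, y_max) in images_and_boxes.items():
    cx, cy = (x_min + x_max) / 2.0, (y_min + y_max) / 2.0
    pts = ["version: 1", "n_points: 68", "{",
           "{} {}".format(x_min, y_min), "{} {}".format(x_max, y_max)]
    pts += ["{} {}".format(cx, cy)] * 66 + ["}"]
    with open(stem + ".pts", "w") as f:
        f.write("\n".join(pts) + "\n")

# 2) Build dataset/custom_images.json from a config modelled on config_aflw_ours_all.txt.
subprocess.check_call(["python", "splits_prep/get_jsons_from_config.py",
                       "-i", "splits_prep/config_custom_images.txt"])

# 3) Run inference with the MERL-RAV (run_5004) checkpoint, as quoted above.
subprocess.check_call(["python", "validate_and_forward_pass.py",
                       "--exp_dir", "abhinav_model_dir/", "--exp_id", "run_5004_custom_eval",
                       "--saved_wt_file", "abhinav_model_dir/run_5004/lr-0.00002-49.pth.tar",
                       "--pp", "relu", "--laplacian", "--use_visibility",
                       "--bs", "12", "--gpu_id", "1",
                       "--val_json", "dataset/custom_images.json"])
```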
Hi, thanks for the great work. I've downloaded the zip file with the code and tried running it, but there seems to be no clear indication of how to run it, so can you please state the requirements for running the inference? I'm choosing to run this code as I need the visibility values for each landmark. Any help regarding running the inference would be highly appreciated.