JunkyByte / easy_ViTPose

Easy and fast 2D human and animal multi-pose estimation using the SOTA ViTPose [Y. Xu et al., 2022]. Real-time performance and multiple skeletons supported.
Apache License 2.0

Wholebody inference #8

Closed jun297 closed 1 year ago

jun297 commented 1 year ago

Hi, thank you for sharing a nice codebase!

I am looking for a way to get COCO WholeBody keypoints.
How were the feet keypoints added? Can I get some tips?

JunkyByte commented 1 year ago

Hi @jun297, thanks for your interest. I can suggest two approaches to get what you want:

Manual finetuning:

If your idea is to finetune the model yourself, you can follow a similar approach to what I did with the feet keypoints. It should be easier with WholeBody, since the json files already contain the body keypoints plus face / feet / hands. To train on WholeBody you will have to change the COCO.py dataset loader to load the annotations correctly. That script is the one I modified to add the feet, so you should be able to follow the same approach to add the other parts. Most of the script should work out of the box, but be sure to set the visibility correctly for joints in samples that are missing part of the annotated keypoints. Apart from the data loader, you have to modify the configs to use the correct number of keypoints, and to make the data augmentations work correctly you have to change the flip_pairs and the lower / upper body parts in https://github.com/JunkyByte/easy_ViTPose/blob/main/easy_ViTPose/datasets/COCO.py#L113
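To make the two changes above concrete, here is a minimal sketch (not the repo's actual code; the helper name and the exact foot-keypoint indices are my assumptions based on the COCO-WholeBody layout, where indices 17-22 are the left/right big toe, small toe, and heel):

```python
NUM_KEYPOINTS = 23  # 17 COCO body joints + 6 foot joints, as in the feet finetune

def pad_keypoints(kpts, num_total=NUM_KEYPOINTS):
    """Pad a flat [x, y, v, ...] keypoint list to num_total joints.

    Joints the annotation does not provide are marked invisible (v = 0)
    so the loss and the flip augmentation ignore them.
    """
    padded = list(kpts)
    while len(padded) < num_total * 3:
        padded.extend([0.0, 0.0, 0])  # x, y, visibility
    return padded

# The 8 symmetric COCO body pairs extended with the 3 foot pairs;
# flip_pairs must cover every new left/right joint or horizontal
# flipping will corrupt the labels.
FLIP_PAIRS = [
    [1, 2], [3, 4], [5, 6], [7, 8], [9, 10],
    [11, 12], [13, 14], [15, 16],          # body
    [17, 20], [18, 21], [19, 22],          # feet (assumed indices)
]
```

The same padding trick would apply to samples missing face or hand annotations when extending to the full 133-keypoint skeleton.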

Use pretrained models from the original implementation:

While I finetuned only to add the feet, the original ViTPose implementation provides a lot of pretrained models: https://github.com/ViTAE-Transformer/ViTPose#wholebody-dataset Unfortunately the wholebody one, called ViTPose+, uses a slightly different architecture and is trained on multiple datasets, so it requires some engineering to make it usable with the current codebase. I spent a couple of hours making it work and set up a branch for this task. The visualization is still broken and some details are not clear to me, but I am able to run inference correctly. If you have the time to fix the details I left broken, I would be interested in making the wholebody setup the default, or in providing both models for inference in the main branch.

I will give you details of what I did, how to make everything work, and what is left to be done:

Start by cloning the repo and checking out the develop branch. You will find a few differences.

Now, to load the pretrained models, download them from https://github.com/ViTAE-Transformer/ViTPose#wholebody-dataset and convert them using transform_ckpt.py. This script removes the multiple heads from the checkpoint and leaves only the wholebody one, which is renamed as the default head; it creates a new checkpoint that can be loaded correctly by this codebase. Once you have the updated checkpoint, try it with inference.py: you will see that it works correctly but the visualization is broken.
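The core of that conversion is just key filtering and renaming on the checkpoint's state dict. A rough sketch of the idea (the head prefix and index are assumptions for illustration; inspect the real checkpoint's keys before adapting this):

```python
def keep_single_head(state_dict, keep_idx=5,
                     head_prefix="associate_keypoint_heads."):
    """Drop every dataset-specific head except keep_idx, renaming the
    survivor to the default 'keypoint_head' used at inference time.

    Key names here are illustrative assumptions, not necessarily the
    exact ViTPose+ checkpoint layout.
    """
    keep = head_prefix + str(keep_idx) + "."
    out = {}
    for key, value in state_dict.items():
        if key.startswith(head_prefix):
            if key.startswith(keep):
                out["keypoint_head." + key[len(keep):]] = value
            # all other heads are silently dropped
        else:
            out[key] = value  # backbone and shared weights pass through
    return out
```

After filtering, the result can be saved as a regular single-head checkpoint that the codebase loads without the MoE machinery.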

TODO: I hacked the model a bit to make it load without the other heads, and since the implementation is slightly different there is room for mistakes. If you check this line of vit_moe you will see that I manually override the indices passed to the function. I think the 'experts' are 6 MLPs used during training to learn dataset-specific features, one for each dataset the model was trained on. As we are only interested in wholebody, I hardcoded the index to 5 (the index of the wholebody dataset during training); you should understand how this works and fix the code accordingly. Since we only care about inference, it can stay hardcoded. The experts could also be removed (all but the one related to wholebody), but this is less important.
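My reading of the expert mechanism, as a minimal sketch (this is not the real vit_moe code, just the selection logic I believe is going on, with the hardcoded override shown):

```python
WHOLEBODY_EXPERT = 5  # wholebody was the 6th training dataset (index 5)

class ExpertMlp:
    """Toy stand-in for the MoE block: one expert MLP per dataset,
    selected by a dataset index at forward time."""

    def __init__(self, experts):
        self.experts = list(experts)  # one callable per training dataset

    def forward(self, x, indices=None):
        # Mirror of the hack described above: at inference we only ever
        # want the wholebody expert, so the index can be hardcoded.
        idx = WHOLEBODY_EXPERT if indices is None else indices
        return self.experts[idx](x)
```

Pruning the other five experts from the checkpoint would shrink it slightly, but as noted, selecting the right index is what matters for correct inference.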

You can fix the visualization by adding the skeleton and keypoints to visualization.py; you can find the reference used by the original implementation here: https://github.com/ViTAE-Transformer/ViTPose/blob/d5216452796c90c6bc29f5c5ec0bdba94366768a/configs/_base_/datasets/coco_wholebody_info.py
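Conceptually the skeleton is just a list of keypoint-index pairs, and drawing is one line segment per pair. A hedged sketch of the shape the data takes (the pairs below are a tiny leg-only excerpt, not the full 133-keypoint wholebody definition; function and variable names are my own):

```python
# Excerpt only: (ankle-knee) and (knee-hip) links for both legs.
SKELETON = [(15, 13), (13, 11), (16, 14), (14, 12)]

def segments(keypoints, skeleton=SKELETON, min_conf=0.3):
    """Return (x1, y1, x2, y2) segments to draw, skipping limbs whose
    endpoint confidence falls below the threshold."""
    out = []
    for a, b in skeleton:
        xa, ya, ca = keypoints[a]
        xb, yb, cb = keypoints[b]
        if ca >= min_conf and cb >= min_conf:
            out.append((xa, ya, xb, yb))
    return out
```

Porting the wholebody skeleton then amounts to transcribing the pair list (and per-link colors) from coco_wholebody_info.py into this format.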

You can also add the large and huge models by creating new configs (I would copy-paste the current one and change the network sizes). You can find the configs with the model sizes for large and huge here: https://github.com/ViTAE-Transformer/ViTPose/blob/d5216452796c90c6bc29f5c5ec0bdba94366768a/configs/body/2d_kpt_sview_rgb_img/topdown_heatmap/coco/vitPose%2B_base_coco%2Baic%2Bmpii%2Bap10k%2Bapt36k%2Bwholebody_256x192_udp.py#L87

Let me know if you need any further help or directions, and feel free to open a PR on develop if you make any progress. I won't be online much in the next few days but should be able to reply once per day.

We can further test with onnx / tensorrt exporting once the pytorch version works correctly.

jun297 commented 1 year ago

Thanks a lot for the detailed and kind reply. If I fix something, I'll let you know.

I have one more question: what is the difference between https://github.com/jaehyunnn/ViTPose_pytorch and this repo?

JunkyByte commented 1 year ago

Mostly a general cleanup + seamless onnx and tensorrt inference + a model fine-tuned on a 25-keypoint skeleton.

mihelich commented 1 year ago

I had a similar need to load the pre-trained animal pose estimation models from the original implementation. I found a simpler approach to transform the ViTPose+ checkpoints:

  1. Patch model_split.py to save new_ckpt['state_dict'] (instead of new_ckpt) and discard unused expert keys correctly (as here).
  2. Download the desired ViTPose+ checkpoint, and use the script to split it into a regular ViTPose model checkpoint for each expert. In my case, I need ap10k.pth or apt36k.pth.
  3. Copy in the relevant model configs from the original implementation.

After patching inference.py to recognize the new configs and visualization.py to use the AP-10K animal skeleton, I'm able to run inference successfully with the animal models. For inference purposes, we don't need the MoE architecture.

I may be able to prepare a PR, depending on how successful my follow-up work is.

JunkyByte commented 1 year ago

@mihelich thanks! That's nice. It might be interesting to work on a PR!

I will take some time to check the code you posted. I think adding support for all the original vitpose models and providing seamless inference using torch / onnx / trt would be a good goal for this repo. What do you think? Did you apply other changes to the original code to achieve your use case?

JunkyByte commented 1 year ago

Hello @mihelich! I checked the code and it seems quite easy to achieve what I had in mind. How did you obtain the correct skeleton for visualization? Did you take the references for ap10k and apt36k from here and convert them manually to the format used in this repo?

If you achieve your purpose and are willing to draft a PR with easy model conversion from ViTPose+ and a flag to pick the dataset used during inference, I will join in and convert / add all the ViTPose+ models to this repo. Once inference works for different datasets using the original checkpoints, I could add an easy way to create onnx / trt models from the torch checkpoints, so that everybody can generate their own checkpoints and run inference on different datasets using their preferred backend.

mihelich commented 1 year ago

Hi @JunkyByte,

I think adding support for all the original vitpose models and providing seamless inference using torch / onnx / trt would be a good goal for this repo.

That sounds great!

How did you obtain the correct skeleton for visualization? Did you take the references for ap10k and apt36k from here and convert them manually to the format used in this repo?

Yes, exactly that.

Did you apply other changes to the original code to achieve your use case?

The only other change was switching the YOLOv5 detection to look for dogs instead of people. So that'll need to be configurable.
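Making that configurable is mostly a matter of filtering the detector output by class id. A small sketch under the assumption that detections come out in YOLOv5's usual (x1, y1, x2, y2, confidence, class_id) layout, using the 80-class COCO ordering where "person" is 0 and "dog" is 16:

```python
PERSON, DOG = 0, 16  # COCO 80-class indices used by YOLOv5

def filter_detections(detections, target_class=PERSON):
    """Keep only detections of the requested class.

    Each detection is assumed to be a (x1, y1, x2, y2, conf, class_id)
    tuple, the layout YOLOv5 results typically use.
    """
    return [d for d in detections if int(d[5]) == target_class]
```

Exposing target_class as a CLI flag (or mapping a dataset name like ap10k to a set of animal class ids) would cover both the human and animal use cases.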

If you achieve your purpose and are willing to draft a PR with easy model conversion from ViTPose+ and a flag to pick the dataset used during inference, I will join in and convert / add all the ViTPose+ models to this repo.

I did get quite promising results for animal pose estimation. I have some other priorities at the moment, but I should be able to tidy up a draft PR this week.

JunkyByte commented 1 year ago

Hey @mihelich any news on that PR draft? :)

mihelich commented 1 year ago

@JunkyByte I haven't forgotten about it, just been slammed on the personal front. Will push something when I have the chance.

JunkyByte commented 1 year ago

Will close for inactivity, feel free to reopen!