To run the code on your own video, you need to generate p2d, affpts, and affb (as defined here), which correspond to the joint locations, joint confidences, and bone confidences, respectively.
For p2d and affpts, any off-the-shelf 2D pose estimator can be used to extract the joint locations and their confidence values.
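As a rough illustration, many 2D pose estimators return one `(x, y, score)` triple per joint per frame. The sketch below splits such an array into locations and confidences. The function name `split_pose_output` and the `(frames, joints, 3)` input layout are assumptions made for this example; check the definition of p2d and affpts linked above for the exact shapes and normalization the code expects.

```python
import numpy as np


def split_pose_output(poses):
    """Split per-joint (x, y, score) triples into locations and confidences.

    `poses` is assumed to have shape (frames, joints, 3); this layout is an
    assumption for illustration, not the format required by this repository.
    """
    poses = np.asarray(poses, dtype=np.float32)
    if poses.ndim != 3 or poses.shape[-1] != 3:
        raise ValueError("expected an array of shape (frames, joints, 3)")
    p2d = poses[..., :2]    # joint locations, shape (frames, joints, 2)
    affpts = poses[..., 2]  # joint confidences, shape (frames, joints)
    return p2d, affpts


if __name__ == "__main__":
    # Random stand-in for the output of a pose estimator on a 10-frame clip.
    dummy = np.random.rand(10, 17, 3)
    p2d, affpts = split_pose_output(dummy)
    print(p2d.shape, affpts.shape)
```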
For affb, a Part Affinity Field model can be used to extract the bone confidence; example code is here.
Note that we use the keypoint definition of the H36M dataset, which is compatible with the CrowdPose dataset but differs from the COCO keypoint definition.
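If your pose estimator outputs COCO keypoints, some joints of a skeleton-centered layout (for example a pelvis or thorax point) have no direct COCO counterpart and are commonly approximated as midpoints of paired joints. The sketch below shows that one step only. The helper `midpoint_joint` is hypothetical, the confidence rule (minimum of the two endpoints) is an assumption, and which joints this code actually needs, and in what order, must be taken from the keypoint definition referenced above.

```python
import numpy as np

# Indices in the standard 17-keypoint COCO ordering.
COCO_L_SHOULDER, COCO_R_SHOULDER = 5, 6
COCO_L_HIP, COCO_R_HIP = 11, 12


def midpoint_joint(kpts, a, b):
    """Approximate a missing joint as the midpoint of joints `a` and `b`.

    `kpts` is assumed to have shape (frames, 17, 3) holding (x, y, score).
    The returned array has shape (frames, 3); its score is the lower of the
    two source scores, which is an illustrative choice rather than a rule.
    """
    kpts = np.asarray(kpts, dtype=np.float32)
    xy = 0.5 * (kpts[:, a, :2] + kpts[:, b, :2])
    conf = np.minimum(kpts[:, a, 2], kpts[:, b, 2])
    return np.concatenate([xy, conf[:, None]], axis=1)


if __name__ == "__main__":
    coco = np.random.rand(10, 17, 3)  # stand-in for real detections
    pelvis = midpoint_joint(coco, COCO_L_HIP, COCO_R_HIP)
    thorax = midpoint_joint(coco, COCO_L_SHOULDER, COCO_R_SHOULDER)
    print(pelvis.shape, thorax.shape)
```

The remaining joints would then be reordered to match the target definition with a plain index lookup.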