ShuhongChen / bizarre-pose-estimator

WACV2022: Transfer Learning for Pose Estimation of Illustrated Characters
GNU Affero General Public License v3.0

Question about generating the pose_descriptors for support set #4

Closed mrbulb closed 2 years ago

mrbulb commented 2 years ago

Hi, thanks for your great work! The detection results are really amazing :smile: I have a question about the pose retrieval code. When you retrieve from the support set, the code loads a pickle file. May I ask how you generated this pickle file? Would you mind releasing the code for generating it?

https://github.com/ShuhongChen/bizarre-pose-estimator/blob/dace2253ee27ffcefbe7fa444dd88cc894cafd8e/_scripts/pose_retrieval.py#L126

Besides, I tried to use real human poses to retrieve from the support set. However, the detection result only has 17 keypoints, which doesn't match the training data dimension (25 keypoints). Would you mind releasing the raw detection results for the support set so that I can generate pose_descriptors with 17 keypoints by myself?

ShuhongChen commented 2 years ago

Thanks for the kind words!

We added raw_retrieval_support.zip to the downloads folder; it contains pickled dicts with the raw model outputs for the support set. The code to generate nbrs is roughly the following (untested, but you can get the idea):

from _util.util_v1 import * ; import _util.util_v1 as uutil
from _util.pytorch_v1 import * ; import _util.pytorch_v1 as utorch
from _util.twodee_v0 import * ; import _util.twodee_v0 as u2d
import _util.keypoints_v0 as ukey

# read and merge raw_retrieval_support
tdn = 'extract/folder/of/raw_retrieval_support'
modof = 5
proc = {
    k: v
    for fn in os.listdir(tdn)
    if fn.endswith(f'_{modof}.pkl')
    for k,v in pload(f'{tdn}/{fn}').items()
}

# compute pose descriptors: pairwise distances between cropbox-normalized keypoints, one row per support image
dmat = np.stack([
    scipy.spatial.distance.pdist(
        u2d.cropbox_points(proc[bn]['keypoints'], *proc[bn]['cropbox'])
    )
    for bn in proc.keys()
])

# wrap into sklearn nearestneighbors
nbrs = sklearn.neighbors.NearestNeighbors(n_neighbors=32).fit(dmat)
pdump(nbrs, mkdir(f'{tdn}/nbrs.pkl'))
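At query time you'd compute the same descriptor for the query pose and ask nbrs for its nearest rows; roughly like this (also untested, and query_keypoints / query_cropbox are just placeholder names, not variables from the repo):

# row index in dmat -> support image name (dict order is preserved)
bns = list(proc.keys())

# descriptor for the query pose, built the same way as the support rows
query_desc = scipy.spatial.distance.pdist(
    u2d.cropbox_points(query_keypoints, *query_cropbox)
)

# look up the closest support poses
dists, idxs = nbrs.kneighbors(query_desc[None, :], n_neighbors=10)
matches = [bns[i] for i in idxs[0]]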

If you want to use only 17-keypoint descriptors, you can drop the 8 predicted midpoints with proc[bn]['keypoints'][:17]. I believe the first 17 are COCO, and the last 8 are midpoints, in the following order:

coco_keypoints_mid = np.asarray([
    ( 5,  7), # 17, upper_arm_left
    ( 6,  8), # 18, upper_arm_right
    ( 7,  9), # 19, lower_arm_left
    ( 8, 10), # 20, lower_arm_right
    (11, 13), # 21, upper_leg_left
    (12, 14), # 22, upper_leg_right
    (13, 15), # 23, lower_leg_left
    (14, 16), # 24, lower_leg_right
])
# i.e. index #17 is the midpoint of #5 and #7, representing the left upper arm

I should note that we never tried retrieval without those 8 extra points; if your results aren't great, you can try extending your query to the full 25 points by manually calculating the 8 extra midpoints.
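A rough way to do that extension, assuming kpts17 is a (17, 2) array in COCO order and using the coco_keypoints_mid table above (just a sketch, not code from the repo):

def extend_to_25(kpts17):
    # average each index pair from coco_keypoints_mid to get the 8 limb midpoints
    mids = (kpts17[coco_keypoints_mid[:, 0]] + kpts17[coco_keypoints_mid[:, 1]]) / 2
    return np.concatenate([kpts17, mids], axis=0)  # (25, 2), same order as the support descriptors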

Besides, I tried to use real human poses to retrieve from the support set.

I'm guessing you're building an app where you take a picture of yourself, and then search for similarly-posed anime references? This may or may not work because of proportion differences in anime (we discuss this a bit at the end of sec 5.1 of the paper). In particular, you may want to remove or down-weight eye/nose/ear keypoints, since head shapes are the most different. Or it might just work perfectly out-of-the-box, idk. In any case, let me know what happens; interested in seeing what you make!
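If you do want to down-weight the face keypoints, one option (untested, and the 0.25 weight is arbitrary) is to scale the descriptor entries that involve any of the five COCO face keypoints before fitting and querying:

import itertools

face_idxs = {0, 1, 2, 3, 4}  # COCO nose, eyes, ears
n_kpts = 25

# pdist enumerates pairs (i, j) with i < j, in the same order as itertools.combinations
w = np.array([
    0.25 if (i in face_idxs or j in face_idxs) else 1.0
    for i, j in itertools.combinations(range(n_kpts), 2)
])

nbrs = sklearn.neighbors.NearestNeighbors(n_neighbors=32).fit(dmat * w)
# apply the same weights to the query descriptor before calling kneighbors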

mrbulb commented 2 years ago

I'm guessing you're building an app where you take a picture of yourself, and then search for similarly-posed anime references? This may or may not work because of proportion differences in anime (we discuss this a bit at the end of sec 5.1 of the paper). In particular, you may want to remove or down-weight eye/nose/ear keypoints, since head shapes are the most different. Or it might just work perfectly out-of-the-box, idk. In any case, let me know what happens; interested in seeing what you make!

Haha, I did try to search for similarly-posed anime references using a real-person image 😄.

Previously, I had mistakenly thought that the 8 extra points followed the same definition as in the bizarre pose dataset (i.e. body_upper, neck_base, head_base, nose_root, trapezium_left, trapezium_right, tiptoe_left, tiptoe_right), so I could not get satisfactory retrieval results.

After you explained the definition of the 8 extra points, I could easily calculate these midpoints. I chose some JoJo poses as inputs, and the retrieval results are really impressive.

Thank you for your great work again, I really like it 😃

[4 attached result images, 2022-01-14: keypoint detection with DEKR + retrieval, result showcases 1-4]

ShuhongChen commented 2 years ago

Cool results! Looks like the 4th row doesn't do so well, but the 3rd row looks pretty good. Monroe's pose is pretty hard, especially because of her dress; I'm surprised your human estimator got it right. I guess average anime illustrators might not be familiar enough with her photo to reference it, or our system just doesn't do well on that pose.

I think it's actually really hard to retrieve real Jojo poses. One reason is the support distribution; I'd say a good chunk of the art is from fans of franchises like Touhou, not stand users. There's still less than 100 full_body+jojo_pose images there now. Second reason is the keypoints aren't descriptive enough; the fashion poses Araki chooses depend a lot on body part orientation and hands/feet. It'd be great if we could make a densepose estimator or something, but that's super hard. Lots of work to be done!