Hi @garyzhao, thanks so much for your excellent work. I have some questions about the perceptual features.
In my understanding, the final feature dimension fed into SemGCN is C_1 + C_2 + C_3 + C_4 + 2 (e.g., 256 + 512 + 1024 + 2048 + 2 = 3842), and this feature is then mapped to hid_dim (e.g., 128).
I wonder whether incorporating the image features really works. The 2D pose (x, y) accounts for only 2/3842 of the input dimensions, and hid_dim is much smaller than the input dimension.
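To make the numbers above concrete, here is a quick back-of-the-envelope check. It only redoes the dimension arithmetic. It assumes the example values from this question (ResNet-style stage widths 256/512/1024/2048 and hid_dim = 128) and models the mapping to hid_dim as a single dense projection. That projection is a hypothetical simplification, not the actual SemGCN layer.

```python
# Back-of-the-envelope check of the per-joint input dimensions described above.
# The channel counts C_1..C_4 and hid_dim are the example values from this question,
# not values read from the SemGCN code.
from fractions import Fraction

perceptual_channels = [256, 512, 1024, 2048]  # C_1 .. C_4 (pooled image features)
coord_dim = 2                                 # 2D joint position (x, y)
hid_dim = 128                                 # hidden width the input is mapped to

in_dim = sum(perceptual_channels) + coord_dim   # 3842
coord_share = Fraction(coord_dim, in_dim)       # fraction of input dims that are (x, y)
# Weight count of a hypothetical single dense in_dim -> hid_dim projection (no bias)
proj_weights = in_dim * hid_dim

print(f"input dim per joint: {in_dim}")
print(f"share of (x, y) in input: {coord_share} (~{float(coord_share):.4%})")
print(f"compression ratio in_dim / hid_dim: {in_dim / hid_dim:.1f}")
print(f"dense projection weights (no bias): {proj_weights}")
```

This confirms the 2/3842 figure (which reduces to 1/1921) and shows the roughly 30x reduction from the input dimension to hid_dim.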
Thanks a lot!