Inter-person discriminator

Hi, kudos for the great work. I love the idea of the inter-person discriminator you describe in the paper. However, the following points are not clear to me:

How does the discriminator capture the interaction between people? What does the architecture of D2 look like?
Does D2 only accept 2 people, or could it accept more if there is a 3-person interaction?
If there are 3 people in the scene, do you use all possible permutations, taking 2 people at a time?
Does the order of the inputs Pa and Pb matter?
In your adversarial loss you use the estimated joints Pa and Pb and their corresponding GT. If you have the GT correspondences, why use a discriminator rather than direct supervision?
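
To make the pairing question concrete, here is a small sketch of the two options I have in mind for a 3-person scene. This is only my own illustration, not code from the repo: `person_pairs` is a hypothetical helper, and the integer ids simply stand in for the per-person poses that would be fed to D2 as Pa and Pb.

```python
from itertools import combinations, permutations

def person_pairs(num_people, ordered):
    """Enumerate the person-index pairs that could be passed to a pairwise discriminator.

    Hypothetical helper for illustration only.
    ordered=False: each pair appears once, so (a, b) and (b, a) are the same input.
    ordered=True: both (a, b) and (b, a) appear, which matters if D2 is order-sensitive.
    """
    ids = range(num_people)
    pairs = permutations(ids, 2) if ordered else combinations(ids, 2)
    return list(pairs)

# Three people in the scene, indexed 0, 1, 2.
print(person_pairs(3, ordered=False))  # [(0, 1), (0, 2), (1, 2)]
print(person_pairs(3, ordered=True))   # [(0, 1), (0, 2), (1, 0), (1, 2), (2, 0), (2, 1)]
```

If the order of Pa and Pb does not matter, I would expect the 3 unordered pairs to be enough; if it does matter, the 6 ordered pairs would be needed.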

I couldn't find any of this in the paper or in the supplementary material. Could you please elaborate on it?

Sorry for so many questions. Thank you in advance.

3dpose / 3D-Multi-Person-Pose
