akanazawa / hmr

Project page for End-to-end Recovery of Human Shape and Pose

How is the paired/unpaired setting defined? #132

Closed russoale closed 4 years ago

russoale commented 4 years ago

Hi @akanazawa,

thanks for the great work. I'm currently working on a TF 2.0 implementation of your work, built on a Keras Model. While evaluating, I am not sure which training setting I have to use in order to compare the performance with the results published in the paper.

In Section 3. Model of the paper the unpaired setting is defined as

Additionally we assume that there is a pool of 3D meshes of human bodies of varying shape and pose. Since these meshes do not necessarily have a corresponding image, we refer to this data as unpaired [55].

But then later in Section 4.3. Without Paired 3D Supervision it is said to be

So far we have used paired 2D-to-3D supervision, i.e. L3D whenever available. Here we evaluate a model trained without any paired 3D supervision. We refer to this setting as HMR unpaired and report numerical results in all the tables.

Could you please clarify this?

Thanks

akanazawa commented 4 years ago

Hi,

Hope this clears it up:

Unpaired:

Paired:

For COCO or any in-the-wild human dataset without ground truth 3D, the only available option is unpaired. If you train on both COCO and Human3.6M, that is technically paired because 3D is available for Human3.6M.

Best,

Angjoo

russoale commented 4 years ago

Thanks for the quick reply.

So in the unpaired setting the encoder's loss will be calculated as encoder_loss = kp2d_loss + encoder_disc_loss, where encoder_disc_loss is still the combined encoder_theta (all IF loop predictions) + gt_theta (from CMU or jointLim)?
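To make the two settings concrete, here is a minimal sketch of how the encoder loss could be assembled, assuming the objective as written in the paper: a visibility-weighted L1 reprojection term, a least-squares adversarial term for the encoder, and a 3D term that is only added when paired ground truth is used. In that formulation the encoder's adversarial term scores only the predicted parameters, while the real samples enter the discriminator's own loss. The function names, the use_3d_label flag, the normalization, and the omission of loss weights are assumptions for illustration, not the repository's actual code.

```python
def keypoint_l1_loss(kp_gt, kp_pred, visibility):
    """Visibility-weighted L1 distance between 2D keypoints.

    Normalizing by the number of visible keypoints is an assumption.
    """
    total = 0.0
    for (x_gt, y_gt), (x_pr, y_pr), vis in zip(kp_gt, kp_pred, visibility):
        total += vis * (abs(x_gt - x_pr) + abs(y_gt - y_pr))
    return total / max(sum(visibility), 1.0)


def encoder_adversarial_loss(disc_scores_on_predictions):
    """Least-squares adversarial term for the encoder: mean((D(E(I)) - 1)^2).

    Only discriminator scores on predicted parameters are used here.
    """
    scores = disc_scores_on_predictions
    return sum((s - 1.0) ** 2 for s in scores) / len(scores)


def encoder_loss(kp2d_loss, encoder_disc_loss, loss_3d=0.0, use_3d_label=False):
    """Unpaired: kp2d + adversarial. Paired: the 3D loss is added on top."""
    loss = kp2d_loss + encoder_disc_loss
    if use_3d_label:  # hypothetical flag name; switches on paired 3D supervision
        loss += loss_3d
    return loss


# Toy values: two keypoints (the second one is not visible), two discriminator scores.
kp2d = keypoint_l1_loss([(0.0, 0.0), (1.0, 1.0)], [(0.5, 0.0), (1.0, 0.0)], [1.0, 0.0])
adv = encoder_adversarial_loss([1.0, 0.0])
unpaired = encoder_loss(kp2d, adv)
paired = encoder_loss(kp2d, adv, loss_3d=2.0, use_3d_label=True)
```

With the flag off, the 3D term never contributes, so the same dataset can be used for both settings and only the supervision differs.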

russoale commented 4 years ago

Hi @akanazawa,

I think I might have a pretty close implementation to yours, based on TF 2.1 with Keras. I just have two questions:

Best regards!

akanazawa commented 4 years ago

Great!

  1. Yes, I believe I just experimented with the same dataset but used a flag that does not use any 3D ground truth for unpaired.

  2. I forget about this pre-processing... Why can't Phoning 2 and 3 be used for evaluation? I recall there may have been some known issue with one of the videos being corrupt, but I'm not sure if this is related. Anyhow, the processing code of this dataset is available here and this is probably more insightful than my memory :).

Best,

Angjoo

russoale commented 4 years ago

  1. Great, I will train again and then evaluate.

  2. Good question. Phoning 2 and 3 should be included, but as far as I can tell (without having looked at read_human36m.py) trial_ids specifies the index of the given sequences. Unfortunately, the link you provided results in a 404. I assume it's a private repo. Could you maybe create a public gist?

Thanks for the support!

akanazawa commented 4 years ago

Oops I meant to link to this one

Thanks!

A