MonoHuman: Animatable Human Neural Field from Monocular Video (CVPR 2023)

Question about evaluation #15

Closed. chunjins closed this issue 8 months ago.

chunjins commented 11 months ago

Hi @Yzmblog,

Thanks for the excellent work. It would be one of the most important baselines for our project.

I have some questions about the data you use to evaluate novel view and novel pose synthesis. For the novel pose evaluation, Section C of the supplementary material says that you sample frames from all cameras at a rate of 30 in Set B, and that the number of selected frames is 184. I am a bit confused about this part.

Taking sequence 393, the longest one, as an example: N_setB = 658 × 0.2 = 131.6 and N_sample = N_setB × 23 / 30 ≈ 100.9. How could you then get 184 frames for evaluation?
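
For reference, here is the same calculation written out as a minimal Python sketch; it assumes the numbers described above (658 frames for sequence 393, 23 cameras, Set B as the last 20% of frames, and a sampling rate of 30):

```python
# Minimal sketch of the count above; assumes sequence 393 has 658 frames,
# 23 cameras, Set B is the last 20% of frames, and the sampling rate is 30.
total_frames = 658
num_cameras = 23

n_set_b = total_frames * 0.2           # 131.6 frames per camera in Set B
n_sample = n_set_b * num_cameras / 30  # ~100.9 sampled frames in total

print(n_set_b, n_sample)               # 131.6 100.89...
```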

Please let me know how I could get the same frames you used to compute the scores in Tables A2 and A3.

Thanks again!

Yzmblog commented 8 months ago

Hi, I think you are right and I miscalculated the frame number. To get the same frames, here is the process I used:

1. Divide the camera frames as: nv_eval_frames = all_frames[:-len(all_frames)//5]; np_eval_frames = all_frames[-len(all_frames)//5:]
2. Then sample at a rate of 30: nv_eval_frames = nv_eval_frames[::30]; np_eval_frames = np_eval_frames[::30]

The above is the code I used to divide them, so I think your calculation is correct. Thanks for pointing it out; I will update it.
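
For completeness, a minimal runnable sketch of the split described above might look like this. It assumes all_frames is the per-camera list of frame indices (e.g. 658 frames for sequence 393, as in the question); the variable names follow the snippets in the comment.

```python
# Minimal sketch of the split above; assumes all_frames is the per-camera
# list of frame indices (e.g. 658 frames for sequence 393).
all_frames = list(range(658))

# Step 1: first ~80% of frames for novel-view evaluation,
# last ~20% (Set B) for novel-pose evaluation.
nv_eval_frames = all_frames[:-len(all_frames) // 5]
np_eval_frames = all_frames[-len(all_frames) // 5:]

# Step 2: sample every 30th frame in each split.
nv_eval_frames = nv_eval_frames[::30]
np_eval_frames = np_eval_frames[::30]

print(len(nv_eval_frames), len(np_eval_frames))  # 18 and 5 frames per camera
```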