Closed pilgrim00 closed 3 years ago
Hi, for the first question, I guess it may be because we use the logits as the parsing result fed to the image generator for human pose transfer, which can lead to a lower loss on limited regions (the upper clothes and lower clothes are the largest regions of the human parsing). For the second question, we use CoordConv in the spatial-aware normalization; in our opinion it is useful, like a positional encoding, for learning spatial correspondence. However, we did not conduct an ablation study on this part.
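To illustrate the first point, here is a minimal sketch (not the repo's actual code; the shapes and class count are illustrative assumptions) of the difference between keeping soft per-class logits and collapsing them to a hard parsing map with argmax. Small regions such as head or skin can be dominated by the large clothing classes in the soft representation, while argmax assigns exactly one class id per pixel:

```python
import numpy as np

# Hypothetical parsing-generator output: per-class logits of shape (K, H, W),
# where K counts semantic classes (background, upper clothes, lower clothes, ...).
rng = np.random.default_rng(0)
logits = rng.normal(size=(8, 4, 4))

# Soft probabilities: every class contributes at every pixel, so large
# regions (clothes) can dominate the loss over small ones (head, skin).
probs = np.exp(logits) / np.exp(logits).sum(axis=0, keepdims=True)

# Hard parsing map: exactly one class id per pixel.
parsing_map = logits.argmax(axis=0)
print(parsing_map.shape)  # (4, 4)
```

Whether the soft or hard map is passed downstream changes what the generator sees, which may explain a saved parsing map looking different from the supervision target.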
I am curious about the quality of the output parsing map of p2. I tried training for several days on a V100. The final PNG looks good, but the par_sav is different from the SPL2.
I have also seen the constraints over them, which should help make them very similar. But how does this happen with no head and no skin in the image? And I want to ask whether CoordConv is useful.
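On the CoordConv question, the core idea is just to append normalized x/y coordinate channels before a convolution, so the layer can condition on position. A minimal NumPy sketch (the function name and shapes are illustrative, not from the repo):

```python
import numpy as np

def add_coord_channels(feat):
    """Append normalized x/y coordinate channels (CoordConv-style).

    feat: array of shape (C, H, W); returns (C + 2, H, W).
    The extra channels act like a positional encoding, letting the
    following convolution learn spatially-aware correspondences.
    """
    c, h, w = feat.shape
    ys = np.linspace(-1.0, 1.0, h)          # row coordinate in [-1, 1]
    xs = np.linspace(-1.0, 1.0, w)          # column coordinate in [-1, 1]
    y_chan = np.tile(ys[:, None], (1, w))   # (H, W), varies along rows
    x_chan = np.tile(xs[None, :], (h, 1))   # (H, W), varies along columns
    return np.concatenate([feat, y_chan[None], x_chan[None]], axis=0)

# Example: a 3-channel 4x4 feature map gains two coordinate channels.
out = add_coord_channels(np.zeros((3, 4, 4)))
print(out.shape)  # (5, 4, 4)
```

A plain convolution is translation-equivariant, so without these channels it has no direct way to tell where a feature sits in the frame; that is why the authors compare it to a positional encoding.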