Thanks for the awesome work! I'm curious how the trained model gains the ability to be conditioned on controllable parameters, broken down as follows:

1. Shape and appearance latent codes: is the Discriminator also conditioned on the shape and appearance aspects? I cannot find this in the code. If it is indeed not conditioned, how does the model learn that the shape latent code is the variable that controls shape in the generated data? Likewise for the appearance latent code.
2. When sampling the transformation (s, R, T) and the camera pose per batch, does the corresponding real_data also have similar properties (s, R, T and camera pose)? If not, again, how can the model associate these controllable variables correctly? Consider an "unwanted case": we sample T so that the generated object ends up on the left, but the corresponding real_data used to train the Discriminator has the object on the right.
Hi @ardianumam , thanks for your interest in our project!
No, the discriminator is not conditioned on the codes. We observe that injecting the two latent codes at different parts of the network leads to shape and appearance disentanglement in an unsupervised fashion - there is no explicit supervision on this!
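To make the injection idea concrete, here is a minimal, purely illustrative sketch (not the actual architecture from the repo): the shape code feeds the early layers of a tiny generator, while the appearance code is injected only into a later layer. The layer sizes and names are assumptions for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(x):
    return np.maximum(x, 0.0)

# Hypothetical weights of a toy generator (sizes are illustrative).
W1 = rng.standard_normal((16, 8))
W2 = rng.standard_normal((8, 16))
W3 = rng.standard_normal((3, 8))

def generator(z_shape, z_app):
    h = relu(W1 @ z_shape)    # early layers: conditioned on the shape code only
    h = relu(W2 @ h + z_app)  # later layer: appearance code injected here
    return W3 @ h             # final output, e.g. an RGB value

z_shape = rng.standard_normal(8)
z_app_a = rng.standard_normal(8)
z_app_b = rng.standard_normal(8)

# Changing only the appearance code cannot affect the early (shape) features,
# which is the architectural bias that encourages disentanglement to emerge
# during training, without any explicit supervision on the codes.
out_a = generator(z_shape, z_app_a)
out_b = generator(z_shape, z_app_b)
```

The point is that the split is architectural: the discriminator never sees the codes, but because each code can only influence one part of the network, the generator is biased toward using them for separate factors.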
Correct, the real data has to have similar properties wrt. (s, R, T). However, the nice thing about GAN training is that you compare distributions: you do not need paired data of {image} and {corresponding s, R, T}. We only pre-define a prior over {s, R, T} that roughly matches the data distribution, and can then train without any paired information / direct supervision.
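A rough sketch of what "unpaired" means for one training batch (illustrative only; the ranges of the prior and all names are assumptions, not the values used in the actual code):

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical prior over the object transformation, chosen to roughly match
# the dataset statistics (illustrative ranges).
def sample_transformation(batch_size):
    s = rng.uniform(0.8, 1.2, size=batch_size)         # scale
    R = rng.uniform(0.0, 2 * np.pi, size=batch_size)   # rotation angle
    T = rng.uniform(-0.5, 0.5, size=(batch_size, 2))   # translation
    return s, R, T

def training_batch(real_images):
    batch = real_images.shape[0]
    s, R, T = sample_transformation(batch)
    z = rng.standard_normal((batch, 8))
    # fake = generator(z, s, R, T)  # would render with the sampled transform
    # Crucially, the real batch is drawn independently of (s, R, T): there is
    # no pairing between a real image and the sampled transformation. The
    # discriminator only compares the *distribution* of fakes to the
    # distribution of reals, so a fake with the object on the left and a real
    # with the object on the right is fine, as long as both arrangements occur
    # with realistic frequency overall.
    real = real_images[rng.permutation(batch)]
    return z, (s, R, T), real

# Toy usage with dummy "images".
dummy = np.zeros((4, 32, 32))
z, (s, R, T), real = training_batch(dummy)
```

So the "unwanted case" you describe is not a problem: a single mismatched batch does not need to agree; only the overall distributions of generated and real layouts need to match.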