Closed hengfei-wang closed 3 years ago
Hi @hengfei-wang, thanks for your intriguing question. We have never tried this, but in theory the model could learn to disentangle parts of objects instead of whole objects. However, the performance presumably depends heavily on your dataset: for a face dataset, it could see "similar eyes" combined with "different lip types", whereas disentangling, e.g., chair legs from the chair body is probably much harder because the two entities are strongly entangled. In general, I think the network "can" learn this if it is the "easier solution", and whether that is the case depends on how correlated the entities are.
Hi,
Thank you for releasing the code. Your work is impressive.
I have some questions about object separation. It seems that your work can separate the objects in the input images, and that we can set N to change the number of objects we want the network to recognize. What would happen if we set N much larger than the number of objects in a scene? In general, we regard one person or one face as a single object. Can the network learn different parts of one object (such as the eyes, mouth, and nose of a face)? If so, how? Would it need additional constraints?
Best regards, Hengfei