autonomousvision / giraffe

This repository contains the code for the CVPR 2021 paper "GIRAFFE: Representing Scenes as Compositional Generative Neural Feature Fields"
https://m-niemeyer.github.io/project-pages/giraffe/index.html
MIT License

About object separation #10

Closed: hengfei-wang closed this issue 3 years ago

hengfei-wang commented 3 years ago

Hi,

Thank you for releasing the code. Your work is impressive.

I have some questions about object separation. It seems that your work can separate the objects in the input images, and that N can be set to change the number of objects we want the network to recognize. So my question is: what would happen if we set N much larger than the number of objects in a scene? In general, we treat one person or one face as a single object. But can the network learn different parts of a single object (e.g., the eyes, mouth, and nose of a face)? If so, how? Would it need additional constraints?

Best regards, Hengfei

m-niemeyer commented 3 years ago

Hi @hengfei-wang, thanks for your intriguing question. We have never tried this, but in theory the model could learn to disentangle parts of objects instead of the objects themselves. However, the performance presumably depends heavily on your dataset: a face dataset might show "similar eyes" together with "different lip types", whereas disentangling, e.g., chair legs from the chair body is probably much harder because the two entities are strongly entangled. In general, I think the network "can" learn this if it is the "easier solution", and whether or not that is the case depends on how correlated the entities are.
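For context on what N controls: in the GIRAFFE paper, the N feature fields are combined at each 3D sample point by summing their densities and taking a density-weighted mean of their feature vectors. The sketch below illustrates that composition operator in NumPy. The function name `compose_fields`, the array shapes, and the `eps` guard are illustrative assumptions, not the repository's actual code.

```python
import numpy as np

def compose_fields(sigmas, feats, eps=1e-8):
    """Combine N feature fields evaluated at the same P sample points.

    Hypothetical helper for illustration only.
    sigmas: (N, P) non-negative densities, one row per field
    feats:  (N, P, C) feature vectors, one row per field
    Returns the summed density (P,) and the density-weighted mean feature (P, C).
    """
    sigmas = np.asarray(sigmas, dtype=float)
    feats = np.asarray(feats, dtype=float)
    # Total density at each point: sigma = sum_i sigma_i
    sigma = sigmas.sum(axis=0)
    # Numerator of the weighted mean: sum_i sigma_i * f_i
    weighted = (sigmas[..., None] * feats).sum(axis=0)
    # Normalize by total density; eps avoids division by zero in empty space
    feat = weighted / (sigma[..., None] + eps)
    return sigma, feat

# Two fields at one point: field 0 is three times as dense as field 1,
# so the composed feature leans toward field 0's feature vector.
sigma, feat = compose_fields([[3.0], [1.0]], [[[1.0, 0.0]], [[0.0, 1.0]]])
print(sigma, feat)
```

In this arithmetic, a field whose density is zero at a point contributes nothing there, so an unused slot is harmless. Whether training with a large N actually leaves extra slots empty, or instead splits an object into parts, is the open question discussed in this thread.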