Wuziyi616 / SlotDiffusion

Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models
https://slotdiffusion.github.io/
MIT License

Pretrained diffusion models #6

Closed: kaanakan closed this 3 weeks ago

kaanakan commented 3 weeks ago

Hello,

Thank you for sharing your work!

I wanted to inquire whether you've explored integrating pretrained diffusion models like Stable Diffusion v1.5 or v2.1 into your project. If so, I’d love to hear more about the results and any insights you can share.

Thanks in advance for your time and assistance!

Wuziyi616 commented 3 weeks ago

Yes, I've tried that, basically using SD as the slot decoder. But I didn't go very far, as fine-tuning requires huge memory. If I freeze it, I cannot learn good object-centric representations.

But you can check out the Stable-LSD variant in this work; they have shown some promising results using SD.
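
For reference, a minimal sketch of what "using SD as the slot decoder" could look like, assuming the HuggingFace `diffusers` API. This is not the code from either project; the checkpoint name, slot dimension, and projection layer are illustrative assumptions. The key idea is that slots replace the text embeddings as the UNet's cross-attention context, and `freeze_unet` toggles the memory vs. representation trade-off mentioned above.

```python
import torch
import torch.nn as nn
from diffusers import AutoencoderKL, UNet2DConditionModel, DDPMScheduler

class SlotSDDecoder(nn.Module):
    """Illustrative slot-conditioned SD decoder (not the authors' implementation)."""

    def __init__(self, slot_dim=192,
                 sd_name="runwayml/stable-diffusion-v1-5",  # assumed checkpoint
                 freeze_unet=True):
        super().__init__()
        self.vae = AutoencoderKL.from_pretrained(sd_name, subfolder="vae")
        self.unet = UNet2DConditionModel.from_pretrained(sd_name, subfolder="unet")
        self.scheduler = DDPMScheduler.from_pretrained(sd_name, subfolder="scheduler")
        # Project slots to the UNet's cross-attention dimension (768 for SD v1.5).
        self.slot_proj = nn.Linear(slot_dim, self.unet.config.cross_attention_dim)
        self.vae.requires_grad_(False)
        if freeze_unet:
            # Cheaper in memory, but may limit how object-centric the slots become.
            self.unet.requires_grad_(False)

    def forward(self, images, slots):
        # images: (B, 3, H, W) in [-1, 1]; slots: (B, num_slots, slot_dim)
        latents = self.vae.encode(images).latent_dist.sample()
        latents = latents * self.vae.config.scaling_factor
        noise = torch.randn_like(latents)
        t = torch.randint(0, self.scheduler.config.num_train_timesteps,
                          (latents.shape[0],), device=latents.device)
        noisy = self.scheduler.add_noise(latents, noise, t)
        # Slots act as the conditioning sequence in place of text embeddings.
        cond = self.slot_proj(slots)
        pred = self.unet(noisy, t, encoder_hidden_states=cond).sample
        return nn.functional.mse_loss(pred, noise)
```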

kaanakan commented 3 weeks ago

Thank you for your response! If possible, could you kindly share the results you have for the COCO and VOC datasets? I’d greatly appreciate it.

Best regards,

Wuziyi616 commented 3 weeks ago

I don't think I have them anymore. Well, even in Stable-LSD, the reconstruction results are not very good, TBH. Also, IIRC, it's not capable of compositional generation at all. So I don't know if this can even work: the part-whole ambiguity is too complicated in real-world data, and unsupervised decomposition is just too hard.

Wuziyi616 commented 3 weeks ago

There is another paper you might be interested in: https://arxiv.org/abs/2407.17929. They use more pre-trained knowledge to generate pseudo masks to supervise their slot + SD model. They can get good segmentation results, but the generation is still quite bad, I think.