Wuziyi616 / SlotDiffusion

Code release for NeurIPS 2023 paper SlotDiffusion: Object-centric Learning with Diffusion Models
https://slotdiffusion.github.io/
MIT License

Training time and params #4

Closed alexcbb closed 11 months ago

alexcbb commented 12 months ago

Hello,

First, thank you for your amazing work! I launched a training run on a SLURM cluster and got these results (it has been training for 10 hours on 2 A100 GPUs and still has 10 hours to go). The reconstruction looks weird to me compared to the segmentation; can you tell me if this seems right? The training loss is decreasing very slowly, while the var loss seems better.

Wuziyi616 commented 12 months ago

Hi, I cannot see any figures in the post; maybe you forgot to attach them? Can you tell me which config file you are using?

alexcbb commented 12 months ago

@Wuziyi616 Yes, sorry, here is the figure: media_images_val_recon_227199_9cc3f3c640bb1763b180

I use the config file "sa_ldm_dino_coco_params-res224.py" from the img_based configs folder.

Wuziyi616 commented 12 months ago

Attached is one result I have, which shows the rough appearance of objects, but the details are wrong. It is true that SlotDiffusion still cannot reconstruct natural images very well (we do acknowledge this in the limitations & future works):

With the help of pre-trained encoders such as DINO ViT [7], SlotDiffusion is able to segment objects from naturalistic images [21, 54]. However, we are still unable to decode natural images faithfully from slots.

However, your reconstructed images look far worse than that (no objects at all), and I don't understand why. Have you tried other training configs (e.g. CLEVRTex), and are those reconstruction results also this bad?

[image: attached reconstruction result]

alexcbb commented 12 months ago

For now I have only tested on COCO, because that is the dataset I have; I have not tried others yet. So you never encountered this kind of problem, where nothing in the image reconstructs correctly? I will investigate on my side; maybe it's related to the script.

Wuziyi616 commented 12 months ago

Hi, yes, I have never seen this in my experiments, and I know someone was able to get reasonable reconstruction results using this repo (though not on this dataset). So the first thing to check is: if you just do encoding-decoding (auto-encoding) with the VQ-VAE, do you get good reconstruction results?
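A minimal sketch of that auto-encoding check, using NumPy only. The names `psnr`, `autoencode_psnr`, `toy_encode`, and `toy_decode` are hypothetical and not part of this repo; the toy quantizer merely stands in for the VQ-VAE's encode and decode steps, which would need to be swapped in from whatever the loaded checkpoint actually exposes.

```python
import numpy as np


def psnr(original, reconstruction, max_value=1.0):
    """Peak signal-to-noise ratio in dB for images with values in [0, max_value]."""
    original = np.asarray(original, dtype=np.float64)
    reconstruction = np.asarray(reconstruction, dtype=np.float64)
    mse = np.mean((original - reconstruction) ** 2)
    if mse == 0.0:
        return float("inf")  # identical images
    return 10.0 * np.log10(max_value ** 2 / mse)


def autoencode_psnr(images, encode, decode):
    """Run encode -> decode on a batch and return the mean per-image PSNR."""
    recon = decode(encode(images))
    return float(np.mean([psnr(x, y) for x, y in zip(images, recon)]))


# Stand-in autoencoder (an assumption, NOT the repo's VQ-VAE):
# quantize each pixel to 16 levels, then map the codes back to [0, 1].
def toy_encode(x):
    return np.round(np.asarray(x) * 15).astype(np.int64)


def toy_decode(codes):
    return codes.astype(np.float64) / 15.0


rng = np.random.default_rng(0)
batch = rng.random((4, 3, 32, 32))  # fake images, shape (N, C, H, W), values in [0, 1)
score = autoencode_psnr(batch, toy_encode, toy_decode)
print(f"mean auto-encoding PSNR: {score:.1f} dB")
```

If the real VQ-VAE already gives a low score on validation images, that would suggest the problem lies in the VQ-VAE checkpoint or the data preprocessing rather than in the slot-conditioned decoder.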

Wuziyi616 commented 11 months ago

Feel free to re-open the issue if you have further questions.