Closed alexcbb closed 11 months ago
Hi, I cannot see any figures in the post, maybe you forgot to attach it? Can you tell me what config file you are using?
@Wuziyi616 Yes sorry, here is the figure :
I use the config file "sa_ldm_dino_coco_params-res224.py" from the img_based configs folder.
Attached is one result I have, which shows the rough appearance of objects, but the details are wrong. It is true that SlotDiffusion still cannot reconstruct natural images very well (we do acknowledge this in the limitations & future works):
With the help of pre-trained encoders such as DINO ViT [7], SlotDiffusion is able to segment objects from naturalistic images [21, 54]. However, we are still unable to decode natural images faithfully from slots.
However, your reconstructed images look far worse than that (no objects at all), and I don't really understand why. Have you tried other training configs (e.g. CLEVRTex)? Are those reconstruction results also this bad?
For now I have only tested on COCO because it is the dataset I have, but not yet on the others. So you never encountered this kind of problem, where nothing reconstructs correctly? I will investigate on my side; maybe it's related to the script.
Hi, no, I never saw this in my experiments, and I know someone was able to get reasonable reconstruction results using this repo (though not on this dataset). So the first thing to check is: if you just do encoding-decoding (auto-encoding) with the VQ-VAE, do you get good reconstruction results?
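One way to make that auto-encoding check quantitative is to compute the PSNR between an input image and its VQ-VAE reconstruction (a healthy auto-encoder on natural images typically sits well above ~20 dB, while a broken one lands near single digits). Below is a minimal, dependency-free sketch of the metric itself on toy pixel lists; the VQ-VAE encode/decode call is repo-specific and not shown here:

```python
import math

def mse(a, b):
    """Mean squared error between two equal-length pixel sequences (0-255)."""
    assert len(a) == len(b)
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)

def psnr(a, b, max_val=255.0):
    """Peak signal-to-noise ratio in dB; higher means a closer reconstruction."""
    err = mse(a, b)
    if err == 0:
        return float("inf")  # identical images
    return 10.0 * math.log10(max_val ** 2 / err)

# Toy example: a near-perfect reconstruction vs. an unrelated one.
original = [10, 50, 200, 128]
good_rec = [11, 49, 199, 129]   # off by 1 per pixel -> high PSNR
bad_rec  = [200, 10, 50, 0]     # unrelated values -> low PSNR

print(round(psnr(original, good_rec), 1))  # 48.1
print(round(psnr(original, bad_rec), 1))   # 5.3
```

In practice you would flatten the original image and the VQ-VAE output to pixel arrays (or use an off-the-shelf PSNR from scikit-image) and compare the score against reconstructions that look visually reasonable.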
Feel free to re-open the issue if you have further questions.
Hello,
First, thank you for your amazing work! I tried to launch a training run on a SLURM cluster and got these results (it has been training for 10 hours on 2 A100 GPUs and still has 10 hours to go). The reconstruction seems weird to me compared to the segmentation; can you tell me if this looks alright? The training loss is decreasing very slowly, while the var loss seems better.