A reimplementation of SAVi, following the official code at https://github.com/google-research/slot-attention-video (ICLR 2022 paper "Conditional Object-Centric Learning from Video").

Train/eval performance matches, and is even slightly better than, the official results across 10 random seeds.

By contrast, the official implementation is difficult to set up, debug, and modify for academic experiments.
Figure: SAVi-small on MOVi-A.
```
configs/
└── savi_small-movi_a.py
output/
└── {random seed}.txt   # my training log files
analyze.py              # visualize training logs
datum.py                # dataset and transforms
learn.py                # optimizers, lr schedulers, logging, etc.
main.py                 # entry point
model.py                # modeling and initialization
utils.py                # config-based registry and build APIs
```
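The config-based registry in `utils.py` lets config files reference model components by name instead of importing them directly. A minimal sketch of this pattern is below; the names (`Registry`, `register`, `build`) are illustrative assumptions, not the repo's actual API.

```python
class Registry:
    """Maps string names to classes so configs can reference them by name."""

    def __init__(self):
        self._entries = {}

    def register(self, cls):
        # Used as a decorator: records the class under its own name.
        self._entries[cls.__name__] = cls
        return cls

    def build(self, config):
        # config is a dict like {"name": "Encoder", "kwargs": {...}}.
        cls = self._entries[config["name"]]
        return cls(**config.get("kwargs", {}))


MODELS = Registry()


@MODELS.register
class Encoder:
    def __init__(self, width=64):
        self.width = width


# A config file can now describe the model purely as data.
encoder = MODELS.build({"name": "Encoder", "kwargs": {"width": 128}})
```

This keeps configs such as `configs/savi_small-movi_a.py` declarative: swapping a component means editing a string in the config, not the model code.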
```
pip install -r requirements.txt
python datum.py    # but first download the original dataset from
                   # https://console.cloud.google.com/storage/browser/kubric-public/tfds
python main.py     # or: sh run.sh
python analyze.py
```
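`analyze.py` visualizes the per-seed training logs in `output/`. A minimal sketch of that kind of aggregation is below; the log-line format (`step=... loss=...`) and the helper name are assumptions for illustration, not the repo's actual format.

```python
import re
import statistics


def parse_losses(text):
    """Extract float loss values from lines such as 'step=100 loss=0.0123'."""
    return [float(m) for m in re.findall(r"loss=([0-9.eE+-]+)", text)]


# Stand-in for reading output/{random seed}.txt files.
logs = {
    "seed0": "step=100 loss=0.50\nstep=200 loss=0.40",
    "seed1": "step=100 loss=0.52\nstep=200 loss=0.38",
}

# Summarize the final loss across seeds.
final = [parse_losses(text)[-1] for text in logs.values()]
print(f"final loss: mean={statistics.mean(final):.3f} "
      f"std={statistics.stdev(final):.3f}")
```

Reporting mean and standard deviation across seeds is what makes the 10-seed comparison against the official results meaningful.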
I am currently working on object-centric learning. If you have challenging problems or ideas in this area, please do not hesitate to contact me.