Closed zadaianchuk closed 4 years ago
Hi, sure. They're also in the appendix of the paper, but I'll highlight a few here. I recommend a 4 x 4 image encoding map if the number of objects is small; this greatly reduces memory consumption and training time. Similarly, the z^what dimension can be small, e.g. 4 or 8, if your sprites are not too complex. We use a learning rate of 5e-4 and a standard deviation of 0.2 for the dSprites experiments. We also constrain z^scale on synthetic datasets so that it varies from 0.5 to 1.5 times the actual object size. The prior for z^pres in discovery is set to 0.1 at the beginning of training and quickly annealed to 1e-4.
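To keep those numbers in one place, here is a minimal sketch of the settings described above as a Python config, plus one possible annealing schedule for the z^pres prior. All key names and the geometric-anneal schedule are illustrative assumptions, not the repo's actual config API; the anneal step count is a placeholder.

```python
# Illustrative hyperparameters for dSprites-style experiments.
# Key names are hypothetical, not the repository's actual config keys.
config = {
    "encoding_map_size": (4, 4),   # small grid when there are few objects
    "z_what_dim": 8,               # 4 or 8 suffices for simple sprites
    "learning_rate": 5e-4,
    "obs_std": 0.2,                # observation standard deviation
    "z_scale_range": (0.5, 1.5),   # z^scale, as a fraction of true object size
    "z_pres_prior_start": 0.1,     # discovery prior at the start of training
    "z_pres_prior_end": 1e-4,      # value it is quickly annealed to
}

def z_pres_prior(step, start=0.1, end=1e-4, anneal_steps=10_000):
    """Geometrically anneal the z^pres discovery prior from `start` to `end`.

    The geometric schedule and the 10k-step horizon are assumptions;
    the paper only states that the prior anneals quickly to 1e-4.
    """
    if step >= anneal_steps:
        return end
    return start * (end / start) ** (step / anneal_steps)
```

For example, `z_pres_prior(0)` returns 0.1 and the value decays toward 1e-4 as training progresses.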
Thanks a lot for your answer!
I was also interested in biasing the architecture towards propagation: e.g., if a video contains the same number of objects throughout, is it possible to discover only in the first frame and propagate in all the others? I guess the explained_ratio_threshold parameter is responsible for this.
Hi, sure. In that case, you can actually skip the discovery step for all timesteps t > 0 and manually set the propagated z^pres to 1. To skip discovery, simply set the discovery variables to zeros or empty tensors.
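The suggestion above can be sketched as a per-frame inference step. This is a minimal pure-Python illustration: `discover` and `propagate` stand in for the model's actual discovery and propagation modules, and representing an object as a dict with a `z_pres` entry is an assumption for clarity, not the repo's real data structure.

```python
def infer_step(frame, prev_objects, discover, propagate, t):
    """One inference step that runs discovery only at t == 0.

    `discover` and `propagate` are placeholders for the model's modules;
    each object is represented here as a dict with a 'z_pres' entry.
    """
    # Propagate every object from the previous frame.
    objects = [propagate(frame, obj) for obj in prev_objects]
    if t == 0:
        # First frame: discover all objects.
        objects.extend(discover(frame))
    else:
        # Later frames: skip discovery (equivalently, feed zeros or
        # empty tensors as the discovery output) and force the
        # propagated presence to 1 so no object is dropped.
        for obj in objects:
            obj["z_pres"] = 1.0
    return objects
```

Usage: call `infer_step` with `t == 0` on the first frame to populate the object set, then with `t > 0` so every subsequent frame only propagates.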
That helped! Thanks again for the support.
Hi, thanks a lot for your code! I'm trying to reproduce the results on dSprites. Could you share the hyperparameters used for training on the synthetic dataset? Thanks a lot!