Closed fa9r closed 4 years ago
I think the author uses the base_model.build_pyr
function to generate videos at different resolutions, controlled by the hparams --loadSize
and --n_scales_spatial.
In the Generator, we can see the loop for s in range(n_scales):
— I believe this is what the "spatio-temporally progressive manner" refers to.
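To make the scale loop concrete, here is a minimal sketch of what a coarse-to-fine resolution pyramid could look like. The function name and signature are illustrative, not the actual vid2vid API: it just shows how --n_scales_spatial scales would map to per-scale sizes, each a factor of 2 below the full --loadSize resolution.

```python
def build_pyramid(height, width, n_scales):
    """Return an (h, w) size per scale, coarsest first (hypothetical helper,
    analogous in spirit to base_model.build_pyr)."""
    sizes = []
    for s in range(n_scales):
        factor = 2 ** (n_scales - 1 - s)  # halve resolution per coarser scale
        sizes.append((height // factor, width // factor))
    return sizes

# e.g. full resolution 2048 x 1024 with --n_scales_spatial 3:
print(build_pyramid(1024, 2048, 3))
# [(256, 512), (512, 1024), (1024, 2048)]
```

The generator's `for s in range(n_scales):` loop would then process these scales from coarse to fine, which is purely the spatial part of the progression.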
I think these parameters just build the coarse-to-fine generator of pix2pixHD. Is that really what's meant by "spatio-temporally progressive"? Then what is the temporal aspect? The temporal scale of D?
The temporal aspect comes from the iteration over fake_B_prevs.
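A hedged sketch of what that recursion amounts to: each generated frame is conditioned on the previously generated frames (the fake_B_prevs buffer), so time enters through an autoregressive loop rather than through a resolution pyramid. The function names and the buffer length below are illustrative, not the actual vid2vid code.

```python
def generate_video(inputs, generate_frame, n_prev=2):
    """Autoregressive frame generation (illustrative sketch).

    generate_frame(x, prev_frames) stands in for the real generator call;
    each output frame is fed back as conditioning for the next one.
    """
    fake_B_prev = []  # previously generated frames used as conditioning
    outputs = []
    for x in inputs:
        fake_B = generate_frame(x, fake_B_prev)
        outputs.append(fake_B)
        fake_B_prev = (fake_B_prev + [fake_B])[-n_prev:]  # keep last n_prev
    return outputs
```

This is why the discriminator can also operate at multiple temporal scales: the generator's output at frame t already depends on its own earlier outputs.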
Hmmm, okay I guess. Seems like I understood it wrongly then. Thanks for the help.
Update: In README it says for Cityscapes training:
- We adopt a coarse-to-fine approach, sequentially increasing the resolution from 512 x 256, 1024 x 512, to 2048 x 1024.
- Train a model at 512 x 256 resolution (bash ./scripts/street/train_512.sh)
- Train a model at 1024 x 512 resolution (must train 512 x 256 first) (bash ./scripts/street/train_1024.sh)
This looks a lot more like how I understood "spatio-temporally progressive", i.e. you train 512x256 -> 1024x512 -> 2048x1024.
EDIT: I finally found the temporal growing: it is controlled by --niter_step, which doubles the training sequence length every few epochs.
TL;DR: spatial growing is implemented by manually increasing --load_size and initializing from previous (smaller) checkpoint. Temporal growing is implemented by doubling sequence length every --niter_step iterations.
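The temporal-growing schedule described in the TL;DR could be sketched as follows. The parameter names mirror the flags mentioned in the thread, but the starting length, cap, and exact doubling rule are assumptions for illustration, not the repo's actual defaults.

```python
def sequence_length(epoch, niter_step, start_len=6, max_len=30):
    """Training sequence length at a given epoch (illustrative schedule):
    double every `niter_step` epochs, capped at `max_len` frames."""
    n_doublings = epoch // niter_step
    return min(start_len * 2 ** n_doublings, max_len)

# With niter_step=5: epochs 0, 5, 10, 15 would train on
# sequences of 6, 12, 24, 30 frames respectively.
print([sequence_length(e, niter_step=5) for e in range(0, 20, 5)])
# [6, 12, 24, 30]
```

Spatial growing, by contrast, happens across separate training runs (the train_512.sh / train_1024.sh scripts), each initialized from the previous checkpoint, so the two axes grow independently rather than in lockstep.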
Hey, in your paper you mention in the experiment section: "Implementation details. We train our network in a spatio-temporally progressive manner. In particular, we start with generating low-resolution videos with few frames, and all the way up to generating full resolution videos with 30 (or more) frames."
How exactly did you do the scaling? I looked through your code, but couldn't find anything related to it. In particular, I would like to know whether you increase both spatial/temporal size at the same time, or one after another, and whether you adjusted other hparams when using it.
What I mean by adjusting hparams: this progressive growing is mainly used to reduce training time, I guess, so if the model would usually take N epochs to train on high-res videos from scratch, you probably trained it for M < N epochs per progressive-growing stage, right? Then what is M? N/#stages? And did you use a larger batch size at smaller resolutions, or make any other notable hparam changes?