For localized representations including
sketches, segmentation masks, depthmaps, intensity
images, and masked images, we project them into uniform dimensional
embeddings with the same spatial size as the
noisy latent xt using stacked convolutional layers.
@huanglianghua In section 2.3,
What is the value of uniform dimension here?