Closed by fangchao 4 years ago
It's done on the fly as part of data augmentation (applied with some probability). As far as video stability is concerned, this approach seems to work better than a 3-channel input. The training dataset consists mostly of close-up portraits. If we train on a bigger, more varied dataset, it may work better, I suppose. Please look at the original paper for more information. Also make sure you use proper normalization, preprocessing, etc.
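To illustrate what "on the fly, with a chance" could look like, here is a rough sketch, not the exact code from the repo. The function names `make_prior_mask` and `make_four_channel_input`, the 0.5 probability of an empty mask, and the random-shift perturbation are all assumptions for illustration; the actual ratio and augmentations may differ.

```python
import numpy as np

def make_prior_mask(mask, rng, p_empty=0.5, max_shift=8):
    """Simulate a previous-frame mask for one training sample.

    mask: (H, W) float array with values in [0, 1] (ground-truth mask).
    p_empty and max_shift are illustrative values, not taken from the thread.
    """
    # With probability p_empty, return an empty mask (mimics the first frame).
    if rng.random() < p_empty:
        return np.zeros_like(mask)

    # Otherwise, shift the ground truth slightly to mimic frame-to-frame motion.
    dy, dx = (int(v) for v in rng.integers(-max_shift, max_shift + 1, size=2))
    shifted = np.roll(mask, shift=(dy, dx), axis=(0, 1))

    # np.roll wraps around, so clear the rows/columns that wrapped.
    if dy > 0:
        shifted[:dy, :] = 0
    elif dy < 0:
        shifted[dy:, :] = 0
    if dx > 0:
        shifted[:, :dx] = 0
    elif dx < 0:
        shifted[:, dx:] = 0
    return shifted

def make_four_channel_input(image, mask, rng, p_empty=0.5):
    """Stack a normalized RGB image (H, W, 3) with the simulated prior mask."""
    prior = make_prior_mask(mask, rng, p_empty=p_empty)
    return np.concatenate([image, prior[..., None]], axis=-1)

# Example usage on dummy data.
rng = np.random.default_rng(0)
image = np.zeros((32, 32, 3), dtype=np.float32)   # already scaled to [0, 1]
mask = np.zeros((32, 32), dtype=np.float32)
mask[8:24, 8:24] = 1.0
x = make_four_channel_input(image, mask, rng)
print(x.shape)
```

Because the prior mask is drawn per sample at load time, the dataset does not contain a fixed number of empty or augmented masks; the split is controlled by the probability.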
@anilsathyan7 I've got it. Thanks for your reply.
Congrats on the awesome work, and thanks for sharing. I want to train a portrait-net for video. The performance is not as good as the usual semantic segmentation net, whose input has 3 channels. Could you tell me how many empty previous masks and augmented previous masks there are in your training set? Thanks.