Hi @PeterL1n, I've trained stages 1, 2, and 3, and I get reasonably good matting for videos.
Stage 4 should bring some refinement to the matting, right? What I am experiencing is that as I train stage 4, the model's segmentation for videos starts degrading, mostly on longer video sequences. (With the provided pre-trained weights I don't have any issues.) I am posting this in the hope that you might have some insight into why this is happening, and some actions I could take to minimize it.
I have also evaluated the video differently: I split it into frames and evaluated each frame individually. Under those conditions I do not see the degradation of the segmentation (a sketch of the two evaluation modes is below). I've read the training procedure in the paper, on GitHub, and in the code. In both the paper and the code, VideoMatte HD is not used in stage 4, but the GitHub instructions mention that VideoMatte HD is necessary for stage 4. Was it supposed to be used in stage 4 to keep the temporal aspect of the network?
"Matting Datasets VideoMatte240K Download JPEG SD version (6G) for stage 1 and 2. Download JPEG HD version (60G) for stage 3 and 4."
Thank you.
At epoch 25, the first frame looks like this:
While the last frame looks like this:
For comparison, this is the last frame, when the weights of epoch 23 are used:
@SamHSlva What are your losses in the 4 stages?
Hi,
@FengMu1995 These are the losses I was getting:
Stage 1:
Stage 2:
Stage 3:
Stage 4:
My loss figures are similar to yours, and I am running into the same problem as you.
I just fixed a bug in train.py. It should be related to this: https://github.com/PeterL1n/RobustVideoMatting/issues/100
@PeterL1n Thank you for mentioning that. @FengMu1995 I have updated the code according to #100, and I am seeing a significant improvement in the level of detail in the images after epoch 23 (I haven't retrained from the beginning, but I will start tonight to see the impact of that).
One question I have for you, @FengMu1995: have you been experiencing output videos with less warm colors? See the attached example from a random video I got from YouTube. On top is the output of my model after epoch 23, and on the bottom the output of the provided pre-trained model, both for mobilenetv3:
I have retrained my model from epoch 2. Epoch 9 has just finished, and I have not run into your issue yet. You could train 4 more epochs and see the result.
Everything follows the paper.
@SamHSlva Hi, after downloading VideoMatte240K_JPEG_HD I found that train/pha is incomplete and has far fewer files than train/fgr. Did you notice this when you trained?
@FengMu1995 That's not possible; did something go wrong when you extracted the archive?
Thank you. I downloaded it again and there's no problem now.
@FengMu1995 I've retrained from scratch, making the changes to the optimizer and adding the extra argument for VideoMatte240K_JPEG_HD at stage 3. I got results very similar to the ones I get with the pre-trained model. @PeterL1n Thank you for the replies and patience.
@SamHSlva Congrats
I wonder why seq-length-lr and seq-length-hr weren't set to 1 in stage 4, since ImageMatte is trained there rather than VideoMatte.
@FengMu1995 ImageMatte is applied with motion augmentation, so it has synthetic motion.
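Conceptually the augmentation turns one still image into a short clip by applying a slowly varying transform per frame. A simplified sketch (not the repo's actual MotionAugmentation code; the parameters are illustrative):

```python
import torch
import torchvision.transforms.functional as TF

def synthesize_motion(img, seq_len=15, max_angle=5.0, max_shift=20):
    """Turn a still image [C, H, W] into a fake clip [T, C, H, W]
    by applying a gradually changing affine transform per frame."""
    frames = []
    for t in range(seq_len):
        p = t / max(seq_len - 1, 1)  # progress 0 -> 1 across the clip
        frames.append(TF.affine(
            img,
            angle=p * max_angle,                # gradual rotation
            translate=[int(p * max_shift), 0],  # gradual horizontal pan
            scale=1.0 + 0.1 * p,                # gradual zoom-in
            shear=[0.0],
        ))
    return torch.stack(frames)
```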
OK, I see that ImageMatte has been transformed into an image sequence. @PeterL1n Thanks.
@PeterL1n The random_crop code:
```python
def random_crop(self, *imgs):
    h, w = imgs[0].shape[-2:]
    w = random.choice(range(w // 2, w))
    h = random.choice(range(w // 2, h))  # note: uses the already-resampled w
    results = []
    for img in imgs:
        B, T = img.shape[:2]
        img = img.flatten(0, 1)  # merge batch and time dims: [B*T, C, H, W]
        img = F.interpolate(img, (max(h, w), max(h, w)),
                            mode='bilinear', align_corners=False)
        img = center_crop(img, (h, w))
        img = img.reshape(B, T, *img.shape[1:])  # restore [B, T, C, h, w]
        results.append(img)
    return results
```
Why is it h = random.choice(range(w // 2, h)) rather than h = random.choice(range(h // 2, h))?
@SamHSlva Have you obtained the AIM data set?
@hust-kevin It's a bug; it has been fixed. But during training h and w are set the same (e.g. 512x512), so it makes no difference.
@NewtonLiuD I have not. I studied the dataset structure, went online and grabbed a series of HD images of people, generated the matting masks with Photoshop, and trained with Distinctions-646 plus the masks I generated.
@SamHSlva Can you share the dataset you created? Thank you.
> @hust-kevin It's a bug; it has been fixed. But during training h and w are set the same (e.g. 512x512), so it makes no difference.

```python
h, w = imgs[0].shape[-2:]
w = random.choice(range(w // 2, w))
h = random.choice(range(w // 2, h))
```

@PeterL1n w has already been reassigned on the second line and takes a value in (256, 512), so h effectively takes a value in (128, 512).
@hust-kevin You're right. That means that in earlier training runs the random crop was biased toward landscape aspect ratios.
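For later readers, the fix is simply to sample the new height from the original h before w is reassigned (a two-line sketch of the corrected sampling):

```python
new_w = random.choice(range(w // 2, w))  # sample width from the original w
new_h = random.choice(range(h // 2, h))  # sample height from the original h, not the resampled w
```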
@PeterL1n Also, after I updated the project's learning rate and retrained, why are the results for all of stage 1 so strange? This is the result at epoch 20:
@hust-kevin Is this image from the test data? Do the results on the training data look normal?
@PeterL1n Yes, this is test data. The training data is VideoMatte, and the fit there is normal.
@hust-kevin Are your batch size and seq length parameters the same as in the paper? I'm wondering whether it is overfitting. Also, did you use the COCO, YouTubeVIS, and Supervisely segmentation datasets? Could a problem there be causing the bad results?
@PeterL1n The parameters are all the same. I replaced the segmentation datasets with other datasets containing people. Also, after finishing stage 2 the situation is still the same, but as soon as stage 3 starts training on HD images, the test results become normal. I'm not sure what the reason is.
@hust-kevin I set a smaller seq-length because my GPU memory is not enough, and I ran into the same problem.
In the paper the epochs for stages 1/2/3/4 are 15/2/1/5. If the epochs were doubled to 30/4/2/10, would the results improve? Training takes a long time, so I haven't tested it yet. Has anyone tried this before?
@ToBigboss I tried it; there was no improvement.
OK, thanks. Feel free to add me on QQ to discuss and learn together: 707654930
Hello, I'd like to ask: I'm using data like this as the matting training dataset. Will this affect the final result?
I set lr to 256x256 and hr to 1024, used VideoMatte240K_JPEG, COCO, SuperviselyPersonDataset, and YouTubeVIS as the datasets, and successfully reproduced results close to the official resnet50 version. Loss curves for reference:
My training result looks like this. Does this mean it is not converging?
Yes, that probably means it is not converging. When I trained, stage 1 converged quickly at the start and only later kept slowly decreasing.
File "train.py", line 525, in
How can I solve this issue? Thanks.
I solved this issue by using torch.distributed.launch to start multiple processes.
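For reference, the pattern is roughly the following (a minimal sketch; it assumes one process per GPU and the --use_env launcher flag, so the rank arrives via the LOCAL_RANK environment variable):

```python
# Launched from the shell as, e.g.:
#   python -m torch.distributed.launch --use_env --nproc_per_node=4 train.py
import os
import torch
import torch.distributed as dist

local_rank = int(os.environ['LOCAL_RANK'])  # set by the launcher with --use_env
torch.cuda.set_device(local_rank)           # bind this process to one GPU
dist.init_process_group(backend='nccl')     # env:// init; launcher sets MASTER_ADDR/PORT
```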
@PeterL1n 另外,我更新了project的学习率重新训练后,为什么整个stage1的结果都特别奇怪呢,这是epoch20的结果
I have reproduced the author's results on videos, and they are great! @PeterL1n Thanks for your great work.
@SamHSlva I'd love to talk to you about a gig training the models from this repo for our company. Please let me know if you're available to talk.
Hi,
I have some questions about the training procedure:
For the 3rd time, thank you very much for your contribution to the field. It was brilliant work. Looking forward to your future work.