PeterL1n / RobustVideoMatting

Robust Video Matting in PyTorch, TensorFlow, TensorFlow.js, ONNX, CoreML!
https://peterl1n.github.io/RobustVideoMatting/

[Questions] - Training Procedure #82

Closed. SamHSlva closed this issue 2 years ago.

SamHSlva commented 3 years ago

Hi,

I have some questions about the training procedure:

  1. In the paper you mention training stage 1 for 15 epochs, while in the code the instructions say 20 epochs. Is there a reason for this change? Will the results be similar?
  2. I could not get access to Distinctions-646; I had no reply from the authors/maintainers of the dataset. Based on your indicated file structure, I've built a similar dataset, which adds uncertainty to the quality of my training, but it is a risk I am willing to take. Since stages 1-3 do not depend on this dataset, and to have a point of comparison, would you mind sharing your partial PyTorch training weights (stage1/epoch19.pth, stage2/epoch21.pth, and stage3/epoch22.pth)?
  3. What is the minimum resolution you used for the background images during training?

For the third time, thank you very much for your contribution to the field. It is brilliant work. Looking forward to your future work.

PeterL1n commented 3 years ago
  1. Use whatever is in the paper as the source of truth.
  2. I don't have those checkpoints anymore.
  3. I think all the background images are around 2048x2048.
SamHSlva commented 3 years ago

Hi @PeterL1n, I've trained stages 1, 2, and 3, and I get reasonably good matting for videos.

Stage 4 should bring some refinement to the matting, right? What I am experiencing is that as I train stage 4, the model's segmentation for videos starts degrading, mostly on longer video sequences. (With the provided pre-trained weights, I don't have any issues.) I am posting this in the hope that you might have some insight into why this is happening, and what I could do to minimize it.

I have also evaluated the video differently: I split it into frames and evaluated each frame individually. Under those conditions, I do not see the segmentation degrade (see the sketch at the end of this comment). I've read the training procedure in the paper, on GitHub, and in the code. In both the paper and the code, VideoMatte HD is not used in stage 4, but the GitHub instructions say VideoMatte HD is necessary for stage 4. Was it supposed to be used in stage 4 to preserve the temporal aspect of the network?

"Matting Datasets VideoMatte240K Download JPEG SD version (6G) for stage 1 and 2. Download JPEG HD version (60G) for stage 3 and 4."

Thank you.

At epoch 25, the first frame looks like this:

[screenshot: frame 0000]

While the last frame looks like this:

[screenshot: frame 0342]

For comparison, this is the last frame when the weights of epoch 23 are used:

[screenshot: frame 0342, epoch 23 weights]
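
For reference, the two evaluation modes I compared look roughly like this. This is a minimal sketch based on the inference API in the repo's README; the checkpoint path and input tensor are placeholders:

```python
import torch
from model import MattingNetwork  # model definition from this repo

model = MattingNetwork('mobilenetv3').eval().cuda()
model.load_state_dict(torch.load('stage4/epoch-25.pth'))  # placeholder checkpoint path

frames = torch.randn(30, 1, 3, 288, 512).cuda()  # stand-in for the real clip, [T, B, C, H, W]

with torch.no_grad():
    # Video evaluation: the recurrent states carry across frames, so any
    # temporal drift in the stage 4 model accumulates over long sequences.
    rec = [None] * 4
    for src in frames:
        fgr, pha, *rec = model(src, *rec, downsample_ratio=1.0)

    # Per-frame evaluation: the states are reset for every frame, which is
    # why the degradation disappears when frames are matted individually.
    for src in frames:
        fgr, pha, *_ = model(src, *([None] * 4), downsample_ratio=1.0)
```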

FengMu1995 commented 3 years ago

@SamHSlva What were your losses in the 4 stages?

SamHSlva commented 3 years ago

Hi,

@FengMu1995 these are the losses I got:

Stage 1: [loss curve screenshots]

Stage 2: [loss curve screenshots]

Stage 3: [loss curve screenshots]

Stage 4: [loss curve screenshots]

FengMu1995 commented 3 years ago

My loss curves are similar to yours, and I am running into the same problem.

PeterL1n commented 3 years ago

I just fixed a bug in train.py; it should be related to this: https://github.com/PeterL1n/RobustVideoMatting/issues/100

SamHSlva commented 3 years ago

@PeterL1n thank you for mentioning that. @FengMu1995 I have updated the code according to #100 and I am seeing a significant improvement in the level of detail in the images after epoch 23. (I haven't retrained from the beginning, but I will, starting tonight, to see the impact.)

One question for you, @FengMu1995: have you been getting output videos with less warm colors? See the attached example from a random YouTube video. On top is the output of my model after epoch 23, and on the bottom the output of the provided pre-trained model, both for mobilenetv3:

[screenshot: output comparison]

FengMu1995 commented 3 years ago

I have retrained my model from epoch 2; epoch 9 just finished. I have not hit your issue yet. You could train 4 more epochs and check the result.

PeterL1n commented 3 years ago

Follow the paper for everything.

FengMu1995 commented 3 years ago

@SamHSlva Hi, after downloading VideoMatte240K_JPEG_HD I found that train/pha is incomplete and has far fewer files than train/fgr. Did you notice this when training?

PeterL1n commented 3 years ago

@FengMu1995 That shouldn't be possible. Did something go wrong when you extracted the archive?

FengMu1995 commented 3 years ago

Thank you. I downloaded it again and now there is no problem.

SamHSlva commented 3 years ago

@FengMu1995 I've re-trained from scratch, making the optimizer changes and adding the extra argument for VideoMatte240K_JPEG_HD at stage 3. I got results very similar to those of the pre-trained model. @PeterL1n thank you for the replies and patience.

PeterL1n commented 3 years ago

@SamHSlva Congrats

FengMu1995 commented 3 years ago

I wonder why seq-length-lr and seq-length-hr weren't set to 1 in stage 4, given that ImageMatte is trained rather than VideoMatte.

PeterL1n commented 3 years ago

@FengMu1995 ImageMatte is used with motion augmentation, so it has synthetic motion.
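
In other words, a still fgr/pha pair is expanded into a short clip by interpolating a random transform across time, so the recurrent network still sees motion. Below is only a minimal sketch of the idea; synthesize_motion is a hypothetical helper, and the repo's actual augmentation pipeline is more elaborate:

```python
import random
import torch
from torchvision.transforms import functional as TF

def synthesize_motion(fgr, pha, num_frames=15, max_shift=0.1, max_rot=5.0):
    # fgr, pha: still-image tensors of shape [C, H, W].
    # Returns clips of shape [T, C, H, W] with a smooth synthetic camera move.
    h, w = fgr.shape[-2:]
    dx = random.uniform(-max_shift, max_shift) * w   # total horizontal drift, px
    dy = random.uniform(-max_shift, max_shift) * h   # total vertical drift, px
    rot = random.uniform(-max_rot, max_rot)          # total rotation, degrees
    fgrs, phas = [], []
    for t in range(num_frames):
        a = t / max(num_frames - 1, 1)               # interpolate identity -> target
        args = dict(angle=a * rot,
                    translate=[int(a * dx), int(a * dy)],
                    scale=1.0,
                    shear=[0.0, 0.0])
        fgrs.append(TF.affine(fgr, **args))
        phas.append(TF.affine(pha, **args))          # identical transform keeps them aligned
    return torch.stack(fgrs), torch.stack(phas)
```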

FengMu1995 commented 3 years ago

OK, I see that ImageMatte is transformed into an image sequence. @PeterL1n thanks.

hust-kevin commented 3 years ago

@PeterL1n Regarding the random crop code:

```python
# From the project's augmentation code; assumes the surrounding module has:
#   import random
#   import torch.nn.functional as F
#   from torchvision.transforms.functional import center_crop
def random_crop(self, *imgs):
    h, w = imgs[0].shape[-2:]
    w = random.choice(range(w // 2, w))
    h = random.choice(range(w // 2, h))  # note: w // 2 here, not h // 2
    results = []
    for img in imgs:
        B, T = img.shape[:2]
        img = img.flatten(0, 1)
        img = F.interpolate(img, (max(h, w), max(h, w)), mode='bilinear', align_corners=False)
        img = center_crop(img, (h, w))
        img = img.reshape(B, T, *img.shape[1:])
        results.append(img)
    return results
```

Why is it h = random.choice(range(w // 2, h)) rather than h = random.choice(range(h // 2, h))?

NewtonLiuD commented 3 years ago

@SamHSlva Have you obtained the AIM dataset?

PeterL1n commented 3 years ago

@hust-kevin It's a bug; I've fixed it. But during training h and w are the same (e.g. 512x512), so it makes no difference.

SamHSlva commented 3 years ago

@NewtonLiuD I have not. I studied the dataset structure, grabbed a series of HD images of people online, generated the matting masks with Photoshop, and trained with Distinctions-646 plus the masks I generated.

NewtonLiuD commented 3 years ago

@SamHSlva Can you share the dataset you created? thank you.

hust-kevin commented 3 years ago

@hust-kevin It's a bug; I've fixed it. But during training h and w are the same (e.g. 512x512), so it makes no difference.

h, w = imgs[0].shape[-2:]
w = random.choice(range(w // 2, w))
h = random.choice(range(w // 2, h))

@PeterL1n But w has already been reassigned on the second line, taking a value in (256, 512), so h effectively ranges over (128, 512).

PeterL1n commented 3 years ago

@hust-kevin You're right. That means that in the earlier training runs the random crop was biased toward landscape aspect ratios.
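
That is, the corrected sampling simply draws each side from its own half, as suggested above:

```python
h, w = imgs[0].shape[-2:]
w = random.choice(range(w // 2, w))  # new width in [w//2, w)
h = random.choice(range(h // 2, h))  # fixed: based on the original h, not the resampled w
```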

hust-kevin commented 3 years ago

@PeterL1n Also, after I updated the project's learning rate and retrained, the results for the whole of stage 1 are very strange. This is the epoch 20 result: [screenshot]

PeterL1n commented 3 years ago

@hust-kevin Is this image test data? Do the training data results look normal?

hust-kevin commented 3 years ago

@PeterL1n Yes, this is test data. The training data is VideoMatte, and the fit there is normal. [screenshot]

PeterL1n commented 3 years ago

@hust-kevin Are your batch size and seq length parameters the same as in the paper? I'm wondering whether it is overfitting. Also, did you use the COCO, YouTubeVIS, and Supervisely segmentation datasets? Could a problem there be causing the bad results?

hust-kevin commented 3 years ago

@PeterL1n The parameters are all the same; I only replaced the segmentation datasets with other datasets containing people. Also, after finishing stage 2 the results are still as before, but as soon as stage 3 starts training on HD images, the test results become normal. I am not sure why.

FengMu1995 commented 3 years ago

@hust-kevin I set a smaller seq-length because my GPU memory is not enough. I ran into the same problem.

pxEkin commented 3 years ago

@hust-kevin Are your batch size and seq length parameters the same as in the paper? I'm wondering whether it is overfitting. Also, did you use the COCO, YouTubeVIS, and Supervisely segmentation datasets? Could a problem there be causing the bad results?

In the paper the epochs for stages 1/2/3/4 are 15/2/1/5. If the epochs were doubled to 30/4/2/10, would the results improve? Training takes a long time, so I have not tested it yet. Has anyone tried this before?

FengMu1995 commented 3 years ago

@ToBigboss I tried it; there was no improvement.

pxEkin commented 3 years ago

@ToBigboss I tried it; there was no improvement.

OK, thanks. Feel free to add me on QQ to discuss: 707654930

zhanghongyong123456 commented 2 years ago

@hust-kevin Is this image test data? Do the training data results look normal?

Hello, I would like to ask: I am using data like this as the matting dataset for training. Will it affect the final result?

[screenshot: 0000_mask]

[screenshot: 0000]

surifans commented 2 years ago

I set lr to 256x256 and hr to 1024, and used VideoMatte240K_JPEG, COCO, SuperviselyPersonDataset, and YouTubeVIS. I successfully reproduced results close to the official resnet50 version. Loss curves for reference: [screenshot]

zhanghongyong123456 commented 2 years ago

I set lr to 256x256 and hr to 1024, and used VideoMatte240K_JPEG, COCO, SuperviselyPersonDataset, and YouTubeVIS. I successfully reproduced results close to the official resnet50 version. Loss curves for reference: [screenshot]

[screenshot] My training results look like this. Does this mean it is not converging?

surifans commented 2 years ago

I set lr to 256x256 and hr to 1024, and used VideoMatte240K_JPEG, COCO, SuperviselyPersonDataset, and YouTubeVIS. I successfully reproduced results close to the official resnet50 version. Loss curves for reference: [screenshot]

[screenshot] My training results look like this. Does this mean it is not converging?

Yes, that probably does mean it is not converging. When I trained, stage 1 converged quickly at the start, and only later did the loss slowly keep dropping.

Jon-drugstore commented 2 years ago

@hust-kevin I set a smaller seq-length because my GPU memory is not enough. I ran into the same problem.

File "train.py", line 525, in mp.spawn( File "/usr/local/Python-3.8.6/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 230, in spawn return start_processes(fn, args, nprocs, join, daemon, start_method='spawn') File "/usr/local/Python-3.8.6/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 188, in start_processes while not context.join(): File "/usr/local/Python-3.8.6/lib/python3.8/site-packages/torch/multiprocessing/spawn.py", line 130, in join raise ProcessExitedException( torch.multiprocessing.spawn.ProcessExitedException: process 5 terminated with signal SIGKILL

How to solve this issue? Thanks

Update: I solved this issue by using torch.distributed.launch to start the worker processes instead of mp.spawn.
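
Roughly, the change looks like this. The exact launcher flags depend on your PyTorch version (newer releases use torchrun), so treat it as a sketch:

```python
# Launch command (PyTorch 1.x launcher):
#   python -m torch.distributed.launch --nproc_per_node=4 --use_env train.py <args>
#
# Python-side setup the launcher expects; train.py would need a small
# adaptation like this, since it normally spawns its own processes:
import os
import torch
import torch.distributed as dist

local_rank = int(os.environ['LOCAL_RANK'])  # set by the launcher when --use_env is passed
torch.cuda.set_device(local_rank)
dist.init_process_group(backend='nccl')     # rank and world size are read from env vars
```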

Jon-drugstore commented 2 years ago

@PeterL1n Also, after I updated the project's learning rate and retrained, the results for the whole of stage 1 are very strange. This is the epoch 20 result: [screenshot]

I have reproduced the author's videos and the results are great! @PeterL1n thanks for your great work.

emlcpfx commented 1 year ago

@SamHSlva I’d love to talk to you about a gig training this repo for our company. Please let me know if you’re available to talk.