carrierlxk / COSNet

See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks (CVPR19)
How to train your deeplab pretrained model? #14

Open lscelory opened 4 years ago

lscelory commented 4 years ago

Hi, I'm so excited that you released your training code and it's very helpful for me. Thank you for sharing.

I have a question about the pretrained DeepLabv3 model that you call deeplab_davis_12_0.pth. How did you train it? Did you train the Res_Deeplab in your code on saliency data, on the DAVIS training set, or on both? Could you also share the essential settings used to train it, such as the dataset, learning rate, weight decay, and maximum number of epochs? I want to reproduce your pretrained model myself and use it as a reference when pretraining a model for my own task.

Thank you so much again!

Sincerely
shichao

lscelory commented 4 years ago

Following the naming format of deeplab_davis_12_0.pth, I pre-trained DeepLabv3 on the DAVIS16 dataset only (batch_size=8, learning_rate=0.001, maxEpoches=120, weight_decay=0.0005) and got J=72.05% on the DAVIS16 test set. I then initialized the DeepLabv3 part of COSNet with my pretrained model and fine-tuned the whole COSNet on DAVIS16 and the saliency datasets using your training code, but only got J=68.26%. It seems that the saliency data decreases the model's performance on DAVIS data.

On the other hand, I pre-trained DeepLabv3 on the saliency datasets only (MSRA10K and DUT; batch_size=10, learning_rate=0.001, maxEpoches=20, weight_decay=0.0001). The best result of this saliency-trained model on the DAVIS16 test set was J=74.55%. When I used this model as the initialization for fine-tuning COSNet, I only got J=77.33%.

When I load your deeplab_davis_12_0.pth as the pretrained model and fine-tune COSNet, I get J=81.98%, which is much higher than my 77.33% result.

Could you tell me how you obtained your pretrained model? It is very important for my current work. Thanks again!

Best, shichao

wangbo-zhao commented 4 years ago

Have you solved your problem? I also want to know how to train the pretrained models. Is it pretrained on ImageNet?

lscelory commented 4 years ago

@wangbo-zhao As the paper says, the author used saliency datasets (MSRA10K and DUT) to pretrain the model. I used these two datasets for pretraining but did not get a comparable result. I initialized the model with DeepLabv3 pretrained on MSCOCO. However, when I test with my pretrained model alone, I still cannot get results as high as with the one the author provided.

wangbo-zhao commented 4 years ago

Did you use any data augmentation?

lscelory commented 4 years ago

The same as the official code in dataloader.py, which includes flip, rescale, and crop.

wangbo-zhao commented 4 years ago

And did you use the ResNet101 pretrained on ImageNet?

lscelory commented 4 years ago

I used the ResNet101 pretrained on MSCOCO, which was provided by DeepLabV2.

wangbo-zhao commented 4 years ago

Can you give me the link to the ResNet101 pretrained on MSCOCO? I want to give it a try.

lscelory commented 4 years ago

You can download this model here.

EnQing626 commented 4 years ago

Hi @lscelory, when you loaded 'deeplab_davis_12_0.pth' as the pre-trained model and fine-tuned COSNet, did you fine-tune on both the saliency datasets (MSRA10K and DUT) and DAVIS16? Could you share the settings with which you got J=81.98%? Thank you.

lscelory commented 4 years ago

@CJEQ Yes, I trained COSNet on both the saliency data and the DAVIS16 data in an alternating way; you can find this in the author's training code. I just used the default settings provided by the author (lr=0.00025, wd=0.0005). Note that I computed J with my own metric code, and I found afterwards that it gives higher values than the official DAVIS16 benchmark code, so my 81.98% is unreliable. You might use the code here to evaluate your model. I should also mention that I have not yet reproduced a J comparable to the score reported in the paper. I hope you can solve it!
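
Since the discrepancy above came from a home-grown metric, here is a minimal sketch of how region similarity J (intersection over union of binary masks) is commonly computed. It is not the official DAVIS16 benchmark code. The function names, the convention of scoring two empty masks as 1, and the choice to average per sequence before averaging over sequences are assumptions for illustration; details such as which frames the official code skips are not reproduced here.

```python
import numpy as np

def jaccard(pred, gt):
    """Region similarity J for one frame: |pred AND gt| / |pred OR gt|."""
    pred = np.asarray(pred, dtype=bool)
    gt = np.asarray(gt, dtype=bool)
    union = np.logical_or(pred, gt).sum()
    if union == 0:
        # Both masks empty: scored as a perfect match (assumed convention).
        return 1.0
    return float(np.logical_and(pred, gt).sum() / union)

def j_mean(sequences):
    """Mean J over sequences.

    `sequences` maps a sequence name to a list of (pred, gt) mask pairs.
    Frames are averaged within each sequence first, then sequences are
    averaged, so long sequences do not dominate the score.
    """
    per_sequence = [
        np.mean([jaccard(pred, gt) for pred, gt in frames])
        for frames in sequences.values()
    ]
    return float(np.mean(per_sequence))
```

Averaging over all frames pooled together instead of per sequence gives a different number whenever sequences differ in length, which is one plausible way two evaluation scripts can disagree.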

looong96 commented 4 years ago

Hi @lscelory. Using deeplab_davis_12_0.pth as the pretrained model and fine-tuning COSNet, I only got J=77.7% (sample-range=2, use-crf, and the official DAVIS16 benchmark code), about 2% lower than the 79.7% in the paper. Could you tell me the real J_mean behind your unreliable 81.98%? In addition, I could not see any difference when changing sample-range from 2 to 5 or when switching from prediction fusion to attention summary fusion. My e-mail addresses are 1398714115@qq.com and 20181223071@nuist.edu.cn; looking forward to your reply.

lscelory commented 4 years ago

Hi @looong96, sorry for replying so late. On the first question, you are right. I re-tested my results with the official DAVIS16 benchmark code and got J=77.5%, which is close to the 77.6% in the paper. I ran the experiment with the author's default settings (sample-range=1, which means just one reference frame), so I think this value is consistent with the paper. The 81.98% I got before was caused by bugs in my own testing code, and you can ignore it.

On the second question, I also found that the sample-range parameter has no effect. The testing code provided by the author implements fusion by adding the segmentation results directly, which is called prediction segmentation fusion in the paper (Table 1, 79.5), but we do not get the corresponding value. I tried re-implementing the testing code with attention summary fusion, but I did not see an improvement either. By the way, I have sent a link to this answer to your email; you can reply to me in Chinese, which will be more efficient.
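
To make "adding the segmentation results directly" concrete, here is a minimal sketch of prediction-level fusion as described above: the query frame is segmented once per reference frame and the resulting probability maps are averaged. This is an illustration, not the author's testing code; the function name `fuse_predictions` and the 0.5 threshold are assumptions.

```python
import numpy as np

def fuse_predictions(prob_maps, threshold=0.5):
    """Average the query-frame probability maps from several reference frames.

    `prob_maps` is a list of equally shaped arrays with values in [0, 1],
    one per reference frame. Returns the mean map and its binarization.
    """
    stacked = np.stack([np.asarray(p, dtype=np.float64) for p in prob_maps])
    mean_prob = stacked.mean(axis=0)
    return mean_prob, mean_prob >= threshold
```

With a single reference frame (sample-range=1) the fused map is just the single prediction, so any gain from fusion can only appear when several reference frames are actually passed through.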

kelisiya commented 4 years ago

Regarding the rescale and crop in the data augmentation, is it correct to use transforms.RandomResizedCrop(size = (473,473),scale=[0.5, 0.8, 1])?
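
For context on this question: torchvision documents the `scale` argument of `RandomResizedCrop` as a (min, max) pair bounding the crop area as a fraction of the original image, so a three-element list does not express "pick one of these scale factors". A segmentation pipeline also needs the same random flip, rescale, and crop applied to the image and its mask. The sketch below shows one way to do that with NumPy only. It is not the code in dataloader.py; the function name `augment_pair`, the discrete scale factors taken from the question, the zero padding, and the nearest-neighbour resizing are all assumptions for illustration.

```python
import numpy as np

def augment_pair(image, mask, crop_size=473, scales=(0.5, 0.8, 1.0), rng=None):
    """Apply the same random flip, rescale, and crop to an image and its mask.

    `image` has shape (H, W, 3) and `mask` has shape (H, W).
    """
    rng = np.random.default_rng() if rng is None else rng

    # Random horizontal flip, applied identically to both arrays.
    if rng.random() < 0.5:
        image, mask = image[:, ::-1], mask[:, ::-1]

    # Random rescale by nearest-neighbour index sampling. This keeps mask
    # labels intact; a real pipeline would usually interpolate the image.
    s = float(rng.choice(scales))
    h, w = mask.shape
    new_h, new_w = max(1, int(round(h * s))), max(1, int(round(w * s)))
    rows = np.minimum((np.arange(new_h) / s).astype(int), h - 1)
    cols = np.minimum((np.arange(new_w) / s).astype(int), w - 1)
    image, mask = image[rows][:, cols], mask[rows][:, cols]

    # Zero-pad up to the crop size if needed, then crop both at one offset.
    pad_h, pad_w = max(crop_size - new_h, 0), max(crop_size - new_w, 0)
    image = np.pad(image, ((0, pad_h), (0, pad_w), (0, 0)))
    mask = np.pad(mask, ((0, pad_h), (0, pad_w)))
    top = int(rng.integers(0, image.shape[0] - crop_size + 1))
    left = int(rng.integers(0, image.shape[1] - crop_size + 1))
    return (image[top:top + crop_size, left:left + crop_size],
            mask[top:top + crop_size, left:left + crop_size])
```

Passing a seeded generator, for example `rng=np.random.default_rng(0)`, makes the augmentation reproducible when comparing training runs.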

Starboy-at-earth commented 4 years ago

Hi lscelory, could you please tell me how much GPU memory you used? I have two RTX 2080 Tis (11 GB of memory each). I have to set the batch size to 4 (16 in the paper) and resize the reference and query frames to at most 378 by 378 (473 by 473 in the paper); otherwise the released code fails with an "out of memory" runtime error. However, the GPU reported in the paper is an NVIDIA Titan Xp (12 GB of memory, comparable to each of mine). Could you please tell me what is wrong with my setup? My email is 2667004002@qq.com, and I would appreciate your reply. Thank you in advance!

lih627 commented 3 years ago

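I want to know how to download the DUT dataset.

I found two DUT datasets:

DUT-OMRON Dataset Images http://saliencydetection.net/dut-omron/#org96c3bab

DUTs dataset: http://saliencydetection.net/duts/

Could you please tell me which one to download?

Thanks.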

Starboy-at-earth commented 3 years ago

You should download the second one, the DUTs dataset (http://saliencydetection.net/duts/). The first, DUT-OMRON (http://saliencydetection.net/dut-omron/#org96c3bab), has nothing to do with the DUTS dataset we mentioned; the difference is like that between "象牙" (ivory) and "象牙塔" (ivory tower). Could you please give me your QQ number?

lscelory commented 3 years ago

@Starboy-at-earth Yes, you are right. I use four GTX 1080 Tis to train my model, each with 11 GB of memory, and I can only set batch size=8 with input size=(512, 512). I don't know how the author trained his model with batch size=16. Maybe that is the key reason for the gap between my reproduced results and the paper.
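
One common workaround when batch size 16 does not fit in memory is gradient accumulation: run several smaller micro-batches and average their gradients before each optimizer step. Nothing in this thread or the paper says the author did this; the sketch below only illustrates the idea on a toy linear model, and the function names are hypothetical. It checks that four micro-batches of 4 give the same gradient as one batch of 16 for a mean-reduced loss.

```python
import numpy as np

def mse_grad(w, x, y):
    """Gradient of 0.5 * mean((x @ w - y) ** 2) with respect to w."""
    return x.T @ (x @ w - y) / len(y)

def accumulated_grad(w, x, y, micro_batch):
    """Average the gradients of equally sized micro-batches.

    Each micro-batch gradient is scaled by 1 / steps so that the sum
    matches the gradient of the mean loss over the full batch.
    """
    steps = len(y) // micro_batch
    total = np.zeros_like(w)
    for i in range(steps):
        chunk = slice(i * micro_batch, (i + 1) * micro_batch)
        total += mse_grad(w, x[chunk], y[chunk]) / steps
    return total
```

In PyTorch the same effect comes from calling `backward()` on `loss / steps` for each micro-batch and stepping the optimizer only every `steps` iterations, since gradients accumulate until they are zeroed. Note that batch normalization layers still compute statistics on each small micro-batch, so the result is not fully equivalent to training with a true batch of 16.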