carrierlxk / COSNet

See More, Know More: Unsupervised Video Object Segmentation with Co-Attention Siamese Networks (CVPR19)

Some questions about training details. #8

Closed · lscelory closed 4 years ago

lscelory commented 5 years ago

Hi,

After reading your amazing work, I first downloaded your shared trained model parameters and used them to test on the DAVIS16 test set. I used the Python version of the metrics for the segmentation task and set the class number to 2, so J is the foreground mIoU; is that right? I got J = 82.47%, which is slightly higher than the result in your paper. Then I tried to replicate your training process. Following the training settings in your paper, I only got J = 71.44% with the co-attention module (the whole COSNet) and J = 70.78% without it (the Res Deeplab model in your shared code). I don't know what exactly went wrong during training. I hope you can give me some suggestions. Here are my settings (most of them are the same as yours):

• Pretrained Res Deeplab on MSRA10K with batch size 8 and 60,000 iterations; the learning rate is 0.00025. (This pretrained model is indeed useful; without it the performance drops to 55.67% on the DAVIS16 test set.)
• Data preprocessing: I just used your shared code 'PairwiseImg_test.py' for training. I noticed your dataset code does not convert the sample output to a PyTorch tensor explicitly, but relies on the default behavior of the PyTorch DataLoader.
• Training process: I used the SGD optimizer with the initial lr = 0.00025 from your paper, and the parameters of the layers after Res Deeplab's layer4 are updated with 10x lr. For the loss I used the 2-class nn.CrossEntropyLoss in PyTorch. I didn't use nn.BCELoss because its 'weight' argument weights each batch element of the input tensor, which is not consistent with Eq. 11 in your paper (a class-balanced alternative is sketched after this comment). The number of training epochs is 100 as well.

With the settings above, I only got J = 71.44% with COSNet, which is just near the 71.3% you reported in the paper for DeepLabv3. I wonder which part of my training was wrong; my co-attention module did not seem to work. I also found that the loss fluctuated rather sharply during training, and J/mIoU did not improve with the epochs (the mIoU in the figure is the foreground-background mIoU):

[image: training loss and mIoU curves] https://user-images.githubusercontent.com/35630795/67466812-48d6f200-f67a-11e9-8d3c-40d31226f0ee.png

This work is so impressive! And I'm looking forward to your reply. Thanks in advance.
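For reference, a per-pixel class-balanced BCE along these lines can be written directly, without relying on the batch semantics of the 'weight' argument. A minimal sketch, assuming Eq. 11 is the common class-balancing scheme where each foreground pixel is weighted by the background fraction and vice versa (an assumption, not confirmed against the paper's exact formulation):

```python
import torch
import torch.nn.functional as F

def weighted_bce_loss(logits, target):
    """Class-balanced BCE for binary segmentation.

    logits: (N, H, W) raw scores; target: (N, H, W) binary {0, 1} mask.
    Foreground pixels are weighted by the background fraction and
    background pixels by the foreground fraction, so both classes
    contribute equally regardless of their pixel counts.
    """
    target = target.float()
    pos = target.sum()                  # number of foreground pixels
    neg = target.numel() - pos          # number of background pixels
    total = pos + neg
    # Per-pixel weights, same shape as the logits.
    weights = torch.where(target > 0.5, neg / total, pos / total)
    return F.binary_cross_entropy_with_logits(logits, target, weight=weights)
```

With logits of shape (N, H, W) and a binary mask of the same shape, `loss = weighted_bce_loss(logits, mask)` gives the usual class-balanced formulation.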

carrierlxk commented 5 years ago

Dear Lscelory,

Thanks for your interest in my work. Several tips should help improve the performance:

1. Data augmentation techniques such as scaling, flipping, and cropping are very important.
2. The weighted BCE loss has an impact on the final performance.
3. The learning rates for the backbone network and the newly added layers are different (a 1:10 ratio), and the learning rate schedule comes from DeepLabv3+.

In addition, my loss converges to a very small value (your loss value is much higher). Hope these suggestions help you.

Best, Xiankai Lu
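To make tip 3 concrete: below is a minimal sketch of the 1:10 learning-rate split plus the 'poly' decay used across the DeepLab family. The name-matching rule in `split_params`, and the momentum and weight-decay values, are illustrative assumptions rather than the author's exact settings:

```python
import torch.nn as nn
import torch.optim as optim

def split_params(model: nn.Module):
    """Hypothetical split: parameter names containing 'coattention' or
    'layer5' count as newly added layers, the rest as pretrained backbone."""
    backbone, new = [], []
    for name, p in model.named_parameters():
        (new if "coattention" in name or "layer5" in name else backbone).append(p)
    return backbone, new

def build_optimizer(model: nn.Module, base_lr: float = 2.5e-4):
    """SGD with the 1:10 learning-rate ratio between backbone and new layers.
    Momentum and weight decay are assumed values, not confirmed ones."""
    backbone_params, new_params = split_params(model)
    return optim.SGD(
        [{"params": backbone_params, "lr": base_lr},       # 1x for the backbone
         {"params": new_params, "lr": 10 * base_lr}],      # 10x for new layers
        momentum=0.9, weight_decay=5e-4)

def adjust_poly_lr(optimizer, it, max_it, base_lr=2.5e-4, power=0.9):
    """'Poly' decay from the DeepLab papers: lr = base_lr * (1 - it/max_it)^power,
    applied per group so the 1:10 ratio holds for the whole run."""
    factor = (1.0 - it / max_it) ** power
    optimizer.param_groups[0]["lr"] = base_lr * factor
    optimizer.param_groups[1]["lr"] = 10 * base_lr * factor
```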


lscelory commented 5 years ago


Thanks @carrierlxk for replying to me so quickly. You are so kind and patient. I implemented data augmentation with your code here, computed the foreground-background pixel-count ratio over the training set, and wrote my own weighted BCE loss (a sketch of the ratio computation is given after this comment). I'm training the new version of my model following your advice; hope it works. Thanks again for your nice work and your patient answers. I hope we can communicate more in the future, and I'm also looking forward to the release of your training code.

Best wishes, lsc
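For anyone reproducing the ratio step, a minimal sketch follows; the mask directory glob is a hypothetical placeholder and should be pointed at your own DAVIS16 or MSRA10K annotations:

```python
import glob
import numpy as np
from PIL import Image

fg = bg = 0
# 'Annotations/**/*.png' is a placeholder for wherever your binary masks live.
for path in glob.glob("Annotations/**/*.png", recursive=True):
    mask = np.array(Image.open(path).convert("L")) > 127  # binarize the mask
    fg += int(mask.sum())
    bg += int(mask.size - mask.sum())
print(f"foreground fraction: {fg / (fg + bg):.4f}")  # feeds the BCE class weights
```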

lscelory commented 4 years ago

Hello Lu: Sorry to bother you again. I have some new questions about the training details; I summarized them and emailed them to you. I guess you are busy with the CVPR deadline and have no time to check your Gmail. Hoping for your reply when you are free. Thanks so much!

Best

Castile commented 4 years ago

@lscelory Hi, I'd like to know what your final converged loss value was.

lscelory commented 4 years ago

@Castile After the author shared his training code, I used it to train my model. The loss finally converged to ~0.5 for the co-attention module, and following the alternating training strategy, the loss of the DeepLabv3 backbone converged to ~0.02.
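For readers unfamiliar with the alternating strategy mentioned above, here is a minimal sketch of one plausible schedule: freeze one part while the other trains, switching on alternating epochs. The module stand-ins and the even/odd split are assumptions for illustration, not the author's released code:

```python
import torch.nn as nn

# Hypothetical stand-ins for the two alternately trained parts.
backbone = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())
coattention = nn.Sequential(nn.Conv2d(8, 1, 1))

def set_requires_grad(module: nn.Module, flag: bool) -> None:
    for p in module.parameters():
        p.requires_grad = flag

for epoch in range(4):
    train_backbone = epoch % 2 == 0  # even epochs: backbone; odd: co-attention
    set_requires_grad(backbone, train_backbone)
    set_requires_grad(coattention, not train_backbone)
    # ... run the usual forward/backward pass here; only the unfrozen
    # part accumulates gradients and is updated by the optimizer.
```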