gallenszl / CFNet

CFNet: Cascade and Fused Cost Volume for Robust Stereo Matching(CVPR2021)
MIT License
155 stars 23 forks

how to obtain the same performance as the given pretrained model #27

Open xzy-yqm opened 2 years ago

xzy-yqm commented 2 years ago

I tried to train the CFNet model using the code from this GitHub repo, replacing the Mish activation function with ReLU for the first 20 epochs and then switching back to Mish for another 15 epochs, just as the paper describes. But the performance of my trained model is far from that of the pretrained model provided here. So what is wrong with my training? Is there any parameter that should be modified? I used ./scripts/sceneflow.sh on two V100 GPUs.
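For reference, the Mish activation being switched in and out here is x · tanh(softplus(x)); a minimal PyTorch sketch of the two activations involved in the switch training strategy (the stage split is the one described above, not code from the repo):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Mish(nn.Module):
    """Mish activation: x * tanh(softplus(x))."""
    def forward(self, x):
        return x * torch.tanh(F.softplus(x))

# Switch training strategy as described in the question:
act_stage1 = nn.ReLU(inplace=True)  # epochs 1-20: train with ReLU
act_stage2 = Mish()                 # epochs 21-35: switch back to Mish
```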

gallenszl commented 2 years ago

Hello, maybe you can check the following details: (1) the learning rate for the prolonged 15 epochs with Mish starts at 0.001 and is downscaled by 10 after epoch 10. (2) Make sure you replace all the activation functions when employing the switch training strategy, especially the activation function in submodule.py.
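The two points above can be sketched roughly as follows; this is an illustrative sketch, not CFNet's actual code, and `nn.Mish` (available in PyTorch >= 1.9) stands in for the repo's own Mish module:

```python
import torch
import torch.nn as nn

def replace_activations(module, old_cls, new_act_factory):
    """Recursively swap every activation of type old_cls for a new one,
    so no nested submodule (e.g. those defined in submodule.py) is missed."""
    for name, child in module.named_children():
        if isinstance(child, old_cls):
            setattr(module, name, new_act_factory())
        else:
            replace_activations(child, old_cls, new_act_factory)
    return module

# Toy model with a nested ReLU to show the recursion reaching it.
model = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Sequential(nn.ReLU()))
replace_activations(model, nn.ReLU, nn.Mish)

# LR schedule for the prolonged 15 Mish epochs: start at 0.001,
# divide by 10 after epoch 10.
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)
scheduler = torch.optim.lr_scheduler.MultiStepLR(
    optimizer, milestones=[10], gamma=0.1)
```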

xzy-yqm commented 2 years ago

I have double-checked those details, but I still cannot obtain similar performance. I wonder if there is any difference between the code on GitHub and the code you used for training. That's weird.

gallenszl commented 2 years ago

Hello, could you tell me the results you obtained when training it yourself?

xzy-yqm commented 2 years ago

The EPE on the SceneFlow test set is 33.3 after training with the Mish activation function.
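For context, end-point error (EPE) on SceneFlow is typically the mean absolute disparity error over valid pixels; a minimal sketch, assuming the common masking convention of excluding ground-truth disparities outside (0, max_disp):

```python
import torch

def epe(pred_disp, gt_disp, max_disp=192):
    """Mean absolute disparity error over valid pixels.
    Pixels with ground truth outside (0, max_disp) are excluded,
    as is common for SceneFlow evaluation."""
    mask = (gt_disp > 0) & (gt_disp < max_disp)
    return torch.mean(torch.abs(pred_disp[mask] - gt_disp[mask]))
```

With this metric, a well-trained model on SceneFlow is usually around or below 1 px, which is why 33.3 suggests something went wrong with the Mish stage rather than ordinary underfitting.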

gallenszl commented 2 years ago

Hello, this is a very strange result. Could you tell me the result before switching to Mish?

gallenszl commented 2 years ago

And did you finish the whole training process, i.e., all 15 epochs with Mish?

xzy-yqm commented 2 years ago

Yes, of course I ran the whole 15 epochs. I also tried prolonging the Mish training for another 10 epochs, 25 epochs in total. The best EPE is about 20 before switching to Mish. I did observe the jump in the loss and the test EPE when the switch strategy was applied.

gallenszl commented 2 years ago

The EPE is 20? Does that mean the result did not even converge and you could not get a reasonable result?

xzy-yqm commented 2 years ago

The training EPE and loss seem to converge, but the test EPE and loss do not. That's weird. The main difference between the test batches and the training batches is the batch size, but the convergence behavior is quite different.

august779188 commented 6 months ago

Hi, could you tell me which argument should be used to continue from the pretraining stage, "resume" or "loadckpt"? Do I need to load the optimizer state from the ReLU stage? Thank you.
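In typical PyTorch training scripts (hedged: the exact flag semantics in this repo's main.py would need the author to confirm), a `loadckpt`-style argument restores only the model weights, while a `resume`-style argument also restores the optimizer state and the epoch counter. Given the learning rate is restarted at 0.001 for the Mish stage, weights-only loading looks like the more likely intent. A self-contained sketch of the difference, using an in-memory checkpoint:

```python
import io
import torch
import torch.nn as nn

model = nn.Linear(4, 2)
optimizer = torch.optim.Adam(model.parameters(), lr=0.001)

# Save a checkpoint holding both model and optimizer state (key names
# here are illustrative, not necessarily those used by the repo).
buf = io.BytesIO()
torch.save({"model": model.state_dict(),
            "optimizer": optimizer.state_dict(),
            "epoch": 20}, buf)
buf.seek(0)
ckpt = torch.load(buf)

# "loadckpt"-style: weights only; the Mish stage gets a fresh optimizer
# and the restarted 0.001 learning-rate schedule.
model.load_state_dict(ckpt["model"])

# "resume"-style: additionally restore optimizer state and epoch counter.
optimizer.load_state_dict(ckpt["optimizer"])
start_epoch = ckpt["epoch"] + 1
```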