e-k-m opened this issue 4 years ago
Hi @e-k-m. For 2, I would suggest trying some refinement methods, such as guided filtering or dense CRF. Retraining on your own dataset will always give you more precise results.
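A rough sketch of what guided-filter refinement could look like (not code from this repo; it assumes opencv-contrib-python, which provides cv2.ximgproc, and the file names are placeholders):

```python
# A minimal sketch: edge-aware refinement of a PSMNet disparity map
# using the left RGB image as the guidance image.
import cv2
import numpy as np

left_rgb = cv2.imread('left.png').astype(np.float32)       # guidance image (placeholder path)
disparity = np.load('psmnet_disp.npy').astype(np.float32)   # raw network output (placeholder path)

radius = 9                # filter window radius; tune per image size
eps = (0.01 * 255) ** 2   # regularization; larger -> smoother result
refined = cv2.ximgproc.guidedFilter(left_rgb, disparity, radius, eps)
```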
For 3, larger images generally mean larger disparity values. When training or testing on large images, the maximum disparity value needs to be set accordingly.
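As a back-of-the-envelope illustration (the assumptions here are that the default --maxdisp of 192 is adequate at width 1242, and that keeping maxdisp a multiple of 16 keeps the downsampled cost-volume sizes integral):

```python
# A rough sketch: scale the disparity search range with image width.
def scaled_maxdisp(new_width, ref_width=1242, ref_maxdisp=192, multiple=16):
    raw = ref_maxdisp * new_width / ref_width
    # round up to the next multiple of 16
    return int(-(-raw // multiple) * multiple)

print(scaled_maxdisp(2736))  # -> 432, i.e. something like --maxdisp 432
```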
For 4, "basic" model is a solution here. In #155 , I suggested that you can skip latest one or two "hourglass" (simply comment the forward code to achieve intermediate result).
@JiaRenChang thank you a lot for the feedback -- very informative. Will look into it. Have a great day.
Thank you for making this code publicly available. Very cool. I have been comparing the disparity maps produced by PSMNet with the pretrained KITTI 2015 model against results from the OpenCV Semi-Global Block Matcher (SGBM), which I run with restrictive parameters to get precise but not dense disparity maps. In addition, images of size 2736 x 1542 (w/h) are used as input for the OpenCV SGBM matcher, while images of size 1242 x (w/h) are used as input for the PSMNet matcher.
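For context, by "restrictive parameters" I mean StereoSGBM settings roughly along the lines of the sketch below (the exact values are illustrative, not my actual configuration):

```python
# A minimal sketch of the OpenCV baseline; parameter values are
# illustrative, chosen to favour precise but sparse disparities.
import cv2

left = cv2.imread('left.png', cv2.IMREAD_GRAYSCALE)
right = cv2.imread('right.png', cv2.IMREAD_GRAYSCALE)

block_size = 7
sgbm = cv2.StereoSGBM_create(
    minDisparity=0,
    numDisparities=16 * 27,      # must be a multiple of 16
    blockSize=block_size,
    P1=8 * block_size ** 2,      # small-change smoothness penalty
    P2=32 * block_size ** 2,     # large-change smoothness penalty
    uniquenessRatio=15,          # high -> fewer, more reliable matches
    disp12MaxDiff=1,             # strict left-right consistency check
    speckleWindowSize=100,
    speckleRange=1,
)
disparity = sgbm.compute(left, right).astype('float32') / 16.0  # fixed point -> pixels
```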
Given this, I have the following remarks and questions:

1. The results are generally very good in terms of the density of estimated disparity values compared to OpenCV SGBM (see e.g. Figure 1).
2. PSMNet generally seems to estimate a larger foreground for fine objects than appears to be correct. As an example, in Figure 1 the direction-sign pole is estimated rather coarsely compared to OpenCV's SGBM. Would better results be achieved if I retrained and estimated on high-resolution images, e.g. 2736 x 1542?
3. If I use input images of size 2736 x 1542 as input to PSMNet, I get very poor matching results. This seems logical. If I retrain on 2736 x 1542, can I expect the results to be better than what I currently get at 1242 x?
4. Currently, I cannot process images of size 2736 x 1542 on the CPU, since more than 80 GB of memory would need to be allocated. This is fine, since the GPU works. But what network optimizations / adaptations could be applied to reduce memory usage on the CPU / in general, while giving up only a small amount of precision (see the sketch below this list)? Would running the "basic" model instead of "stackhourglass" do the trick? I also think https://github.com/JiaRenChang/PSMNet/issues/155 applies here.
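For point 4, one memory workaround I can think of (a sketch only, not code from this repo): since the 4D cost volume scales with H x W x maxdisp, which is why memory explodes at 2736 x 1542, running the network on a downscaled pair and rescaling the disparity afterwards trades some precision for memory. `run_psmnet` below is a hypothetical wrapper around the model's forward pass.

```python
# A sketch, not code from this repo: run PSMNet at reduced resolution,
# then upsample the disparity map and rescale its values.
import cv2

def disparity_at_reduced_scale(run_psmnet, left, right, scale=0.5):
    """run_psmnet(left, right) -> disparity in pixels at the given input size (hypothetical wrapper)."""
    small_left = cv2.resize(left, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)
    small_right = cv2.resize(right, None, fx=scale, fy=scale, interpolation=cv2.INTER_AREA)

    disp_small = run_psmnet(small_left, small_right)

    h, w = left.shape[:2]
    disp_full = cv2.resize(disp_small, (w, h), interpolation=cv2.INTER_LINEAR)
    return disp_full / scale   # disparity values shrink with image width, so scale them back up
```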
Figure 1: Left stereo image, OpenCV SGBM result, PSMNet result.
Further images can be found here: https://varstuff.s3-eu-west-1.amazonaws.com/psmnet/viewer.html