Open pullitdown opened 2 days ago
Which model did you use: the original, or the one with the SwinT backbone? I have found that the model with the SwinT backbone does not generalize well to grayscale images.
I used the original one. The results are as follows. It seems that the pretrained model performs better than the fine-tuned model.
It seems that your test image is an indoor scene, whereas the KITTI dataset covers autonomous driving scenes. These two domains are very different. I recommend fine-tuning on an indoor dataset, such as InStereo2K, for your application. Good luck.
Thank you very much. I will test with a new dataset and then post the results back in this thread.
Hi! I'm really fascinated by the zero-shot ability discussed in this paper. I'm trying to use my grayscale stereo camera to predict real-world disparity. However, after I fine-tuned the pre-trained model on grayscale images from the KITTI dataset for about 300 epochs, the results got worse. Should I use the entire dataset for fine-tuning, or is there a better approach you could suggest? For image processing, I read each image as grayscale and stack three identical grayscale channels as the model input.
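For reference, the channel-stacking step described above can be sketched roughly as follows. This is only a minimal illustration using NumPy on a synthetic array. The helper name `gray_to_three_channel` is hypothetical and not part of the repository, and any normalization or resizing the model expects is not shown.

```python
import numpy as np


def gray_to_three_channel(gray: np.ndarray) -> np.ndarray:
    """Replicate a single-channel (H, W) image into an (H, W, 3) array.

    Hypothetical helper for illustration; not taken from the repository.
    """
    if gray.ndim != 2:
        raise ValueError(f"expected a 2-D grayscale image, got shape {gray.shape}")
    # Stack three identical copies along a new last (channel) axis.
    return np.stack([gray, gray, gray], axis=-1)


# Synthetic 4x6 grayscale image standing in for a real camera frame.
left_gray = np.arange(24, dtype=np.uint8).reshape(4, 6)
left_rgb = gray_to_three_channel(left_gray)
print(left_rgb.shape)
```

Since all three channels are identical, the input carries no color information, so it may differ from the RGB statistics the model saw during pre-training.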