aeolusguan / NMRF

[CVPR 2024] Neural Markov Random Field for Stereo Matching
MIT License

gray images inference #6

Open pullitdown opened 2 days ago

pullitdown commented 2 days ago

Hi! I'm really fascinated by the zero-shot ability discussed in this paper. I'm trying to use my grayscale stereo camera to predict real-world disparity. However, after I fine-tuned the pre-trained model on grayscale images from the KITTI dataset for about 300 epochs, the results seem to have gotten worse. Should I use the entire dataset for fine-tuning, or is there a better approach you could suggest? For image processing, I read each image as grayscale and stack three identical grayscale channels as the model input.
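The channel-stacking step described above can be sketched roughly as follows. This is only a minimal illustration using NumPy: the helper name `gray_to_three_channel` is hypothetical, the random array stands in for a real camera frame, and nothing here reflects how the NMRF data loader itself reads or normalizes images.

```python
import numpy as np


def gray_to_three_channel(gray):
    """Replicate a single-channel image of shape (H, W) into shape (H, W, 3).

    Hypothetical helper for illustration; not part of the NMRF code base.
    """
    gray = np.asarray(gray)
    # Accept (H, W, 1) as well by dropping the trailing singleton channel.
    if gray.ndim == 3 and gray.shape[-1] == 1:
        gray = gray[..., 0]
    if gray.ndim != 2:
        raise ValueError(f"expected a single-channel image, got shape {gray.shape}")
    # Copy the same intensity values into three identical channels.
    return np.repeat(gray[..., None], 3, axis=-1)


# Stand-in for a grayscale frame; a real frame would come from the camera.
left = np.random.randint(0, 256, size=(4, 6), dtype=np.uint8)
left_rgb = gray_to_three_channel(left)
print(left_rgb.shape)
```

The three output channels are exact copies of the input, so the dtype and value range are unchanged; any normalization the model expects would still have to be applied afterwards.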

aeolusguan commented 2 days ago

Which model did you use: the original one or the one with the SwinT backbone? I have found that the model with the SwinT backbone does not generalize well to grayscale images.

pullitdown commented 2 days ago

I use the original one. The results are as follows:

[Screenshot from 2024-10-18 10-56-40]

The pretrained model seems to perform better than the fine-tuned one.

aeolusguan commented 2 days ago

It seems that your test image is an indoor scene. The KITTI dataset covers autonomous driving scenes, and these two domains are very different. I would encourage you to fine-tune on an indoor dataset, such as InStereo2K, for your application. Good luck.

pullitdown commented 2 days ago

Thank you very much. I will test with a new dataset and then post the results back in this thread.