antabangun / coex


test #4

Closed TITAINc closed 2 years ago

TITAINc commented 3 years ago

Hello, sorry to bother you again. Your algorithm impressed me deeply, and I am trying to understand it in depth. Please allow me to ask you some questions: How can I test my own dataset? Does testing require camera calibration parameters similar to those in the KITTI raw data?

antabangun commented 3 years ago

Hello, if I understand the question correctly, I take it you want to run the model on your own images. If so, you can try doing something like this (I assume you don't use cuda):

import cv2
import numpy as np

import torch
import torchvision.transforms as transforms

from ruamel.yaml import YAML

from stereo import Stereo

torch.backends.cudnn.benchmark = True

torch.set_grad_enabled(False)

config = 'cfg_coex.yaml'
version = 0  # CoEx
half_precision = True

##########################
# Load the model here using the cfg file and checkpoint in this repo, 
# you can follow 'demo.py' for this:
def load_configs(path):
    cfg = YAML().load(open(path, 'r'))
    backbone_cfg = YAML().load(
        open(cfg['model']['stereo']['backbone']['cfg_path'], 'r'))
    cfg['model']['stereo']['backbone'].update(backbone_cfg)
    return cfg

cfg = load_configs(
    './configs/stereo/{}'.format(config))

ckpt = '{}/{}/version_{}/checkpoints/last.ckpt'.format(
    'logs/stereo', cfg['model']['name'], version)
cfg['stereo_ckpt'] = ckpt
stereo = Stereo.load_from_checkpoint(cfg['stereo_ckpt'],
                                     strict=False,
                                     cfg=cfg)
stereo.eval()  # make sure batchnorm/dropout layers are in inference mode
##########################

##########################
# Then instead of using the dataloader used in demo.py
# load your own image.

# This is just an example:
imgL, imgR = cv2.imread(left_path), cv2.imread(right_path)  # 3-channel images
# cv2.imread returns BGR; convert to RGB to match the ImageNet stats used below
imgL = cv2.cvtColor(imgL, cv2.COLOR_BGR2RGB)
imgR = cv2.cvtColor(imgR, cv2.COLOR_BGR2RGB)
H, W, _ = imgL.shape

## Normalize the image and convert to torch tensor
__imagenet_stats = {'mean': [0.485, 0.456, 0.406],
                    'std': [0.229, 0.224, 0.225]}
preprocess = transforms.Compose([
        transforms.ToTensor(),
        transforms.Normalize(**__imagenet_stats),
    ])
imgL, imgR = preprocess(imgL), preprocess(imgR)
imgL, imgR = imgL.reshape(1, 3, H, W), imgR.reshape(1, 3, H, W)

##########################
# Then pass them into the model:
with torch.no_grad():
    with torch.cuda.amp.autocast(enabled=half_precision):
        disp = stereo(imgL, imgR, False)
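
##########################
# (Optional, untested) To take a quick look at the result, something like
# the following might work; it assumes 'disp' holds a single (H, W)
# disparity map in pixels (check 'demo.py' for the exact output format):
disp_np = disp[0].float().cpu().numpy().squeeze()
disp_u8 = (255.0 * disp_np / max(float(disp_np.max()), 1e-6)).astype(np.uint8)
cv2.imwrite('disp.png', cv2.applyColorMap(disp_u8, cv2.COLORMAP_JET))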

P.S. I didn't try running this code, so there might be some errors.

About the other question: the model was only tuned on the KITTI dataset, so it may be biased towards driving scenarios with the KITTI sensor configuration and might not work as well on other images. But in my experience it can still give a decent output, even if not the greatest. If the output is not good, here are some things you can try:

  1. Tune the model on a combination of multiple stereo datasets (e.g., KITTI, SceneFlow, DrivingStereo, Middlebury, ETH3D); then hopefully the model can generalize to different configurations.
  2. Tune it on your own training set, if you have ground-truth disparity/depth (though that is hard to acquire).
  3. Try self-supervised training on your own images; you can refer to something like PVStereo (see the rough sketch after this list). In my experience you can improve the output a lot with just a few self-supervised training iterations.
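
To make point 3 a bit more concrete: I don't provide self-supervised training code in this repo, but the core ingredient of most self-supervised stereo methods is a photometric reconstruction loss, i.e., warp the right image into the left view using the predicted disparity and penalize the difference from the left image. Below is a rough, untested sketch of that idea only (not the exact PVStereo recipe, which is not public; the helper names warp_right_to_left and photometric_loss are made up for illustration):

import torch
import torch.nn.functional as F

def warp_right_to_left(img_right, disp_left):
    # img_right: (B, 3, H, W); disp_left: (B, H, W) positive disparities in pixels.
    B, _, H, W = img_right.shape
    device, dtype = img_right.device, img_right.dtype
    ys, xs = torch.meshgrid(
        torch.arange(H, device=device, dtype=dtype),
        torch.arange(W, device=device, dtype=dtype),
        indexing='ij')
    xs = xs.unsqueeze(0).expand(B, -1, -1)
    ys = ys.unsqueeze(0).expand(B, -1, -1)
    # A left-image pixel (x, y) should match the right-image pixel (x - d, y)
    x_src = xs - disp_left
    grid = torch.stack([2.0 * x_src / (W - 1) - 1.0,   # normalize to [-1, 1]
                        2.0 * ys / (H - 1) - 1.0], dim=-1)
    return F.grid_sample(img_right, grid, mode='bilinear',
                         padding_mode='border', align_corners=True)

def photometric_loss(img_left, img_right, disp_left):
    # Simple L1 reconstruction loss; real methods usually add SSIM,
    # smoothness terms, and occlusion/confidence masking on top.
    recon = warp_right_to_left(img_right, disp_left)
    return (recon - img_left).abs().mean()

You would compute this loss on the model's disparity predictions for your own stereo pairs and backpropagate it for a few fine-tuning iterations.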

Hopefully I answered your question well.

TITAINc commented 3 years ago

Thank you for your guidance, I will try your method. Before that, I have a question: I want to produce an output that does not display depth information beyond 3 meters. In the deep-learning algorithm I tried adjusting the max_disparity value, but it doesn't work. Do you have any good ideas?

antabangun commented 3 years ago

Hmm, sorry, I couldn't understand your question; could you elaborate a little bit more?

jucic commented 2 years ago

Thank you for your wonderful work. You mentioned PVStereo above; however, it seems PVStereo has not been open-sourced so far. Do you know how to make use of PVStereo's strategy for improvement? Thanks in advance.