Closed. TITAINc closed this issue 2 years ago.
Hello, if I understand the question correctly, I take it you want to run the model on your own images. If so, you can try doing something like this (I assume you don't use cuda):
```python
import cv2
import numpy as np
import torch
import torchvision.transforms as transforms
from ruamel.yaml import YAML

from stereo import Stereo

torch.backends.cudnn.benchmark = True
torch.set_grad_enabled(False)

config = 'cfg_coex.yaml'
version = 0  # CoEx
half_precision = True

##########################
# Load the model here using the cfg file and checkpoint in this repo,
# you can follow 'demo.py' for this:
def load_configs(path):
    with open(path, 'r') as f:
        cfg = YAML().load(f)
    with open(cfg['model']['stereo']['backbone']['cfg_path'], 'r') as f:
        backbone_cfg = YAML().load(f)
    cfg['model']['stereo']['backbone'].update(backbone_cfg)
    return cfg

cfg = load_configs(
    './configs/stereo/{}'.format(config))

ckpt = '{}/{}/version_{}/checkpoints/last.ckpt'.format(
    'logs/stereo', cfg['model']['name'], version)
cfg['stereo_ckpt'] = ckpt
stereo = Stereo.load_from_checkpoint(cfg['stereo_ckpt'],
                                     strict=False,
                                     cfg=cfg)
stereo.eval()  # inference mode for dropout / batch norm layers
##########################

##########################
# Then instead of using the dataloader used in demo.py
# load your own image.
# This is just an example:
# Placeholder paths: point these at your own rectified stereo pair.
left_path, right_path = 'left.png', 'right.png'
imgL, imgR = cv2.imread(left_path), cv2.imread(right_path)  # 3 channels image
# cv2.imread returns BGR, while the ImageNet mean/std below are in RGB order
# (this assumes the model was trained on RGB input).
imgL = cv2.cvtColor(imgL, cv2.COLOR_BGR2RGB)
imgR = cv2.cvtColor(imgR, cv2.COLOR_BGR2RGB)
H, W, _ = imgL.shape

## Normalize the image and convert to torch tensor
__imagenet_stats = {'mean': [0.485, 0.456, 0.406],
                    'std': [0.229, 0.224, 0.225]}
preprocess = transforms.Compose([
    transforms.ToTensor(),
    transforms.Normalize(**__imagenet_stats),
])
imgL, imgR = preprocess(imgL), preprocess(imgR)
# Add the batch dimension: (3, H, W) -> (1, 3, H, W)
imgL, imgR = imgL.reshape(1, 3, H, W), imgR.reshape(1, 3, H, W)
##########################

# Then pass them into the model:
with torch.no_grad():
    with torch.cuda.amp.autocast(enabled=half_precision):
        disp = stereo(imgL, imgR, False)
```
P.S. I haven't tried running this code, so there may be some errors.
About the other question: the model was only tuned on the KITTI dataset, so it is probably biased towards driving scenarios with the KITTI sensor configuration and might not work as well on other images. In my experience, though, it can still give a decent output, even if not the greatest. If the output is not good, here are some things you can try:
1) Tune the model on a combination of multiple stereo datasets (e.g., KITTI, SceneFlow, DrivingStereo, Middlebury, ETH3D), so that the model can hopefully generalize to different configurations.
2) Tune it on your own training set, if you have ground-truth disparity/depth (but that is hard to acquire).
3) Try self-supervised training on your own images; you can refer to something like PVStereo. In my experience you can improve the output a lot with just a few self-supervised training iterations.
Hopefully I answered your question well.
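For the self-supervised suggestion above, here is a minimal sketch of one common training signal: a photometric reconstruction loss. This is not PVStereo's method and not code from this repo; it is a generic technique, and the function names `warp_right_to_left` and `photometric_loss` are made up for illustration. It assumes rectified image tensors of shape (B, C, H, W) and a left-view disparity of shape (B, 1, H, W) in pixels.

```python
import torch
import torch.nn.functional as F


def warp_right_to_left(img_right, disp_left):
    """Resample the right image at column x - d so it lines up with the left view.

    img_right: (B, C, H, W) tensor; disp_left: (B, 1, H, W) disparity in pixels.
    """
    b, _, h, w = img_right.shape
    # Pixel coordinate grids, each of shape (H, W).
    ys, xs = torch.meshgrid(
        torch.arange(h, dtype=img_right.dtype, device=img_right.device),
        torch.arange(w, dtype=img_right.dtype, device=img_right.device),
        indexing="ij",
    )
    # A left pixel at column x corresponds to the right pixel at column x - d.
    x_src = xs.unsqueeze(0) - disp_left.squeeze(1)  # (B, H, W)
    y_src = ys.unsqueeze(0).expand(b, -1, -1)       # (B, H, W)
    # grid_sample expects (x, y) coordinates normalized to [-1, 1].
    grid = torch.stack(
        (2.0 * x_src / (w - 1) - 1.0, 2.0 * y_src / (h - 1) - 1.0), dim=-1
    )
    return F.grid_sample(img_right, grid, mode="bilinear",
                         padding_mode="border", align_corners=True)


def photometric_loss(img_left, img_right, disp_left):
    """Mean absolute difference between the left image and the warped right image."""
    return (img_left - warp_right_to_left(img_right, disp_left)).abs().mean()
```

The loss is differentiable with respect to the disparity, so it can be backpropagated into the network without ground truth. In practice people usually add a smoothness term and ignore occluded or out-of-view pixels, which this sketch leaves out. Note that the inference snippet above calls `torch.set_grad_enabled(False)`, which would have to be dropped for training.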
Thank you for your guidance. I will try your method. Before that, I have a question. I want the output not to display any depth information beyond 3 meters. I tried adjusting the max_disparity value in the deep learning algorithm, but it doesn't work. Do you have any good ideas?
Hmm, sorry, I couldn't understand your question. Could you elaborate a little bit more?
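One possible reading of the question above is "hide every pixel farther than 3 m". If that is the intent, a maximum disparity setting would not help: for a rectified pair, depth = focal length × baseline / disparity, so the largest disparity bounds the nearest depth, not the farthest. A far cutoff is instead a minimum-disparity threshold applied to the output. The sketch below assumes you know your camera's focal length in pixels and baseline in meters; `mask_beyond_depth` is a hypothetical helper, not part of this repo, and the numbers in the usage lines are illustrative only.

```python
import numpy as np


def mask_beyond_depth(disp, focal_px, baseline_m, max_depth_m=3.0):
    """Zero out disparities whose metric depth exceeds max_depth_m.

    depth = focal_px * baseline_m / disparity, so "farther than max_depth_m"
    is the same as "disparity smaller than focal_px * baseline_m / max_depth_m".
    Returns (masked disparity, depth in meters); removed pixels are 0 in both.
    """
    disp = np.asarray(disp, dtype=np.float32)
    min_disp = focal_px * baseline_m / max_depth_m
    keep = disp >= min_disp  # also drops zero (invalid) disparities
    depth = np.zeros_like(disp)
    depth[keep] = focal_px * baseline_m / disp[keep]
    return np.where(keep, disp, 0.0), depth


# Illustrative values only: 700 px focal length, 0.5 m baseline.
example_disp = np.array([[175.0, 100.0, 0.0]])
masked, depth = mask_beyond_depth(example_disp, focal_px=700.0, baseline_m=0.5)
```

Here the 175 px pixel sits at 2 m and is kept, while the 100 px pixel would be at 3.5 m and is removed along with the invalid zero.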
Thank you for your wonderful work. You mentioned PVStereo here, but it seems PVStereo has not been open-sourced so far. Do you know how to make use of PVStereo's strategy to improve the results? Thanks in advance.
Hello, sorry to bother you again. Your algorithm impressed me deeply, and I am trying to understand it thoroughly. Please allow me to ask some questions. How do I test on my own dataset? Does testing require camera calibration parameters similar to those in KITTI raw?