Hi @ASONG0506, the sky area is prone to over-fitting, but I have only rarely seen it, a few times at most. As for the other parts, there must be something wrong. Could you please provide your whole training process and the training environment you set up, e.g., the config file, epochs, GPUs, and so on? More details would help a lot.
Thanks for your reply. I will re-train the model with all the parameters the same as yours.
Hi, as shown in this figure, there are still some badly predicted areas, especially in the sky. My training environment is as follows:
```
apex 0.1
cffi 1.14.0
cupy 7.3.0
Cython 0.29.16
decorator 4.4.2
dmb 1.0 DenseMatchingBenchmark
easydict 1.9
GANet 0.0.0 DenseMatchingBenchmark/dmb/ops/libGANet
gaterecurrent2dnoind-cuda 0.0.0 DenseMatchingBenchmark/dmb/ops/spn
ipython 7.9.0
matplotlib 3.0.3
mmcv 0.4.3
numpy 1.18.2
opencv-python 4.2.0.34
pandas 0.24.2
Pillow 6.2.2
pip 20.0.2
PyWavelets 1.1.1
PyYAML 5.3.1
scikit-image 0.15.0
scipy 1.4.1
setuptools 46.1.3
six 1.14.0
tensorboard 1.14.0
tensorboardX 2.0
tensorflow-estimator 1.14.0
tensorflow-gpu 1.14.0
termcolor 1.1.0
thop 0.0.31.post2004070130
torch 1.1.0
torchvision 0.2.1
tqdm 4.45.0
```
The dataset I used was KITTI Stereo 2015, and the model was trained from scratch using distributed training with 2 TITAN X GPUs. I used the kitti_uniform.py config file with some modifications, as below:
```python
import os.path as osp

# model settings
max_disp = 192
model = dict(
    meta_architecture="GeneralizedStereoModel",
    max_disp=max_disp,  # max disparity
    batch_norm=True,  # whether the model uses BatchNorm
    backbone=dict(
        type="PSMNet",
        in_planes=3,  # the in planes of the feature extraction backbone
    ),
    cost_processor=dict(
        # Use the concatenation of left and right features to form the cost volume, then aggregate
        type='CAT',
        cost_computation=dict(
            # default cat_fms
            type="default",
            # the maximum disparity of the disparity search range at feature resolution
            max_disp=int(max_disp // 4),
            # the start disparity of the disparity search range
            start_disp=0,
            # the step between nearby disparity samples
            dilation=1,
        ),
        cost_aggregator=dict(
            type="ACF",
            # the maximum disparity of the disparity search range
            max_disp=max_disp,
            # the in planes of the cost aggregation sub-network
            in_planes=64,
        ),
    ),
    disp_predictor=dict(
        # default FasterSoftArgmin
        type="FASTER",
        # the maximum disparity of the disparity search range
        max_disp=max_disp,
        # the start disparity of the disparity search range
        start_disp=0,
        # the step between nearby disparity samples
        dilation=1,
        # the temperature coefficient of soft argmin
        alpha=1.0,
        # whether to normalize the estimated cost volume
        normalize=True,
    ),
    losses=dict(
        focal_loss=dict(
            # the maximum disparity of the disparity search range
            max_disp=max_disp,
            # the start disparity of the disparity search range
            start_disp=0,
            # the step between nearby disparity samples
            dilation=1,
            # weight of the stereo focal loss relative to the other loss types
            weight=1.0,
            # weights for the different scale losses
            weights=(1.0, 0.7, 0.5),
            # stereo focal loss focal coefficient
            coefficient=5.0,
            # the variance of the uni-modal distribution
            variance=1.2,  # if not given, the variance will be estimated by the network
        ),
        l1_loss=dict(
            # the maximum disparity of the disparity search range
            max_disp=max_disp,
            # weight of the l1 loss relative to the other loss types
            weight=0.1,
            # weights for the different scale losses
            weights=(1.0, 0.7, 0.5),
        ),
    ),
    eval=dict(
        # evaluate the disparity map within (lower_bound, upper_bound)
        lower_bound=0,
        upper_bound=max_disp,
        # evaluate the disparity map in occluded and non-occluded areas
        eval_occlusion=True,
        # return the cost volume after regularization for visualization
        is_cost_return=False,
        # whether to move the cost volume from CUDA to CPU
        is_cost_to_cpu=True,
    ),
)

# dataset settings
dataset_type = 'KITTI-2015'
# data_root = 'datasets/{}/'.format(dataset_type)
# annfile_root = osp.join(data_root, 'annotations')
# root = '/home/youmin/'
# root = '/node01/jobs/io/out/youmin/'
root = "/home/xxx/env_acfnet/DenseMatchingBenchmark"
data_root = osp.join(root, 'data/StereoMatching/', dataset_type)
annfile_root = osp.join(root, 'data/annotations/', dataset_type)

# If you don't want to visualize the results, just comment out the vis data
# For download and usage in debug, please refer to DATA.md and GETTING_STARTED.md respectively.
vis_data_root = osp.join(root, 'data/visualization_data/', dataset_type)
vis_annfile_root = osp.join(vis_data_root, 'annotations')

data = dict(
    # whether the disparity of the dataset is sparse, e.g., SceneFlow is dense, but KITTI is sparse
    sparse=True,
    imgs_per_gpu=2,
    workers_per_gpu=16,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        annfile=osp.join(annfile_root, 'full_train.json'),
        input_shape=[256, 512],
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
        use_right_disp=False,
    ),
    eval=dict(
        type=dataset_type,
        data_root=data_root,
        annfile=osp.join(annfile_root, 'full_eval.json'),
        input_shape=[384, 1248],
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
        use_right_disp=False,
    ),
    # If you don't want to visualize the results, just comment out the vis data
    vis=dict(
        type=dataset_type,
        data_root=vis_data_root,
        annfile=osp.join(vis_annfile_root, 'vis_test.json'),
        input_shape=[384, 1248],
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        annfile=osp.join(annfile_root, 'full_test.json'),
        input_shape=[384, 1248],
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
        use_right_disp=False,
    ),
)

optimizer = dict(type='RMSprop', lr=0.001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))
lr_config = dict(
    policy='step',
    warmup='constant',
    warmup_iters=100,
    warmup_ratio=1.0,
    gamma=1 / 3,
    step=[100, 300, 600],
)
checkpoint_config = dict(
    interval=25,
)
log_config = dict(
    interval=5,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook'),
    ],
)

# https://nvidia.github.io/apex/amp.html
apex = dict(
    # whether to use apex.synced_bn
    synced_bn=True,
    # whether to use apex for mixed-precision training
    use_mixed_precision=False,
    # the model weight type: float16 or float32
    type="float16",
    # the factor by which apex scales the loss value
    loss_scale=16,
)

total_epochs = 600
# evaluate every n epochs
validate_interval = 25
gpus = 2
dist_params = dict(backend='nccl')
log_level = 'INFO'
validate = True
load_from = None
# osp.join(root, 'exps/AcfNet/scene_flow_uniform/epoch_19.pth')
resume_from = None
workflow = [('train', 1)]
work_dir = osp.join(root, 'exps/AcfNet/kitti_2015_uniform')

# For test
checkpoint = osp.join(work_dir, 'epoch_600.pth')
out_dir = osp.join(work_dir, 'epoch_600')
```
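As a side note on the schedule above: with policy='step', gamma=1/3, and step=[100, 300, 600], the learning rate decays in stages. A minimal sketch of the resulting values, assuming mmcv's standard step policy (the lr is multiplied by gamma at every milestone already passed; the constant warmup with warmup_ratio=1.0 leaves the first 100 iterations at the base lr):

```python
def lr_at_epoch(epoch, base_lr=0.001, gamma=1 / 3, milestones=(100, 300, 600)):
    """Step policy: multiply the base lr by gamma for every milestone passed."""
    lr = base_lr
    for m in milestones:
        if epoch >= m:
            lr *= gamma
    return lr

# epochs   0- 99: 1.0e-3
# epochs 100-299: ~3.3e-4
# epochs 300-599: ~1.1e-4  (the last milestone, 600, coincides with total_epochs)
for e in (0, 100, 300, 599):
    print(e, lr_at_epoch(e))
```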
Looking forward to your reply, thank you very much!
Hi @ASONG0506, how do you resume from the SceneFlow pre-trained checkpoint, given that you left 'resume_from' set to None (with the checkpoint path commented out) in the config file?
I trained it from scratch without any pretrained model.
First of all, the sky area is prone to over-fitting, and none of the state-of-the-art methods can prevent it. Maybe you can try LocalSoftArgmin for inference, even if you trained your model with FasterSoftArgmin. In my experiments it often achieves a smaller 3-pixel error and performs better in the sky area, as it only depends on a few disparity indices around the peak, rather than on all indices as in soft argmin. The corresponding setting is shown below:
```python
disp_predictor=dict(
    # LocalSoftArgmin
    type="LOCAL",
    # the maximum disparity of the disparity search range
    max_disp=max_disp,
    # the radius of the window used for local sampling
    radius=3,
    # the start disparity of the disparity search range
    start_disp=0,
    # the step between nearby disparity samples
    dilation=1,
    # the step between nearby disparity indices when sampling locally
    radius_dilation=1,
    # the temperature coefficient of soft argmin
    alpha=1.0,
    # whether to normalize the estimated cost volume
    normalize=True,
),
```
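For intuition, here is a minimal PyTorch sketch of the difference (illustrative only, not the repo's implementation; it assumes the regularized cost volume holds similarity scores, where higher means a better match):

```python
import torch
import torch.nn.functional as F

def soft_argmin(cost, alpha=1.0):
    """Full soft argmin: expectation over ALL disparity indices.

    cost: [B, D, H, W] similarity volume (higher = better match).
    """
    d = torch.arange(cost.size(1), dtype=cost.dtype, device=cost.device)
    prob = F.softmax(alpha * cost, dim=1)
    return (prob * d.view(1, -1, 1, 1)).sum(dim=1)  # [B, H, W]

def local_soft_argmin(cost, radius=3, alpha=1.0):
    """Local soft argmin: expectation over a (2*radius+1) window around the peak.

    Far-away modes (e.g., a spurious second peak in the sky) cannot drag
    the estimate, because they are simply excluded from the window.
    """
    B, D, H, W = cost.shape
    peak = cost.argmax(dim=1, keepdim=True)                    # [B, 1, H, W]
    offsets = torch.arange(-radius, radius + 1, device=cost.device)
    idx = (peak + offsets.view(1, -1, 1, 1)).clamp(0, D - 1)   # [B, 2r+1, H, W]
    local_cost = cost.gather(1, idx)
    prob = F.softmax(alpha * local_cost, dim=1)
    return (prob * idx.to(cost.dtype)).sum(dim=1)              # [B, H, W]
```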
Secondly, except for the sky area, the predicted disparity map looks pretty good, in my experience.
Thirdly, I think you'd better fine-tune from the SceneFlow pre-trained model; by the way, you can download my pre-trained model from here. Using only the 200 image pairs in KITTI is not enough for stereo matching and leads to over-fitting more easily.
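In config terms, that means pointing load_from (not resume_from) at the downloaded SceneFlow checkpoint, e.g. (the path is illustrative, matching the commented-out line in the config above):

```python
# load weights only; resume_from would also restore the optimizer state and epoch counter
load_from = osp.join(root, 'exps/AcfNet/scene_flow_uniform/epoch_19.pth')
resume_from = None
```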
Got it, thank you very much!
Great, it works!
I trained a model using this code and the KITTI dataset, without modifying any parameters. One of the test results is shown in fig. 1, while the corresponding result from the KITTI official website is shown in fig. 2. Some parts of the sky area are wrong; could you please tell me what's wrong with my training process? Thanks!