DeepMotionAIResearch / DenseMatchingBenchmark


what's the problem with my result? #11

Closed: ASONG0506 closed this issue 4 years ago

ASONG0506 commented 4 years ago

I trained a model using this code and the KITTI dataset, without modifying any parameters.

One of my test results is shown in fig. 1: [fig1]

The corresponding result on the KITTI official website is shown in fig. 2: [fig2]

Some parts of the sky area are wrong. Could you please tell me what's wrong with my training process? Thanks!

youmi-zym commented 4 years ago

Hi @ASONG0506, the sky area is prone to over-fitting, but I have only seen this a few times. As for the other parts, I think something must be wrong. Could you please describe the whole training process and the training environment you set up, e.g., config file, epochs, GPUs, and so on? More details would help a lot.

ASONG0506 commented 4 years ago

Thanks for your reply. I will re-train the model with all parameters the same as yours.

ASONG0506 commented 4 years ago

[Screenshot from 2020-04-14 09-55-00]

Hi, as shown in this figure, there are still some badly predicted areas, especially in the sky. My training environment is as follows:

apex                      0.1                   
cffi                      1.14.0                
cupy                      7.3.0                 
Cython                    0.29.16               
decorator                 4.4.2                 
dmb                       1.0                   DenseMatchingBenchmark                 
easydict                  1.9                   
GANet                     0.0.0                DenseMatchingBenchmark/dmb/ops/libGANet
gaterecurrent2dnoind-cuda 0.0.0                DenseMatchingBenchmark/dmb/ops/spn     
ipython                   7.9.0                 
matplotlib                3.0.3                 
mmcv                      0.4.3                 
numpy                     1.18.2                
opencv-python             4.2.0.34              
pandas                    0.24.2                
Pillow                    6.2.2                 
pip                       20.0.2                
PyWavelets                1.1.1                 
PyYAML                    5.3.1                 
scikit-image              0.15.0                
scipy                     1.4.1                 
setuptools                46.1.3                
six                       1.14.0                
tensorboard               1.14.0                
tensorboardX              2.0                   
tensorflow-estimator      1.14.0                
tensorflow-gpu            1.14.0                
termcolor                 1.1.0                 
thop                      0.0.31.post2004070130 
torch                     1.1.0                 
torchvision               0.2.1                 
tqdm                      4.45.0                

The dataset I used was KITTI Stereo 2015, and the model was trained from scratch using distributed training on 2 TITAN X GPUs. I used the kitti_uniform.py config file with some modifications, as below:

import os.path as osp

# model settings
max_disp = 192
model = dict(
    meta_architecture="GeneralizedStereoModel",
    max_disp=max_disp,  # max disparity
    batch_norm=True,  # whether the model uses BatchNorm
    backbone=dict(
        type="PSMNet",
        in_planes=3,  # the in planes of feature extraction backbone
    ),
    cost_processor=dict(
        # Use the concatenation of left and right feature to form cost volume, then aggregation
        type='CAT',
        cost_computation=dict(
            # default cat_fms
            type="default",
            # the maximum disparity of disparity search range under the resolution of feature
            max_disp=int(max_disp // 4),
            # the start disparity of disparity search range
            start_disp=0,
            # the step between near disparity sample
            dilation=1,
        ),
        cost_aggregator=dict(
            type="ACF",
            # the maximum disparity of disparity search range
            max_disp=max_disp,
            # the in planes of cost aggregation sub network
            in_planes=64,
        ),
    ),
    disp_predictor=dict(
        # default FasterSoftArgmin
        type="FASTER",
        # the maximum disparity of disparity search range
        max_disp=max_disp,
        # the start disparity of disparity search range
        start_disp=0,
        # the step between near disparity sample
        dilation=1,
        # the temperature coefficient of soft argmin
        alpha=1.0,
        # whether normalize the estimated cost volume
        normalize=True,
    ),
    losses=dict(
        focal_loss=dict(
            # the maximum disparity of disparity search range
            max_disp=max_disp,
            # the start disparity of disparity search range
            start_disp=0,
            # the step between near disparity sample
            dilation=1,
            # weight for stereo focal loss with regard to other loss type
            weight=1.0,
            # weights for different scale loss
            weights=(1.0, 0.7, 0.5),
            # stereo focal loss focal coefficient
            coefficient=5.0,
            # the variance of uni-modal distribution
            variance=1.2, # if not given, the variance will be estimated by network
        ),
        l1_loss=dict(
            # the maximum disparity of disparity search range
            max_disp=max_disp,
            # weight for l1_loss with regard to other loss type
            weight=0.1,
            # weights for different scale loss
            weights=(1.0, 0.7, 0.5),
        ),
    ),
    eval=dict(
        # evaluate the disparity map within (lower_bound, upper_bound)
        lower_bound=0,
        upper_bound=max_disp,
        # evaluate the disparity map in occlusion area and not occlusion
        eval_occlusion=True,
        # return the cost volume after regularization for visualization
        is_cost_return=False,
        # whether move the cost volume from cuda to cpu
        is_cost_to_cpu=True,
    ),
)

# dataset settings
dataset_type = 'KITTI-2015'
# data_root = 'datasets/{}/'.format(dataset_type)
# annfile_root = osp.join(data_root, 'annotations')

# root = '/home/youmin/'
#root = '/node01/jobs/io/out/youmin/'
root= "/home/xxx/env_acfnet/DenseMatchingBenchmark"

data_root = osp.join(root, 'data/StereoMatching/', dataset_type)
annfile_root = osp.join(root, 'data/annotations/', dataset_type)

# If you don't want to visualize the results, just comment out the vis data below.
# For download and usage in debug, please refer to DATA.md and GETTING_STARTED.md respectively.
vis_data_root = osp.join(root, 'data/visualization_data/', dataset_type)
vis_annfile_root = osp.join(vis_data_root, 'annotations')

data = dict(
    # if disparity of datasets is sparse, e.g., SceneFLow is not sparse, but KITTI is sparse
    sparse=True,
    imgs_per_gpu=2,
    workers_per_gpu=16,
    train=dict(
        type=dataset_type,
        data_root=data_root,
        annfile=osp.join(annfile_root, 'full_train.json'),
        input_shape=[256, 512],
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
        use_right_disp=False,
    ),
    eval=dict(
        type=dataset_type,
        data_root=data_root,
        annfile=osp.join(annfile_root, 'full_eval.json'),
        input_shape=[384, 1248],
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
        use_right_disp=False,
    ),
    # If you don't want to visualize the results, just comment out the vis data
    vis=dict(
        type=dataset_type,
        data_root=vis_data_root,
        annfile=osp.join(vis_annfile_root, 'vis_test.json'),
        input_shape=[384, 1248],
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
    ),
    test=dict(
        type=dataset_type,
        data_root=data_root,
        annfile=osp.join(annfile_root, 'full_test.json'),
        input_shape=[384, 1248],
        mean=[0.485, 0.456, 0.406],
        std=[0.229, 0.224, 0.225],
        use_right_disp=False,
    ),
)

optimizer = dict(type='RMSprop', lr=0.001)
optimizer_config = dict(grad_clip=dict(max_norm=35, norm_type=2))

lr_config = dict(
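    # note: with mmcv's 'step' policy, the base lr (1e-3) is multiplied by
    # gamma = 1/3 at each epoch listed in `step`, i.e. roughly 3.3e-4 after
    # epoch 100, 1.1e-4 after epoch 300, and 3.7e-5 after epoch 600; the
    # constant warmup with warmup_ratio=1.0 keeps lr at its base value for
    # the first 100 iterations.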
    policy='step',
    warmup='constant',
    warmup_iters=100,
    warmup_ratio=1.0,
    gamma=1/3,
    step=[100, 300, 600]
)
checkpoint_config = dict(
    interval=25
)

log_config = dict(
    interval=5,
    hooks=[
        dict(type='TextLoggerHook'),
        dict(type='TensorboardLoggerHook'),
    ]
)

# https://nvidia.github.io/apex/amp.html
apex = dict(
    # whether to use apex.synced_bn
    synced_bn=True,
    # whether to use apex for mixed precision training
    use_mixed_precision=False,
    # the model weight type: float16 or float32
    type="float16",
    # the factor when apex scales the loss value
    loss_scale=16,
)

total_epochs = 600
# every n epoch evaluate
validate_interval = 25

gpus = 2
dist_params = dict(backend='nccl')
log_level = 'INFO'
validate = True
load_from = None
#osp.join(root, 'exps/AcfNet/scene_flow_uniform/epoch_19.pth')
resume_from = None

workflow = [('train', 1)]
work_dir = osp.join(root, 'exps/AcfNet/kitti_2015_uniform')

# For test
checkpoint = osp.join(work_dir, 'epoch_600.pth')
out_dir = osp.join(work_dir, 'epoch_600')

Looking forward to your reply, thank you very much!
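
A note on the variance=1.2 field in the focal_loss block above: AcfNet's stereo focal loss supervises the cost volume with a uni-modal target distribution peaked at the ground-truth disparity. A minimal sketch of how such a target can be constructed, following the formulation in the AcfNet paper (the function name and exact details here are illustrative, not the repo's implementation):

import torch
import torch.nn.functional as F

def unimodal_target(gt_disp, max_disp=192, start_disp=0, dilation=1, variance=1.2):
    # gt_disp: [B, 1, H, W] ground-truth disparity (sparse for KITTI)
    num_samples = (max_disp - start_disp) // dilation
    d = torch.arange(start_disp, start_disp + num_samples * dilation, dilation,
                     dtype=gt_disp.dtype, device=gt_disp.device).view(1, -1, 1, 1)
    # softmax over a Laplacian-like score: a smaller variance gives a sharper
    # peak at the true disparity; the cost volume is trained to match this
    return F.softmax(-torch.abs(d - gt_disp) / variance, dim=1)  # [B, D, H, W]

Invalid (zero-disparity) pixels in the sparse KITTI ground truth would be masked out before the loss is computed.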

youmi-zym commented 4 years ago

Hi @ASONG0506, how did you load the pre-trained checkpoint from scene_flow? It looks like 'resume_from' is set to None in your config file.

ASONG0506 commented 4 years ago

I trained it from scratch without any pretrained model.

youmi-zym commented 4 years ago

First of all, the sky area is prone to over-fitting, and none of the state-of-the-art methods can prevent it. Maybe you can try LocalSoftArgmin for inference, even if you trained your model with FasterSoftArgmin. In my experiments it often yields a smaller 3-pixel error and performs better on the sky area, as it only depends on a few disparity indexes around the peak, rather than on all indexes as plain soft argmin does (see the sketch after the config snippet below).

disp_predictor=dict(
    # LocalSoftArgmin
    type="LOCAL",
    # the maximum disparity of disparity search range
    max_disp=max_disp,
    # the radius of window when local sampling
    radius=3,
    # the start disparity of disparity search range
    start_disp=0,
    # the step between near disparity sample
    dilation=1,
    # the step between near disparity index when local sampling
    radius_dilation=1,
    # the temperature coefficient of soft argmin
    alpha=1.0,
    # whether normalize the estimated cost volume
    normalize=True,
),
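
As a rough illustration of why this helps: plain soft argmin takes an expectation over all D disparity samples, so stray probability mass far from the peak (common in ambiguous regions like sky) drags the estimate away, whereas LocalSoftArgmin only softmaxes a small window around the peak. A minimal sketch of the idea, assuming a cost volume of shape [B, D, H, W] where higher values mean more likely, and radius_dilation=1 (function and variable names are illustrative, not the repo's actual implementation):

import torch
import torch.nn.functional as F

def local_soft_argmin(cost_volume, radius=3, start_disp=0, dilation=1, alpha=1.0):
    # cost_volume: [B, D, H, W]; higher value = more likely disparity
    B, D, H, W = cost_volume.shape
    # index of the peak (most likely disparity sample) per pixel
    peak = cost_volume.argmax(dim=1, keepdim=True)               # [B, 1, H, W]
    # window of 2*radius+1 indexes around the peak, clamped to the valid range
    offsets = torch.arange(-radius, radius + 1,
                           device=cost_volume.device).view(1, -1, 1, 1)
    window = (peak + offsets).clamp(0, D - 1)                    # [B, 2r+1, H, W]
    # softmax only over the gathered local costs, not over all D samples
    prob = F.softmax(alpha * torch.gather(cost_volume, 1, window), dim=1)
    # convert indexes back to disparity values, take the local expectation
    disp = start_disp + window.to(cost_volume.dtype) * dilation
    return torch.sum(prob * disp, dim=1, keepdim=True)           # [B, 1, H, W]

Called on the regularized cost volume, it returns a [B, 1, H, W] disparity map, just like the soft-argmin predictor it replaces.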

Secondly, except for the sky area, the predicted disparity map looks pretty good in my experience.
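
For reference, the 3-pixel error mentioned above is the standard KITTI 2015 metric: a pixel counts as an outlier when its disparity error exceeds both 3 px and 5% of the ground-truth value. A minimal sketch (names are illustrative):

import torch

def kitti_d1_error(pred, gt):
    # pred, gt: [H, W] disparity maps; gt == 0 marks invalid (sparse) pixels
    valid = gt > 0
    err = (pred[valid] - gt[valid]).abs()
    # KITTI-2015 outlier rule: error above 3 px AND above 5% of true disparity
    outlier = (err > 3.0) & (err > 0.05 * gt[valid])
    return outlier.float().mean().item()  # fraction of bad pixels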

Thirdly, I think you'd better start from the model pre-trained on scene_flow; by the way, you can download my pre-trained model from here. Using only the 200 image pairs in KITTI is not enough for stereo matching and leads to over-fitting more easily. (A config sketch for loading a checkpoint follows below.)
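
For completeness: in this mmcv-style config, fine-tuning from a SceneFlow checkpoint goes through the load_from field, which loads weights only; resume_from additionally restores the optimizer and epoch state. A sketch using the path already commented out in the config above (adjust it to wherever your checkpoint lives):

# in kitti_uniform.py: initialize from a SceneFlow checkpoint instead of training from scratch
load_from = osp.join(root, 'exps/AcfNet/scene_flow_uniform/epoch_19.pth')
resume_from = None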

ASONG0506 commented 4 years ago

Get it, thank you very much!

ASONG0506 commented 4 years ago

[Screenshot from 2020-04-14 17-35-41]

Great, it works!