Turoad / lanedet

An open source lane detection toolbox based on PyTorch, including SCNN, RESA, UFLD, LaneATT, CondLane, etc.
Apache License 2.0

How can I properly change the input image size on CondLane? #49

Closed parkjbdev closed 2 years ago

parkjbdev commented 2 years ago

Currently I'm detecting lanes using tools/detect.py.

For CondLane inference, I changed this:

batch_size=1 # from 8 (for condlane inference)

and tried these config values for an FHD input image:

img_height = 1080 # from 320
img_width = 1920 # from 800

ori_img_h = 1080 # from 590
ori_img_w = 1920 # from 1640

crop_bbox = [0,540,1920,1080] # from [0, 270, 1640, 590]

Changing img_scale = (800, 320) results in:

The size of tensor a must match the size of tensor b at non-singleton dimension 3

How can I properly change the input image size (e.g., FHD) in the CondLane config file?

Turoad commented 2 years ago

Keep this setting as:

img_height = 320
img_width = 800

and img_scale = (img_width, img_height). We will update this in the config later.
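
For reference, img_scale is ordered (width, height): the Resize step in these configs reads it as height=img_scale[1], width=img_scale[0] (see the train/val pipelines in the config posted later in this thread). A minimal illustration:

# img_scale is (width, height); the Alaug Resize step consumes it as
# height=img_scale[1], width=img_scale[0].
img_width, img_height = 800, 320
img_scale = (img_width, img_height)  # (800, 320)
resize = dict(type='Resize', height=img_scale[1], width=img_scale[0], p=1)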

parkjbdev commented 2 years ago

Actually, it detects no lanes with that configuration.

However, I tried this:

img_height = 1080
img_width = 800
ori_img_h = 1080
ori_img_w = 1920
img_scale = (800, 320)
crop_bbox = [0, 540, 1920, 1080]

and the result image is:

[Screenshot 2022-04-18 14 22 32: detection result]

which is in the wrong position. (Never mind the FPS.)

Thanks for the support

Turoad commented 2 years ago

Have you tried this?

img_height = 320
img_width = 800
ori_img_h = 1080
ori_img_w = 1920
img_scale = (800, 320)
crop_bbox = [0, 540, 1920, 1080]

parkjbdev commented 2 years ago

Have you tried this?

img_height = 320
img_width = 800
ori_img_h = 1080
ori_img_w = 1920
img_scale = (800, 320)
crop_bbox = [0, 540, 1920, 1080]

Yes, I did, only to see nothing detected at all.

Turoad commented 2 years ago

img_scale should be img_scale = (img_width, img_height), what about this:

img_height = 1080
img_width = 800
ori_img_h = 1080
ori_img_w = 1920
img_scale = (img_width, img_height)
crop_bbox = [0, 540, 1920, 1080]

Have you tried more images?

parkjbdev commented 2 years ago

img_scale should be img_scale = (img_width, img_height), what about this:

img_height = 1080
img_width = 800
ori_img_h = 1080
ori_img_w = 1920
img_scale = (img_width, img_height)
crop_bbox = [0, 540, 1920, 1080]

Have you tried more images?

Yes, I tried more images (I ran it on a video with a few modifications). Changing img_scale = (img_width, img_height) leads to this error:

Traceback (most recent call last):
  File "tools/detect.py", line 86, in <module>
    process(args)
  File "tools/detect.py", line 75, in process
    detect.run(p)
  File "tools/detect.py", line 50, in run
    data['lanes'] = self.inference(data)[0]
  File "tools/detect.py", line 37, in inference
    data = self.net(data)
  File "/home/pc-6/miniconda3/envs/lanedet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pc-6/miniconda3/envs/lanedet/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/pc-6/miniconda3/envs/lanedet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pc-6/repos/lanedet/lanedet/models/nets/detector.py", line 26, in forward
    fea[-1] = self.aggregator(fea[-1])
  File "/home/pc-6/miniconda3/envs/lanedet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pc-6/repos/lanedet/lanedet/models/aggregators/transformer.py", line 152, in forward
    src = layer(src, pos.to(src.device))
  File "/home/pc-6/miniconda3/envs/lanedet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 1110, in _call_impl
    return forward_call(*input, **kwargs)
  File "/home/pc-6/repos/lanedet/lanedet/models/aggregators/transformer.py", line 106, in forward
    x += pos
RuntimeError: The size of tensor a (34) must match the size of tensor b (10) at non-singleton dimension 2

Turoad commented 2 years ago

pos_shape should also be changed, it's (batch_size, img_height/32, img_width/32).

aggregator = dict(
    type='TransConvEncoderModule',
    in_dim=2048,
    attn_in_dims=[2048, 256],
    attn_out_dims=[256, 256],
    strides=[1, 1],
    ratios=[4, 4],
    pos_shape=(batch_size, 34, 25),
)
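
In other words, pos_shape tracks the backbone's 1/32-scale feature map, rounding up when the input size is not divisible (1080/32 = 33.75 → 34). A minimal sketch of the arithmetic, assuming that rounding behavior:

import math

# Sketch of the pos_shape arithmetic: the backbone downsamples by 32,
# and non-divisible sizes round up (1080/32 = 33.75 -> 34).
def pos_shape(batch_size, img_height, img_width, stride=32):
    return (batch_size, math.ceil(img_height / stride), math.ceil(img_width / stride))

print(pos_shape(1, 1080, 800))  # (1, 34, 25)
print(pos_shape(1, 320, 800))   # (1, 10, 25), the default config value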

parkjbdev commented 2 years ago

pos_shape should also be changed, it's (batch_size, img_height/32, img_width/32).

aggregator = dict(
    type='TransConvEncoderModule',
    in_dim=2048,
    attn_in_dims=[2048, 256],
    attn_out_dims=[256, 256],
    strides=[1, 1],
    ratios=[4, 4],
    pos_shape=(batch_size, 34, 25),
)

Changing only pos_shape results in this error:

  File "torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "torch/nn/parallel/data_parallel.py", line 166, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "lanedet/lanedet/models/nets/detector.py", line 35, in forward
    output = self.heads(fea)
  File "torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "lanedet/lanedet/models/heads/condlane.py", line 1013, in forward
    return self.forward_test(x_list, )
  File "lanedet/lanedet/models/heads/condlane.py", line 984, in forward_test
    masks = self.mask_head(mask_branch, mask_params, num_ins, idx)
  File "torch/nn/modules/module.py", line 1102, in _call_impl
    return forward_call(*input, **kwargs)
  File "lanedet/lanedet/models/heads/condlane.py", line 534, in forward
    x = torch.cat([locations, x], dim=1)
RuntimeError: Sizes of tensors must match except in dimension 1. Expected size 80 but got size 270 for tensor number 1 in the list.

I thought maybe mask_size and heads.location_configs should be changed from 80 to 270, which leads to no error but also detects no lanes.

Turoad commented 2 years ago

Yes, the mask_size should also be consistent. I also tried it; the difference from the original config is:

sample_y = range(590, 270, -8)

-batch_size = 8
+batch_size = 1
 aggregator = dict(
     type='TransConvEncoderModule',
     in_dim=2048,
@@ -21,7 +21,7 @@ aggregator = dict(
     attn_out_dims=[256, 256],
     strides=[1, 1],
     ratios=[4, 4],
-    pos_shape=(batch_size, 10, 25),
+    pos_shape=(batch_size, 34, 25),
 )

 neck=dict(
@@ -79,7 +79,7 @@ img_norm = dict(
     std=[50.5, 53.8, 54.3]
 )

-img_height = 320
+img_height = 1080
 img_width = 800
 cut_height = 0
 ori_img_h = 590
@@ -91,9 +91,9 @@ num_lane_classes = 1
 line_width = 3
 radius = 6
 nms_thr = 4
-img_scale = (800, 320)
+img_scale = (800, 1080)
 crop_bbox = [0, 270, 1640, 590]
-mask_size = (1, 80, 200)
+mask_size = (1, 270, 200)

The sample_y should also be changed. Maybe range(1080, 540, -8)? It is used in visualization.

Actually, this model was trained on CULane, so it may be normal to see no detected lanes on new images. The best way is to train the model on your own dataset. If not, you might try lowering hm_thr (https://github.com/Turoad/lanedet/blob/main/lanedet/models/heads/condlane.py#L709) to get higher recall.
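
Summing up the recipe from this thread: below is a hypothetical helper (not part of lanedet) collecting the values that must stay consistent when the input size changes, assuming the strides used above (backbone 1/32 for pos_shape, mask_down_scale=4 for mask_size and heads.location_configs, and an 8-pixel step for sample_y):

import math

# Hypothetical helper gathering the derived config values discussed above.
def derived_values(batch_size, img_w, img_h, ori_w, ori_h, crop_y0):
    mask_h, mask_w = img_h // 4, img_w // 4             # mask_down_scale = 4
    return dict(
        img_scale=(img_w, img_h),                       # (width, height)
        pos_shape=(batch_size, math.ceil(img_h / 32), math.ceil(img_w / 32)),
        mask_size=(1, mask_h, mask_w),
        location_size=(batch_size, 1, mask_h, mask_w),  # heads.location_configs
        crop_bbox=[0, crop_y0, ori_w, ori_h],
        sample_y=range(ori_h, crop_y0, -8),             # used for visualization
    )

# The 1080-pixel-high setup arrived at in the diff above:
print(derived_values(1, 800, 1080, 1920, 1080, 540))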

parkjbdev commented 2 years ago

Changing sample_y worked. Thanks a lot.

luan1412167 commented 2 years ago

I wonder how to compute the right mask_size and sample_y.

luan1412167 commented 2 years ago

I recently changed img_scale but got the error RuntimeError: The size of tensor a (20) must match the size of tensor b (25) at non-singleton dimension 3

ajay1606 commented 2 years ago

@parkjbdev Could you please share the pre-trained model file for CondLane?

parkjbdev commented 2 years ago

I haven't trained it on my own. Try this:

https://github.com/Turoad/lanedet/releases/tag/1.0

ajay1606 commented 2 years ago

@parkjbdev Thanks so much

ajay1606 commented 2 years ago

@parkjbdev I have tried the model condlane_r101_culane.pth with the config file resnet101_culane.py, using the command below:

(lanedet) ajay:~/lanedet$ python tools/detect0.py configs/condlane/resnet101_culane.py --img /home/ajay/lanedet/data/CULane/video_example/05081544_0305/ --load_from /home/ajay/lanedet/models/condalane/condlane_r101_culane.pth --savedir ./vis

But I am getting an error message like this!

  0%|                                                                                  | 0/7 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "tools/detect0.py", line 86, in <module>
    process(args)
  File "tools/detect0.py", line 75, in process
    detect.run(p)
  File "tools/detect0.py", line 50, in run
    data['lanes'] = self.inference(data)[0]
  File "tools/detect0.py", line 37, in inference
    data = self.net(data)
  File "/home/ajay/miniconda3/envs/lanedet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ajay/miniconda3/envs/lanedet/lib/python3.8/site-packages/torch/nn/parallel/data_parallel.py", line 165, in forward
    return self.module(*inputs[0], **kwargs[0])
  File "/home/ajay/miniconda3/envs/lanedet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ajay/lanedet/lanedet/models/nets/detector.py", line 26, in forward
    fea[-1] = self.aggregator(fea[-1])
  File "/home/ajay/miniconda3/envs/lanedet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ajay/lanedet/lanedet/models/aggregators/transformer.py", line 152, in forward
    src = layer(src, pos.to(src.device))
  File "/home/ajay/miniconda3/envs/lanedet/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/ajay/lanedet/lanedet/models/aggregators/transformer.py", line 106, in forward
    x += pos
RuntimeError: output with shape [1, 256, 10, 25] doesn't match the broadcast shape [8, 256, 10, 25]

Also, the source code in that link doesn't contain any config file for CondLane! How did you run the CondLane example?

parkjbdev commented 2 years ago

Refer to issue #22. You should modify the config file for CondLane: it is at configs/condlane/resnet101_culane.py; try changing batch_size=8 to batch_size=1.
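
The broadcast error above can be reproduced in isolation: the positional encoding tensor keeps the configured batch_size=8, while detect.py feeds a single image. A minimal sketch:

import torch

# Minimal reproduction of the failure at transformer.py line 106 (x += pos):
x = torch.zeros(1, 256, 10, 25)    # feature map for one input image
pos = torch.zeros(8, 256, 10, 25)  # built from pos_shape=(batch_size=8, 10, 25)
x += pos  # RuntimeError: output with shape [1, 256, 10, 25] doesn't match
          # the broadcast shape [8, 256, 10, 25]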

Good luck!

ajay1606 commented 2 years ago

@parkjbdev Thanks for your response; it works now. But would you please help me tune the parameters for a custom dataset? Currently, my image size is (width=1920, height=1208).

Referencing the discussion above, I have modified the config file as below:

sample_y = range(1208, 604, -8)

batch_size = 1
aggregator = dict(
    type='TransConvEncoderModule',
    in_dim=2048,
    attn_in_dims=[2048, 256],
    attn_out_dims=[256, 256],
    strides=[1, 1],
    ratios=[4, 4],
    pos_shape=(batch_size, 38, 25),  # (img_height/32, img_width/32)
)

img_height = 1208 
img_width = 800
cut_height = 0 
ori_img_h = 1208
ori_img_w = 1920

mask_down_scale = 4
hm_down_scale = 16
num_lane_classes = 1
line_width = 3
radius = 6
nms_thr = 4
img_scale = (800, 1208)
crop_bbox = [0, 604, 1920, 1208]
mask_size = (1, 270, 200)

But I'm getting an error like this: RuntimeError: Sizes of tensors must match except in dimension 2. Got 302 and 80 (The offending index is 0)

Would you please tell me if I'm missing any params here? I greatly appreciate your response. Any chance I could look at your config file? It would be very helpful.

Regards, Ajay
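
Following the arithmetic established earlier in the thread, 302 here is 1208 // 4 (img_height / mask_down_scale), while 80 is the default mask height for a 320-pixel input. A sketch of the values that would presumably need updating, assuming the same stride relationships (not confirmed in this thread):

# Presumed fix for a 1208-pixel-high input: the mask branch downsamples
# by mask_down_scale=4, so the mask grid is 1208 // 4 = 302 rows tall.
img_height, img_width, mask_down_scale = 1208, 800, 4
mask_size = (1, img_height // mask_down_scale, img_width // mask_down_scale)  # (1, 302, 200)
location_configs = dict(size=(1, 1, 302, 200), device='cuda:0')  # heads.location_configs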

parkjbdev commented 2 years ago

Sorry for the late response; I had to turn on the computer at my college. Here are my config file and the result:

net = dict(
    type='Detector',
)

backbone = dict(
    type='ResNetWrapper',
    resnet='resnet101',
    pretrained=True,
    replace_stride_with_dilation=[False, False, False],
    out_conv=False,
    in_channels=[64, 128, 256, 512]
)

ori_img_h = 1080
ori_img_w = 1920
bbox_h_start = 540
crop_bbox = [0, bbox_h_start, ori_img_w, ori_img_h]
sample_y = range(ori_img_h, bbox_h_start, -8)

batch_size = 1

aggregator = dict(
    type='TransConvEncoderModule',
    in_dim=2048,
    attn_in_dims=[2048, 256],
    attn_out_dims=[256, 256],
    strides=[1, 1],
    ratios=[4, 4],
    pos_shape=(batch_size, 10, 25),
)

neck=dict(
    type='FPN',
    in_channels=[256, 512, 1024, 256],
    out_channels=64,
    num_outs=4,
    #trans_idx=-1,
)

loss_weights=dict(
        hm_weight=1,
        kps_weight=0.4,
        row_weight=1.,
        range_weight=1.,
    )

num_lane_classes=1
heads=dict(
    type='CondLaneHead',
    heads=dict(hm=num_lane_classes),
    in_channels=(64, ),
    num_classes=num_lane_classes,
    head_channels=64,
    head_layers=1,
    disable_coords=False,
    branch_in_channels=64,
    branch_channels=64,
    branch_out_channels=64,
    reg_branch_channels=64,
    branch_num_conv=1,
    hm_idx=2,
    mask_idx=0,
    compute_locations_pre=True,
    location_configs=dict(size=(batch_size, 1, 80, 200), device='cuda:0')
)

optimizer = dict(type='AdamW', lr=3e-4, betas=(0.9, 0.999), eps=1e-8)

epochs = 16
total_iter = (88880 // batch_size) * epochs
import math
scheduler = dict(
    type = 'MultiStepLR',
    milestones=[8, 14],
    gamma=0.1
)

seg_loss_weight = 1.0
eval_ep = 1
save_ep = 1

img_norm = dict(
    mean=[75.3, 76.6, 77.6],
    std=[50.5, 53.8, 54.3]
)

img_height = 320
img_width = 800
cut_height = 0

mask_down_scale = 4
hm_down_scale = 16
num_lane_classes = 1
line_width = 3
radius = 6
nms_thr = 4
img_scale = (800, 320)
mask_size = (1, 80, 200)

train_process = [
    dict(type='Alaug',
    transforms=[dict(type='Compose', params=dict(bboxes=False, keypoints=True, masks=False)),
    dict(
        type='Crop',
        x_min=crop_bbox[0],
        x_max=crop_bbox[2],
        y_min=crop_bbox[1],
        y_max=crop_bbox[3],
        p=1),
    dict(type='Resize', height=img_scale[1], width=img_scale[0], p=1),
    dict(
        type='OneOf',
        transforms=[
            dict(
                type='RGBShift',
                r_shift_limit=10,
                g_shift_limit=10,
                b_shift_limit=10,
                p=1.0),
            dict(
                type='HueSaturationValue',
                hue_shift_limit=(-10, 10),
                sat_shift_limit=(-15, 15),
                val_shift_limit=(-10, 10),
                p=1.0),
        ],
        p=0.7),
    dict(type='JpegCompression', quality_lower=85, quality_upper=95, p=0.2),
    dict(
        type='OneOf',
        transforms=[
            dict(type='Blur', blur_limit=3, p=1.0),
            dict(type='MedianBlur', blur_limit=3, p=1.0)
        ],
        p=0.2),
    dict(type='RandomBrightness', limit=0.2, p=0.6),
    dict(
        type='ShiftScaleRotate',
        shift_limit=0.1,
        scale_limit=(-0.2, 0.2),
        rotate_limit=10,
        border_mode=0,
        p=0.6),
    dict(
        type='RandomResizedCrop',
        height=img_scale[1],
        width=img_scale[0],
        scale=(0.8, 1.2),
        ratio=(1.7, 2.7),
        p=0.6),
    dict(type='Resize', height=img_scale[1], width=img_scale[0], p=1),]

    ),
    dict(type='CollectLane',
        down_scale=mask_down_scale,
        hm_down_scale=hm_down_scale,
        max_mask_sample=5,
        line_width=line_width,
        radius=radius,
        keys=['img', 'gt_hm'],
        meta_keys=[
            'gt_masks', 'mask_shape', 'hm_shape',
            'down_scale', 'hm_down_scale', 'gt_points'
        ]
    ),
    #dict(type='Resize', size=(img_width, img_height)),
    dict(type='Normalize', img_norm=img_norm),
    dict(type='ToTensor', keys=['img', 'gt_hm'], collect_keys=['img_metas']),
]

val_process = [
    dict(type='Alaug',
        transforms=[dict(type='Compose', params=dict(bboxes=False, keypoints=True, masks=False)),
            dict(type='Crop',
            x_min=crop_bbox[0],
            x_max=crop_bbox[2],
            y_min=crop_bbox[1],
            y_max=crop_bbox[3],
            p=1),
        dict(type='Resize', height=img_scale[1], width=img_scale[0], p=1)]
    ),
    #dict(type='Resize', size=(img_width, img_height)),
    dict(type='Normalize', img_norm=img_norm),
    dict(type='ToTensor', keys=['img']),
]

dataset_path = './data/CULane'
dataset = dict(
    train=dict(
        type='CULane',
        data_root=dataset_path,
        split='train',
        processes=train_process,
    ),
    val=dict(
        type='CULane',
        data_root=dataset_path,
        split='test',
        processes=val_process,
    ),
    test=dict(
        type='CULane',
        data_root=dataset_path,
        split='test',
        processes=val_process,
    )
)

workers = 12
log_interval = 1000
lr_update_by_epoch=True

[Screenshot 2022-06-02 13 48 04: detection result]

Hope it works
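
As a quick sanity check of this config against the relationships discussed earlier: the network input stays at 800x320, so the default pos_shape, mask_size, and location_configs values remain valid, and only the original-image settings (ori_img_h, ori_img_w, crop_bbox, sample_y) change. A sketch of the check, assuming the same strides as above:

import math

# Sanity check for the config above (backbone stride 32, mask_down_scale 4).
bs, img_w, img_h = 1, 800, 320
assert (bs, math.ceil(img_h / 32), math.ceil(img_w / 32)) == (1, 10, 25)  # pos_shape
assert (1, img_h // 4, img_w // 4) == (1, 80, 200)                        # mask_size
assert (bs, 1, img_h // 4, img_w // 4) == (1, 1, 80, 200)                 # location_configs size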

ajay1606 commented 2 years ago

@parkjbdev Thank you so much for your response. I have resized my images to (1640, 590) and it works well without any issues. Anyhow, I will also try your config file while keeping the original image size.

Thanks for your kind response, I appreciate it.