PingoLH / FCHarDNet

Fully Convolutional HarDNet for Segmentation in Pytorch

About IoU with a different size from Cityscapes, please help me! #38

Open electronicYH opened 4 years ago

electronicYH commented 4 years ago

Because the camera output size is 640*360, I changed the size of the Cityscapes dataset. Then I used the project code to train, but I can't reach a good IoU of about 75%. Please help me: how can I get a good IoU result? The train log is:

INFO:ptsemseg:Iter [90000/90000]  Loss: 0.9804  Time/Image: 0.0165  lr=0.090953
11it [00:05, 2.08it/s]
INFO:ptsemseg:Iter 90000 Val Loss: 1.1422
INFO:ptsemseg:Overall Acc: : 0.902127139034985
INFO:ptsemseg:Mean Acc : : 0.6061411175875274
INFO:ptsemseg:FreqW Acc : : 0.8350029456995808
INFO:ptsemseg:Mean IoU : : 0.48783643289422235
INFO:ptsemseg:0: 0.949375256023792
INFO:ptsemseg:1: 0.6434965931893878
INFO:ptsemseg:2: 0.8355899486279862
INFO:ptsemseg:3: 0.36742551411680824
INFO:ptsemseg:4: 0.24889070738206942
INFO:ptsemseg:5: 0.31642120260993717
INFO:ptsemseg:6: 0.28094440896774303
INFO:ptsemseg:7: 0.42495380920802917
INFO:ptsemseg:8: 0.8470327817169933
INFO:ptsemseg:9: 0.4534516073428247
INFO:ptsemseg:10: 0.8743228190634068
INFO:ptsemseg:11: 0.4587472314322872
INFO:ptsemseg:12: 0.310287655352762
INFO:ptsemseg:13: 0.8298468752303512
INFO:ptsemseg:14: 0.42321140100466925
INFO:ptsemseg:15: 0.39785631228298224
INFO:ptsemseg:16: 0.031675380114154154
INFO:ptsemseg:17: 0.09916433094570747
INFO:ptsemseg:18: 0.4761983903783339

and the hardnet.yml is:
model:
    arch: hardnet
data:
    dataset: cityscapes
    train_split: train
    val_split: val
    img_rows: 360
    img_cols: 640
    path: ../cityscape_transformation/
    sbd_path: ../cityscape_transformation/
training:
    train_iters: 90000
    batch_size: 48
    val_interval: 500
    n_workers: 8
    print_interval: 10
    augmentations:
        hflip: 0.5
        rscale_crop: [360, 360]
    optimizer:
        name: 'sgd'
        lr: 0.1
        weight_decay: 0.0005
        momentum: 0.9
    loss:
        name: 'bootstrapped_cross_entropy'
        min_K: 4096
        loss_th: 0.3
        size_average: True
    lr_schedule: 
        name: 'poly_lr'
        max_iter: 9000000
    resume: None
    finetune: None    

and I only modified the code in train.py here:

v_loader = data_loader(
    data_path,
    is_transform=True,
    split=cfg["data"]["val_split"],
    img_size=(360, 640),
)

and in cityscapes_loader.py here:

def __init__(
    self,
    root,
    split="train",
    is_transform=False,
    img_size=(360, 640),
    augmentations=None,
    img_norm=True,
    version="cityscapes",
    test_mode=False,
):
    """__init__
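
One thing worth noting in the config above: train_iters is 90000 while lr_schedule.max_iter is 9000000, so a poly schedule barely decays the learning rate over the whole run; this is consistent with the lr=0.090953 still reported at iter 90000 in the log, close to the initial 0.1. A minimal sketch of the effect, assuming the common poly-LR rule (the 0.9 exponent is an assumption, not taken from this repo's scheduler code):

# Minimal sketch of a poly learning-rate schedule; the power=0.9
# default is an assumption based on the usual poly-LR formulation.
def poly_lr(base_lr, it, max_iter, power=0.9):
    return base_lr * (1.0 - it / max_iter) ** power

print(poly_lr(0.1, 90000, max_iter=90000))    # 0.0: fully annealed
print(poly_lr(0.1, 90000, max_iter=9000000))  # ~0.099: barely decayed

With max_iter matched to train_iters, the rate anneals toward zero by the end of training, which is the usual intent of a poly schedule.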

PingoLH commented 4 years ago

Hi, honestly, you can't get a good IoU for such a small input resolution, since this network architecture was designed for 2048x1024 input. Notice that two of the first four conv layers have stride=2, which means the spatial information is quickly shrunk to 512x256, while in your case it will be 160x90, which is far too small for segmentation. You can try simply removing the stride=2 from both, or just one, of those two conv layers (lines #270 and #272). The pretrained weights can still be loaded, but network inference will be much slower.
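
To make the arithmetic concrete, here is a minimal runnable sketch (the helper is illustrative, not from the repo): a stride-2 conv with kernel 3 and padding 1 halves each spatial dimension, rounding up.

# Each stride-2 conv (kernel 3, padding 1) halves H and W, rounding up.
def stem_out(h, w, n_stride2=2):
    for _ in range(n_stride2):
        h, w = (h + 1) // 2, (w + 1) // 2
    return h, w

print(stem_out(1024, 2048))             # (256, 512): the design target
print(stem_out(360, 640))               # (90, 160): too coarse to segment well
print(stem_out(360, 640, n_stride2=1))  # (180, 320): with one stride=2 removed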

electronicYH commented 4 years ago

Thank you for your help, I will try it!

MrCrazyCrab commented 4 years ago

@electronicYH have you solved the problem with the small input image size?