liricky commented 3 years ago

Thanks for releasing your code; it has been a great help to me. However, I run into a problem when training the model with xception as its backbone. I replaced the backbone in your config.py with xception for training, but the loss stops changing after the second epoch, and the mIoU on the val set stays the same in all following epochs. How should I set the hyperparameters to train the network with xception as its backbone? If you can provide detailed steps, I would be really grateful.

YudeWang commented 3 years ago

@liricky Please paste config.py here for more details.

liricky commented 3 years ago

# ----------------------------------------
# Written by Yude Wang
# ----------------------------------------

import torch
import argparse
import os
import sys
import cv2
import time

config_dict = {
    'EXP_NAME': 'deeplabv3+voc',
    'GPUS': 2,

    'DATA_NAME': 'VOCDataset',
    'DATA_YEAR': 2012,
    'DATA_AUG': True,
    'DATA_WORKERS': 2,
    'DATA_MEAN': [0.485, 0.456, 0.406],
    'DATA_STD': [0.229, 0.224, 0.225],
    'DATA_RESCALE': 512,
    'DATA_RANDOMSCALE': [0.5, 2.0],
    'DATA_RANDOM_H': 0,
    'DATA_RANDOM_S': 0,
    'DATA_RANDOM_V': 0,
    'DATA_RANDOMCROP': 512,
    'DATA_RANDOMROTATION': 0,
    'DATA_RANDOMFLIP': 0.5,
    'DATA_PSEUDO_GT': False,

    'MODEL_NAME': 'deeplabv3plus',
    'MODEL_BACKBONE': 'xception',
    'MODEL_BACKBONE_PRETRAIN': True,
    'MODEL_BACKBONE_DILATED': True,
    'MODEL_BACKBONE_MULTIGRID': False,
    'MODEL_BACKBONE_DEEPBASE': True,
    'MODEL_SHORTCUT_DIM': 48,
    'MODEL_OUTPUT_STRIDE': 8,
    'MODEL_ASPP_OUTDIM': 256,
    'MODEL_ASPP_HASGLOBAL': True,
    'MODEL_NUM_CLASSES': 21,
    'MODEL_FREEZEBN': False,

    'TRAIN_LR': 0.007,
    'TRAIN_MOMENTUM': 0.9,
    'TRAIN_WEIGHT_DECAY': 4e-5,
    'TRAIN_BN_MOM': 0.0003,
    'TRAIN_POWER': 0.9,
    'TRAIN_BATCHES': 16,
    'TRAIN_SHUFFLE': True,
    'TRAIN_MINEPOCH': 0,
    'TRAIN_ITERATION': 30000,
    'TRAIN_TBLOG': True,

    'TEST_MULTISCALE': [0.5, 0.75, 1.0, 1.25, 1.5, 1.75],
    'TEST_FLIP': True,
    'TEST_CRF': False,
    'TEST_BATCHES': 1,

}

config_dict['ROOT_DIR'] = os.path.abspath(os.path.join(os.path.dirname("__file__"),'..','..'))
config_dict['MODEL_SAVE_DIR'] = os.path.join(config_dict['ROOT_DIR'],'model',config_dict['EXP_NAME'])
config_dict['TRAIN_CKPT'] = None
config_dict['LOG_DIR'] = os.path.join(config_dict['ROOT_DIR'],'log',config_dict['EXP_NAME'])
config_dict['TEST_CKPT'] = os.path.join(config_dict['ROOT_DIR'],'model/deeplabv3+voc/deeplabv3plus_xception_VOCDataset_itr30000_all.pth')

sys.path.insert(0, os.path.join(config_dict['ROOT_DIR'], 'lib'))

This is the config.py file. I have not modified its structure. I have only changed your dataloader to point to my local data path, and it works well with the resnet101 backbone. So what should I change in config.py, or in the network code, to use xception as the backbone and train it well? Thanks.

YudeWang commented 3 years ago

@liricky The config.py seems okay. Please check your custom dataloader again.
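
One possible way to run that check is sketched below. It is not code from this repository: the helper name `check_label_mask` is hypothetical, and the ignore value 255 is an assumption based on the usual VOC "void" label. The idea is to confirm that every value in a ground-truth mask returned by the dataloader is either a class index in `[0, MODEL_NUM_CLASSES)` or the ignore index. Unexpected values can appear, for example, when colour-coded PNG masks are decoded as colour or grayscale images instead of class indices.

```python
import numpy as np

NUM_CLASSES = 21    # MODEL_NUM_CLASSES from the config above
IGNORE_INDEX = 255  # assumption: the usual VOC "void" label

def check_label_mask(mask, num_classes=NUM_CLASSES, ignore_index=IGNORE_INDEX):
    """Return the sorted label values that are neither a valid class index
    nor the ignore index. An empty list means the mask looks consistent."""
    values = np.unique(np.asarray(mask))  # np.unique returns sorted values
    return [int(v) for v in values
            if not (0 <= v < num_classes or v == ignore_index)]

# Toy masks standing in for what a dataloader would return.
good_mask = np.array([[0, 15, 255], [20, 0, 1]], dtype=np.uint8)
bad_mask = np.array([[0, 38, 255], [128, 0, 1]], dtype=np.uint8)

print(check_label_mask(good_mask))  # []
print(check_label_mask(bad_mask))   # [38, 128]
```

Running this over a handful of training samples is cheap. A mask that passes the check but contains only the value 0 would also be worth a look, since it means every pixel is labelled as background.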

YudeWang / semantic-segmentation-codebase

The hyperparameter for xception on voc data set #7
