Closed Diuyon closed 3 years ago
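Hello, this looks like a problem that occurs while using py_reader. Could you share a reproduction environment, e.g. the training config, the PaddleSeg and Paddle versions, and a few sample images?
OK.
The contents of the yaml file are as follows: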
BATCH_SIZE: 2
TRAIN_CROP_SIZE: (512, 512)
EVAL_CROP_SIZE: (1000, 1000)
# Dataset configuration
DATASET:
    DATA_DIR: "../../data/CelebAMask/"
    TRAIN_FILE_LIST: "../../data/CelebAMask/train.txt"
    VAL_FILE_LIST: "../../data/CelebAMask/validation.txt"
    TEST_FILE_LIST: "../../data/CelebAMask/test.txt"
    VIS_FILE_LIST: "../../data/CelebAMask/validation.txt"
    NUM_CLASSES: 4
# Model configuration
MODEL:
    MODEL_NAME: "deeplabv3p"
    DEFAULT_NORM_TYPE: "bn"
    DEEPLAB:
        BACKBONE: "xception_65"
# Data augmentation
AUG:
    AUG_METHOD: "stepscaling"
    FIX_RESIZE_SIZE: (512, 512)
TRAIN:
    PRETRAINED_MODEL_DIR: "./pretrained_model/deeplabv3p_xception65_bn_coco"
    MODEL_SAVE_DIR: "./saved_model/deeplabv3p_xception65_headseg/"
    SNAPSHOT_EPOCH: 10
TEST:
    TEST_MODEL: ""
FREEZE:
    MODEL_FILENAME: "model"
    PARAMS_FILENAME: "params"
# Optimizer settings
SOLVER:
    NUM_EPOCHS: 50
    LR: 0.001
    LR_POLICY: "poly"
    OPTIMIZER: "adam"
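Sample images are shown below:
images (original)
mask (label): there are 4 classes in total: hair (2), face (1), earrings (3), and the region outside these three (0)
Binarized visualization
Paddle version information: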
paddlehub 1.5.4
paddlepaddle-gpu 1.7.1.post107
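PaddleSeg is the latest pulled version.
Are you training with PaddleHub? Are the pixel values of the 3 classes in the label 0, 1, 2? And what value is the background, i.e. the black region in the binary image? Is it labeled 255?
Oh, I see, that is probably the problem. How embarrassing  ̄□ ̄||
OK. Our annotation protocol starts from 0 and increments: 0, 1, 2. The default ignored class is 255.
By the way, it is best to run pdseg/check.py on the data and config before training, so that problems like this are caught early.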
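The labeling protocol described above (class values counted up from 0, with 255 reserved as the ignored value) can be checked with a few lines of Python. This is only a minimal sketch and not the logic of pdseg/check.py: the helper name `find_invalid_labels` is hypothetical, and it assumes the mask pixels have already been loaded into a flat sequence of integers, for example with Pillow's `Image.open(path).getdata()`.

```python
from collections import Counter


def find_invalid_labels(pixels, num_classes, ignore_index=255):
    """Return {value: count} for pixel values outside the labeling protocol.

    Valid values are 0 .. num_classes - 1, plus ignore_index.
    Illustrative helper only; it is not part of PaddleSeg.
    """
    counts = Counter(pixels)
    return {
        value: count
        for value, count in sorted(counts.items())
        if not (0 <= value < num_classes or value == ignore_index)
    }


# A toy mask flattened to one list: the value 4 is out of range
# when NUM_CLASSES is 4, because valid class values are 0, 1, 2, 3.
toy_mask = [0, 1, 1, 2, 3, 4, 4, 255]
invalid = find_invalid_labels(toy_mask, num_classes=4)
print(invalid)
```

An empty result means every pixel is either a valid class value or the ignored value.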
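I did run the check and it passed. I just changed NUM_CLASSES: 4, but the problem still occurs.
Are your label pixels 0, 1, 2, 3, or something else?
In the mask, I set the values following the 0-255 convention.
I just checked, and some masks contain the label value 4. I will fix that now.
I fixed the masks, but the problem still occurs. When I converted the masks to 2 classes (foreground/background), the problem went away. Why is that?
I have updated the image information above to the version that contains only the 4 classes 0, 1, 2, 3.
It is probably still a mask annotation problem. How did you convert it to 2 classes?
Originally: 1: face; 2: hair; 3: earrings; 0: the region outside these three. Binary: 1: face, hair, earrings; 0: everything else.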
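For reference, the multi-class to binary conversion described here could look like the following. It is a sketch under assumptions, not the poster's actual script: the names `MULTI_TO_BINARY` and `to_binary_mask` are hypothetical, and the mask is represented as a plain list of rows rather than an image object.

```python
# Mapping taken from the description in the thread:
# face (1), hair (2) and earrings (3) become foreground (1); 0 stays background.
MULTI_TO_BINARY = {0: 0, 1: 1, 2: 1, 3: 1}


def to_binary_mask(mask_rows, mapping=MULTI_TO_BINARY, ignore_index=255):
    """Remap a 2-D mask given as a list of rows of integer labels.

    Pixels equal to ignore_index are left untouched; any other value
    missing from the mapping raises ValueError instead of being
    silently passed through.
    """
    remapped = []
    for row in mask_rows:
        new_row = []
        for value in row:
            if value == ignore_index:
                new_row.append(ignore_index)
            elif value in mapping:
                new_row.append(mapping[value])
            else:
                raise ValueError(f"unexpected label value {value}")
        remapped.append(new_row)
    return remapped


binary = to_binary_mask([[0, 1], [2, 3]])
print(binary)
```

Raising on unknown values is a deliberate choice in this sketch: a stray label such as 4 would otherwise survive the conversion unnoticed.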
That is indeed a bit strange. Please run check.py on the original data and post the resulting detail.log.
detail.log:
PASS ../../data/CelebAMask/test.txt DATASET.SEPARATOR check
2020-03-23 16:36:03,469-INFO:
PASS dataset reading check
2020-03-23 16:36:03,469-INFO: All images can be read successfully
2020-03-23 16:36:03,471-INFO:
PASS label gray check
2020-03-23 16:36:03,471-INFO: All label images are gray
2020-03-23 16:36:03,471-INFO:
PASS label format check
2020-03-23 16:36:03,472-INFO: total 6000 label images are png format, 0 label images are not png format
2020-03-23 16:36:03,472-INFO:
Doing label pixel statistics:
(label class, total pixel number, percentage) = [(0, 576041840, 0.3662), (1, 523887858, 0.3331), (2, 468858689, 0.2981), (3, 4075613, 0.0026)]
2020-03-23 16:36:03,473-INFO:
PASS label class check!
2020-03-23 16:36:03,488-INFO:
PASS DATASET.IMAGE_TYPE check
2020-03-23 16:36:03,488-INFO:
Doing max image size statistics:
2020-03-23 16:36:03,488-INFO: max width and max height of images are (512,512)
2020-03-23 16:36:03,489-INFO:
PASS shape check
2020-03-23 16:36:03,489-INFO: All images are the same shape as the labels
2020-03-23 16:36:03,489-INFO:
PASS EVAL_CROP_SIZE check
2020-03-23 16:36:03,489-INFO: satisfy current EVAL_CROP_SIZE: (1000,1000) >= max width and max height of images: (512,512)
Detailed error information can be viewed in detail.log file.
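The label pixel statistics in this log can be reproduced from the raw counts, which is a quick way to sanity-check them. The sketch below is not check.py's code; the function name `label_percentages` is hypothetical, and the counts are copied from the log above.

```python
def label_percentages(pixel_counts):
    """Return {label: fraction of all pixels}, rounded to 4 decimals."""
    total = sum(pixel_counts.values())
    return {
        label: round(count / total, 4)
        for label, count in pixel_counts.items()
    }


# (label class, total pixel number) pairs copied from detail.log.
counts_from_log = {0: 576041840, 1: 523887858, 2: 468858689, 3: 4075613}

percentages = label_percentages(counts_from_log)
total_pixels = sum(counts_from_log.values())

print(percentages)
# The total matches 6000 label images of 512 x 512 pixels each.
print(total_pixels == 6000 * 512 * 512)
```

The counts add up to exactly 6000 × 512 × 512 pixels, which agrees with the 6000 label images and the (512,512) maximum size reported in the log. Class 3 (earrings) covers only about 0.26% of the pixels, so the classes are heavily imbalanced.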
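This is for test.txt. Is there one for train.txt?
Just post the entire detail.log.
PASS ../../data/CelebAMask/test.txt DATASET.SEPARATOR check
The test set check results look fine. Why was only the test set checked? The error occurs during training, so the training set should be checked.
The training set passed as well. The detail.log file I sent contains all of the check records.
I see it now. The check records look normal. How about sending me some images so that I can reproduce it: chulutao@baidu.com
Email sent.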
NOT PASS loss check. Dice loss and bce loss is only applicable to binary classfication
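That is the check result on my side: dice loss and bce loss can only be used for binary classification; multi-class is not supported at the moment.
But my config file does not specify which loss to use, so it should default to softmax, right? To make the multi-class task run correctly, do I need to set LOSS: ["softmax_loss"] explicitly?
Yes, softmax is used by default, so there is no need to specify it. I used the yaml you posted above.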
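The rule behind the "NOT PASS loss check" message can be sketched as follows. This is not PaddleSeg's implementation: the function `check_loss_config` is hypothetical, and the loss names `dice_loss` and `bce_loss` are assumed for illustration (only `softmax_loss` is spelled out in this thread).

```python
# Losses that, per the check message, only apply to binary segmentation.
# The exact config names are an assumption made for this sketch.
BINARY_ONLY_LOSSES = {"dice_loss", "bce_loss"}


def check_loss_config(losses, num_classes):
    """Return a list of error messages; an empty list means the check passes."""
    errors = []
    for name in losses:
        if name in BINARY_ONLY_LOSSES and num_classes != 2:
            errors.append(
                f"{name} is only applicable to binary classification "
                f"(NUM_CLASSES is {num_classes})"
            )
    return errors


# Default softmax loss with 4 classes: nothing to report.
print(check_loss_config(["softmax_loss"], num_classes=4))
# Dice + bce with 4 classes: both are flagged.
print(check_loss_config(["dice_loss", "bce_loss"], num_classes=4))
```

With the 4-class yaml posted earlier, which leaves the loss unset, a rule like this would not fire, so a failure of this check suggests that a different loss setting was in effect in the run that produced it.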
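So what should I do now?
Please share the yaml you used later, and run git branch to check your current PaddleSeg version.
Version: release/v0.4.0
yaml: for multi-class it is still the one above; for binary classification it is: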
BATCH_SIZE: 16
TRAIN_CROP_SIZE: (512, 512)
EVAL_CROP_SIZE: (1000, 1000)
# Dataset configuration
DATASET:
    DATA_DIR: "../../data/data25995/"
    TRAIN_FILE_LIST: "../../data/data25995/train.txt"
    VAL_FILE_LIST: "../../data/data25995/validation.txt"
    TEST_FILE_LIST: "../../data/data25995/test.txt"
    VIS_FILE_LIST: "../../data/data25995/validation.txt"
    NUM_CLASSES: 2
# Model configuration
MODEL:
    MODEL_NAME: "deeplabv3p"
    DEFAULT_NORM_TYPE: "bn"
    DEEPLAB:
        BACKBONE: "xception_65"
# Data augmentation
AUG:
    AUG_METHOD: "stepscaling"
    FIX_RESIZE_SIZE: (512, 512)
TRAIN:
    PRETRAINED_MODEL_DIR: "./saved_model/deeplabv3p_xception65_headseg/40"
    MODEL_SAVE_DIR: "./saved_model/deeplabv3p_xception65_headseg_2/"
    SNAPSHOT_EPOCH: 10
TEST:
    TEST_MODEL: "./saved_model/deeplabv3p_xception65_headseg_2/final/"
FREEZE:
    MODEL_FILENAME: "model"
    PARAMS_FILENAME: "params"
# Optimizer settings
SOLVER:
    NUM_EPOCHS: 20
    LR: 0.001
    LR_POLICY: "poly"
    OPTIMIZER: "adam"
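1. The problem report is as follows:
2. The detailed error log is as follows:
3. The error description is as follows:
If you can help solve this problem, I would be extremely grateful!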