Closed Captain-Jo closed 4 years ago
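@caoweiying123 Are there any other error messages or warnings? Depending on the data format, the bad samples could be excluded when the data is loaded in coco.py/voc.py, but we need to see the error message first.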
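As a rough sketch of what excluding bad samples at load time could look like, the function below drops records whose image file is missing or whose boxes are degenerate. The record layout (dicts with `im_file`, `w`, `h`, and `gt_bbox` rows of `[x1, y1, x2, y2]`) and the names `is_valid_box` and `filter_records` are assumptions for illustration, not code taken from coco.py or voc.py.

```python
import os


def is_valid_box(box, img_w, img_h):
    """Return True if [x1, y1, x2, y2] has positive area and lies inside the image."""
    x1, y1, x2, y2 = box
    return 0 <= x1 < x2 <= img_w and 0 <= y1 < y2 <= img_h


def filter_records(records, check_files=True):
    """Split records into (kept, dropped); dropped holds (record, reason) pairs."""
    kept, dropped = [], []
    for rec in records:
        # Assumed layout: rec["im_file"], rec["w"], rec["h"], rec["gt_bbox"].
        if check_files and not os.path.isfile(rec["im_file"]):
            dropped.append((rec, "image file not found"))
            continue
        boxes = rec.get("gt_bbox", [])
        if len(boxes) == 0:
            dropped.append((rec, "no ground-truth boxes"))
            continue
        bad = [b for b in boxes if not is_valid_box(b, rec["w"], rec["h"])]
        if bad:
            dropped.append((rec, "invalid box: {}".format(bad[0])))
            continue
        kept.append(rec)
    return kept, dropped
```

Printing the `dropped` list gives a per-sample reason, which is the kind of detail needed before deciding what to exclude.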
BBoxAssigner:
batch_size_per_im: 512
bbox_reg_weights:
- 0.1
- 0.1
- 0.2
- 0.2
bg_thresh_hi: 0.5
bg_thresh_lo: 0.0
fg_fraction: 0.25
fg_thresh: 0.5
num_classes: 81
shuffle_before_sample: true
BBoxHead:
head: TwoFCHead
nms:
keep_top_k: 100
nms_threshold: 0.5
score_threshold: 0.05
bbox_loss:
sigma: 1.0
box_coder:
axis: 1
box_normalized: false
code_type: decode_center_size
prior_box_var:
- 0.1
- 0.1
- 0.2
- 0.2
num_classes: 81
EvalReader:
batch_size: 1
batch_transforms:
- !PadBatch
pad_to_stride: 32
use_padded_im_info: true
dataset: !VOCDataSet
anno_path: ImageSets/all.txt
dataset_dir: data/head
image_dir: ''
label_list: label_list.txt
sample_num: -1
use_default_label: false
with_background: true
drop_empty: false
drop_last: false
inputs_def:
fields:
- image
- im_info
- im_id
- im_shape
sample_transforms:
- !DecodeImage
to_rgb: true
with_mixup: false
- !NormalizeImage
is_channel_first: false
is_scale: true
mean:
- 0.485
- 0.456
- 0.406
std:
- 0.229
- 0.224
- 0.225
- !ResizeImage
interp: 1
max_size: 1333
target_size: 800
use_cv2: true
- !Permute
channel_first: true
to_bgr: false
shuffle: false
worker_num: 2
FPN:
freeze_norm: false
has_extra_convs: false
max_level: 6
min_level: 2
norm_type: null
num_chan: 256
spatial_scale:
- 0.03125
- 0.0625
- 0.125
- 0.25
use_c5: true
FPNRPNHead:
anchor_generator:
aspect_ratios:
- 0.5
- 1.0
- 2.0
variance:
- 1.0
- 1.0
- 1.0
- 1.0
rpn_target_assign:
rpn_batch_size_per_im: 256
rpn_fg_fraction: 0.5
rpn_negative_overlap: 0.3
rpn_positive_overlap: 0.7
rpn_straddle_thresh: 0.0
test_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 1000
pre_nms_top_n: 1000
train_proposal:
min_size: 0.0
nms_thresh: 0.7
post_nms_top_n: 2000
pre_nms_top_n: 2000
anchor_start_size: 32
max_level: 6
min_level: 2
num_chan: 256
num_classes: 1
FPNRoIAlign:
sampling_ratio: 2
box_resolution: 7
canconical_level: 4
canonical_size: 224
mask_resolution: 14
max_level: 5
min_level: 2
LearningRate:
schedulers:
- !PiecewiseDecay
gamma: 0.1
milestones:
- 120000
- 160000
values: null
- !LinearWarmup
start_factor: 0.1
steps: 1000
base_lr: 0.01
MaskAssigner:
resolution: 28
num_classes: 81
MaskHead:
num_convs: 4
resolution: 28
conv_dim: 256
dilation: 1
norm_type: null
num_classes: 81
MaskRCNN:
backbone: ResNet
fpn: FPN
roi_extractor: FPNRoIAlign
rpn_head: FPNRPNHead
bbox_assigner: BBoxAssigner
bbox_head: BBoxHead
mask_assigner: MaskAssigner
mask_head: MaskHead
rpn_only: false
OptimizerBuilder:
optimizer:
momentum: 0.9
type: Momentum
regularizer:
factor: 0.0001
type: L2
ResNet:
dcn_v2_stages:
- 3
- 4
- 5
depth: 101
norm_type: bn
variant: d
feature_maps:
- 2
- 3
- 4
- 5
freeze_at: 2
freeze_norm: true
gcb_params: {}
gcb_stages: []
lr_mult_list:
- 1.0
- 1.0
- 1.0
- 1.0
nonlocal_stages: []
norm_decay: 0.0
weight_prefix_name: ''
TestReader:
batch_size: 1
batch_transforms:
- !PadBatch
pad_to_stride: 32
use_padded_im_info: true
dataset: !ImageFolder
anno_path: data/head/label_list.txt
dataset_dir: ''
image_dir: ''
sample_num: -1
use_default_label: false
with_background: true
drop_last: false
inputs_def:
fields:
- image
- im_info
- im_id
- im_shape
sample_transforms:
- !DecodeImage
to_rgb: true
with_mixup: false
- !NormalizeImage
is_channel_first: false
is_scale: true
mean:
- 0.485
- 0.456
- 0.406
std:
- 0.229
- 0.224
- 0.225
- !ResizeImage
interp: 1
max_size: 1333
target_size: 800
use_cv2: true
- !Permute
channel_first: true
to_bgr: false
shuffle: false
TrainReader:
batch_size: 1
batch_transforms:
- !PadBatch
pad_to_stride: 32
use_padded_im_info: false
dataset: !VOCDataSet
anno_path: ImageSets/all.txt
dataset_dir: data/head
image_dir: ''
label_list: label_list.txt
sample_num: -1
use_default_label: false
with_background: true
drop_last: false
inputs_def:
fields:
- image
- im_info
- im_id
- gt_bbox
- gt_class
- is_crowd
- gt_mask
sample_transforms:
- !DecodeImage
to_rgb: true
with_mixup: false
- !RandomFlipImage
is_mask_flip: true
is_normalized: false
prob: 0.5
- !NormalizeImage
is_channel_first: false
is_scale: true
mean:
- 0.485
- 0.456
- 0.406
std:
- 0.229
- 0.224
- 0.225
- !ResizeImage
interp: 1
max_size: 1333
target_size: 800
use_cv2: true
- !Permute
channel_first: true
to_bgr: false
shuffle: true
use_process: false
worker_num: 2
TwoFCHead:
mlp_dim: 1024
architecture: MaskRCNN
finetune_exclude_pretrained_params:
- cls_score
- bbox_pred
- mask_fcn_logits
log_iter: 50
log_smooth_window: 20
max_iters: 20000
metric: VOC
num_classes: 2
pretrain_weights: https://paddle-imagenet-models-name.bj.bcebos.com/ResNet101_vd_pretrained.tar
save_dir: output
snapshot_iter: 2000
use_gpu: true
weights: output/head/model_final
2020-05-16 21:01:37,307-INFO: 747 samples in file data/head/ImageSets/all.txt
2020-05-16 21:01:37,307-INFO: places would be ommited when DataLoader is not iterable
W0516 21:01:38.151697 646 device_context.cc:237] Please NOTE: device: 0, CUDA Capability: 70, Driver API Version: 9.2, Runtime API Version: 9.0
W0516 21:01:38.155509 646 device_context.cc:245] device: 0, cuDNN Version: 7.3.
2020-05-16 21:01:39,532-INFO: Found /home/aistudio/.cache/paddle/weights/ResNet101_vd_pretrained
2020-05-16 21:01:39,533-INFO: Loading parameters from /home/aistudio/.cache/paddle/weights/ResNet101_vd_pretrained...
2020-05-16 21:01:39,533-WARNING: /home/aistudio/.cache/paddle/weights/ResNet101_vd_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
2020-05-16 21:01:39,533-WARNING: /home/aistudio/.cache/paddle/weights/ResNet101_vd_pretrained.pdparams not found, try to load model file saved with [ save_params, save_persistables, save_vars ]
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/io.py:1972: UserWarning: This list is not set, Because of Paramerter not found in program. There are: fc_0.b_0 fc_0.w_0
format(" ".join(unused_para_list)))
2020-05-16 21:01:40,616-INFO: 747 samples in file data/head/ImageSets/all.txt
2020-05-16 21:01:40,616-INFO: places would be ommited when DataLoader is not iterable
I0516 21:01:40.650564 646 parallel_executor.cc:440] The Program will be executed on CUDA using ParallelExecutor, 1 cards are used, so 1 programs are executed in parallel.
I0516 21:01:40.719694 646 build_strategy.cc:365] SeqOnlyAllReduceOps:0, num_trainers:1
I0516 21:01:40.977018 646 parallel_executor.cc:307] Inplace strategy is enabled, when build_strategy.enable_inplace = True
I0516 21:01:41.026599 646 parallel_executor.cc:375] Garbage collection strategy is enabled, when FLAGS_eager_delete_tensor_gb = 0
2020-05-16 21:01:41,037-WARNING: recv endsignal from outq with errmsg[consumer[consumer-2b4-0] exits for reason[producer[producer-2b4] failed with error: ]]
2020-05-16 21:01:41,037-WARNING: recv endsignal from outq with errmsg[consumer[consumer-2b4-1] exits for reason[consumer[consumer-2b4-0] exits for reason[producer[producer-2b4] failed with error: ]]]
2020-05-16 21:01:41,037-WARNING: Your reader has raised an exception!
Exception in thread Thread-7:
Traceback (most recent call last):
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 926, in _bootstrap_inner
self.run()
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/threading.py", line 870, in run
self._target(*self._args, **self._kwargs)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 805, in __thread_main__
six.reraise(*sys.exc_info())
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 693, in reraise
raise value
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 785, in __thread_main__
for tensors in self._tensor_reader():
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 853, in __tensor_reader_impl__
for slots in paddle_reader():
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/data_feeder.py", line 488, in __reader_creator__
for item in reader():
File "/home/aistudio/PaddleDetection/ppdet/data/reader.py", line 421, in _reader
reader.reset()
File "/home/aistudio/PaddleDetection/ppdet/data/parallel_map.py", line 259, in reset
assert not self._exit, "cannot reset for already stopped dataset"
AssertionError: cannot reset for already stopped dataset
/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py:782: UserWarning: The following exception is not an EOF exception.
"The following exception is not an EOF exception.")
Traceback (most recent call last):
File "PaddleDetection/tools/train.py", line 367, in <module>
main()
File "PaddleDetection/tools/train.py", line 246, in main
outs = exe.run(compiled_train_prog, fetch_list=train_values)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 783, in run
six.reraise(*sys.exc_info())
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/six.py", line 693, in reraise
raise value
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 778, in run
use_program_cache=use_program_cache)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 843, in _run_impl
return_numpy=return_numpy)
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/executor.py", line 677, in _run_parallel
tensors = exe.run(fetch_var_names)._move_to_list()
paddle.fluid.core_avx.EnforceNotMet:
--------------------------------------------
C++ Call Stacks (More useful to developers):
--------------------------------------------
0 std::string paddle::platform::GetTraceBackString<std::string const&>(std::string const&, char const*, int)
1 paddle::platform::EnforceNotMet::EnforceNotMet(std::string const&, char const*, int)
2 paddle::operators::reader::BlockingQueue<std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> > >::Receive(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >*)
3 paddle::operators::reader::PyReader::ReadNext(std::vector<paddle::framework::LoDTensor, std::allocator<paddle::framework::LoDTensor> >*)
4 std::_Function_handler<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> (), std::__future_base::_Task_setter<std::unique_ptr<std::__future_base::_Result<unsigned long>, std::__future_base::_Result_base::_Deleter>, unsigned long> >::_M_invoke(std::_Any_data const&)
5 std::__future_base::_State_base::_M_do_set(std::function<std::unique_ptr<std::__future_base::_Result_base, std::__future_base::_Result_base::_Deleter> ()>&, bool&)
6 ThreadPool::ThreadPool(unsigned long)::{lambda()#1}::operator()() const
------------------------------------------
Python Call Stacks (More useful to users):
------------------------------------------
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/framework.py", line 2525, in append_op
attrs=kwargs.get("attrs", None))
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 733, in _init_non_iterable
outputs={'Out': self._feed_list})
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 646, in __init__
self._init_non_iterable()
File "/opt/conda/envs/python35-paddle120-env/lib/python3.7/site-packages/paddle/fluid/reader.py", line 280, in from_generator
iterable, return_list)
File "/home/aistudio/PaddleDetection/ppdet/modeling/architectures/mask_rcnn.py", line 329, in build_inputs
iterable=iterable) if use_dataloader else None
File "PaddleDetection/tools/train.py", line 126, in main
feed_vars, train_loader = model.build_inputs(**inputs_def)
File "PaddleDetection/tools/train.py", line 367, in <module>
main()
----------------------
Error Message Summary:
----------------------
Error: Blocking queue is killed because the data reader raises an exception
[Hint: Expected killed_ != true, but received killed_:1 == true:1.] at (/paddle/paddle/fluid/operators/reader/blocking_queue.h:141)
[operator < read > error]
This is the output of the run.
You can remove the worker_num setting from the config and then debug.
Print the data to inspect it. A simple test program:
from ppdet.core.workspace import load_config
from ppdet.data.reader import create_reader

config_path = 'configs/dcn/mask_rcnn_dcn_r101_vd_fpn_1x.yml'
cfg = load_config(config_path)
# 10 is the number of iterations to run
train_reader = create_reader(cfg.TrainReader, 10)()
for samples in train_reader:
    for sample in samples:
        # each sample is a tuple made up of image, im_info, etc.
        print(sample)
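Because the log above only reports `producer failed with error:` with an empty message, it can help to iterate the reader in the main thread and stop at the first exception. The helper below is a minimal sketch that works on any iterable. `find_first_failure` is a hypothetical name, and passing it the `train_reader` from the test program is an assumption about how you would use it.

```python
import traceback


def find_first_failure(batches, max_batches=None):
    """Iterate `batches`; return (index, exception) for the first failure, else None."""
    iterator = iter(batches)
    index = 0
    while max_batches is None or index < max_batches:
        try:
            next(iterator)
        except StopIteration:
            return None  # the reader finished without raising
        except Exception as exc:
            # Print the full traceback instead of the truncated worker summary.
            traceback.print_exc()
            return index, exc
        index += 1
    return None
```

With worker_num removed, something like `find_first_failure(train_reader)` should then show which batch index fails and the underlying Python exception.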
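You could update to a newer version of the detection library. It now cleans and filters some kinds of dirty data, including checks that the data path exists and that each box's w and h are valid. You can also try removing worker_num.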
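The same two checks (does the path exist, are each box's w and h valid) can also be run offline against the dataset. The sketch below assumes each line of the list file holds an image path and an XML annotation path separated by whitespace, both relative to the dataset directory, and that the XML follows the usual Pascal VOC layout (`object/bndbox` with `xmin`, `ymin`, `xmax`, `ymax`). The function name `scan_voc_list` is hypothetical and is not part of the detection library.

```python
import os
import xml.etree.ElementTree as ET


def scan_voc_list(dataset_dir, list_file):
    """Return a list of (line_number, problem) for suspicious entries."""
    problems = []
    with open(os.path.join(dataset_dir, list_file)) as f:
        lines = [line.strip() for line in f]
    for lineno, line in enumerate(lines, start=1):
        if not line:
            continue
        parts = line.split()
        if len(parts) != 2:
            problems.append((lineno, "expected 'image xml', got: " + line))
            continue
        img_path, xml_path = (os.path.join(dataset_dir, p) for p in parts)
        if not os.path.isfile(img_path):
            problems.append((lineno, "missing image: " + img_path))
        if not os.path.isfile(xml_path):
            problems.append((lineno, "missing annotation: " + xml_path))
            continue
        try:
            root = ET.parse(xml_path).getroot()
        except ET.ParseError as exc:
            problems.append((lineno, "unreadable XML: {}".format(exc)))
            continue
        objects = root.findall("object")
        if not objects:
            problems.append((lineno, "no objects in " + xml_path))
        for obj in objects:
            box = obj.find("bndbox")
            try:
                x1, y1, x2, y2 = (float(box.find(k).text)
                                  for k in ("xmin", "ymin", "xmax", "ymax"))
            except (AttributeError, TypeError, ValueError):
                problems.append((lineno, "malformed bndbox in " + xml_path))
                continue
            # A box with zero or negative width/height is treated as dirty data.
            if x2 - x1 <= 0 or y2 - y1 <= 0:
                problems.append((lineno, "non-positive box w/h in " + xml_path))
    return problems
```

For the config posted above, the call would be `scan_voc_list("data/head", "ImageSets/all.txt")`. An empty result only means these particular checks passed, not that the data is clean in every respect.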
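@caoweiying123 Is there any progress on this?
@jerrywgz It has been quite a while, so I have switched to a different model and the earlier code is gone. I'll come back and ask about this in a few days. Thanks all the same 😁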
A question: is this caused by dirty data? If so, how can I find the dirty samples? Or is there some other cause, and how should it be resolved?