When I test, I get an error

Yx1322441675 commented 3 years ago

(open-mmlab) goo@goo-Z390-GAMING-X:~/yx/AlignPS$ sh run_test.sh
loading annotations into memory...
Done (t=0.12s)
creating index...
index created!
/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/cnn/bricks/conv_module.py:100: UserWarning: ConvModule has norm and bias at the same time
  warnings.warn('ConvModule has norm and bias at the same time')
[ ] 0/6978, elapsed: 0s, ETA:/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/functional.py:3328: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/functional.py:3458: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode)
Traceback (most recent call last):
  File "./tools/test.py", line 226, in <module>
    main()
  File "./tools/test.py", line 187, in main
    args.gpu_collect)
  File "/home/goo/yx/AlignPS/mmdet/apis/test.py", line 98, in multi_gpu_test
    result = model(return_loss=False, rescale=True, **data)
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/parallel/distributed.py", line 705, in forward
    output = self.module(*inputs[0], **kwargs[0])
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/home/goo/yx/AlignPS/mmdet/core/fp16/decorators.py", line 51, in new_func
    return old_func(*args, **kwargs)
  File "/home/goo/yx/AlignPS/mmdet/models/detectors/base.py", line 170, in forward
    return self.forward_test(img, img_metas, **kwargs)
  File "/home/goo/yx/AlignPS/mmdet/models/detectors/base.py", line 147, in forward_test
    return self.simple_test(imgs[0], img_metas[0], **kwargs)
  File "/home/goo/yx/AlignPS/mmdet/models/detectors/single_stage_reid.py", line 118, in simple_test
    outs, img_metas, rescale=rescale)
  File "/home/goo/yx/AlignPS/mmdet/core/fp16/decorators.py", line 131, in new_func
    return old_func(*args, **kwargs)
  File "/home/goo/yx/AlignPS/mmdet/models/dense_heads/fcos_reid_head_focal_sub_triqueue.py", line 454, in get_bboxes
    img_shape = img_metas[img_id]['img_shape']
TypeError: 'DataContainer' object is not subscriptable
Killing subprocess 6428
Traceback (most recent call last):
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 340, in <module>
    main()
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 326, in main
    sigkill_handler(signal.SIGTERM, None)  # not coming back
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 301, in sigkill_handler
    raise subprocess.CalledProcessError(returncode=last_return_code, cmd=cmd)
subprocess.CalledProcessError: Command '['/home/goo/anaconda3/envs/open-mmlab/bin/python', '-u', './tools/test.py', '--local_rank=0', './configs/fcos/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0.py', 'work_dirs/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0/epoch_24.pth', '--launcher', 'pytorch', '--out', 'work_dirs/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0/results_1000.pkl']' returned non-zero exit status 1.

Traceback (most recent call last):
  File "./tools/test_results.py", line 75, in <module>
    with open(os.path.join(results_path, 'results_1000.pkl'), 'rb') as fid:
FileNotFoundError: [Errno 2] No such file or directory: '/home/goo/yx/AlignPS/work_dirs/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0/results_1000.pkl'
fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0

I don't know why this happens. Could you help me? Thank you!

daodaofr commented 3 years ago

Hi,

It seems to be a problem with the mmcv version. Could you please try mmcv=1.1.5 and pytorch=1.7.0?
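Before rerunning the test, it can help to confirm which versions are actually installed in the active environment. The following is a minimal sketch, not part of AlignPS: the helper names `installed_version` and `find_mismatches` are hypothetical, and the package name `mmcv-full` is taken from the `pip list` output posted in this thread.

```python
def installed_version(package):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        from importlib.metadata import PackageNotFoundError, version  # Python 3.8+
    except ImportError:
        # Python 3.7 (as used in this thread) has no importlib.metadata.
        import pkg_resources
        try:
            return pkg_resources.get_distribution(package).version
        except pkg_resources.DistributionNotFound:
            return None
    try:
        return version(package)
    except PackageNotFoundError:
        return None


def find_mismatches(expected, lookup=installed_version):
    """Map each package whose version differs to a (wanted, found) pair.

    The comparison is an exact string match, so a local build suffix such as
    "+cu101" would also be reported as a mismatch.
    """
    mismatches = {}
    for package, wanted in expected.items():
        found = lookup(package)
        if found != wanted:
            mismatches[package] = (wanted, found)
    return mismatches
```

Calling `find_mismatches({"mmcv-full": "1.1.5", "torch": "1.7.0"})` inside the `open-mmlab` environment returns an empty dict when both versions match the suggestion above.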

Yx1322441675 commented 3 years ago

Hi,

It seems to be a problem with the mmcv version. Could you please try mmcv=1.1.5 and pytorch=1.7.0?

The following is my environment. I have changed the versions, but I still get an error. Could you help me?

Package            Version              Location
addict             2.4.0
certifi            2020.12.5
cycler             0.10.0
Cython             0.29.22
joblib             1.0.1
kiwisolver         1.3.1
matplotlib         3.4.0
mkl-fft            1.3.0
mkl-random         1.1.1
mkl-service        2.3.0
mmcv-full          1.1.5
mmdet              2.4.0                /home/goo/yx/AlignPS
mmpycocotools      12.0.3
numpy              1.19.2
olefile            0.46
opencv-python      4.5.1.48
Pillow             8.1.2
pip                21.0.1
pyparsing          2.4.7
python-dateutil    2.8.1
PyYAML             5.4.1
scikit-learn       0.24.1
scipy              1.6.2
setuptools         52.0.0.post20210125
six                1.15.0
sklearn            0.0
terminaltables     3.1.0
threadpoolctl      2.1.0
torch              1.7.0
torchaudio         0.7.0a0+ac17b64
torchvision        0.8.0
typing-extensions  3.7.4.3
wheel              0.36.2
yapf               0.31.0

(open-mmlab) goo@goo-Z390-GAMING-X:~/yx/AlignPS$ sh run_test.sh
Traceback (most recent call last):
  File "./tools/test.py", line 226, in <module>
    main()
  File "./tools/test.py", line 146, in main
    init_dist(args.launcher, **cfg.dist_params)
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/dist_utils.py", line 17, in init_dist
    _init_dist_pytorch(backend, **kwargs)
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/runner/dist_utils.py", line 31, in _init_dist_pytorch
    dist.init_process_group(backend=backend, **kwargs)
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 442, in init_process_group
    barrier()
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/distributed_c10d.py", line 1947, in barrier
    work = _default_pg.barrier()
RuntimeError: CUDA error: out of memory
Traceback (most recent call last):
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/goo/anaconda3/envs/open-mmlab/bin/python', '-u', './tools/test.py', '--local_rank=0', './configs/fcos/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0.py', 'work_dirs/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0/epoch_24.pth', '--launcher', 'pytorch', '--out', 'work_dirs/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0/results_1000.pkl']' returned non-zero exit status 1.

Traceback (most recent call last):
  File "./tools/test_results.py", line 75, in <module>
    with open(os.path.join(results_path, 'results_1000.pkl'), 'rb') as fid:
FileNotFoundError: [Errno 2] No such file or directory: '/home/goo/yx/AlignPS/work_dirs/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0/results_1000.pkl'
fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0

Yx1322441675 commented 3 years ago

Also, I don't know where the file "results_1000.pkl" is. My training results don't contain it. Should I change the file name, or does the file simply not exist yet?

daodaofr commented 3 years ago

Your actual issue is now "RuntimeError: CUDA error: out of memory". Please make sure you have enough free GPU memory; testing needs about 4 GB.
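One way to check free GPU memory before launching the test is to query `nvidia-smi`. The sketch below is an illustration only: the function names are hypothetical, the 4096 MiB threshold is just the "about 4 GB" figure from the comment above, and it assumes `nvidia-smi` is on the PATH.

```python
import shutil
import subprocess

# Prints one integer (free memory in MiB) per GPU, one GPU per line.
QUERY = ["nvidia-smi", "--query-gpu=memory.free", "--format=csv,noheader,nounits"]


def parse_free_mib(text):
    """Parse nvidia-smi output into a list of free-memory values in MiB."""
    return [int(line.strip()) for line in text.splitlines() if line.strip()]


def free_gpu_memory_mib():
    """Return free memory per GPU in MiB, or [] if nvidia-smi is not available."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_free_mib(result.stdout)


def gpus_with_enough_memory(free_mib, required_mib=4096):
    """Return the indices of GPUs whose free memory meets the (assumed) requirement."""
    return [index for index, free in enumerate(free_mib) if free >= required_mib]
```

If `gpus_with_enough_memory(free_gpu_memory_mib())` comes back empty, another process is probably holding the memory, which would explain an out-of-memory error as early as `init_process_group`.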

The "results_1000.pkl" file is generated when "./tools/dist_test.sh" runs successfully; it is saved as "work_dirs/${TESTPATH}/results_1000.pkl".
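Because the evaluation script only reads this file, a FileNotFoundError there usually means the earlier test step failed before writing it. A small guard makes that explicit. This is a hypothetical helper, not code from the repository; `test_path` stands in for the `${TESTPATH}` value used by the test script.

```python
import os


def results_path(work_dirs, test_path, filename="results_1000.pkl"):
    """Build the expected location of the pickled test results."""
    return os.path.join(work_dirs, test_path, filename)


def require_results(path):
    """Return `path` if the results file exists, otherwise raise a clearer error."""
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"{path} not found; the test step probably failed before writing it. "
            "Check the first traceback in the log."
        )
    return path
```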

daodaofr commented 3 years ago

The model name was not right. I've updated the test script; please try again.

Yx1322441675 commented 3 years ago

I trained on the PRW dataset and tested the trained model, but I get an error. When I test the pretrained model you provided, it runs successfully. Could you help me?

(open-mmlab) goo@goo-Z390-GAMING-X:~/yx/AlignPS$ sh run_test_prw.sh
loading annotations into memory...
Done (t=0.07s)
creating index...
index created!
/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/mmcv/cnn/bricks/conv_module.py:100: UserWarning: ConvModule has norm and bias at the same time
  warnings.warn('ConvModule has norm and bias at the same time')
[ ] 0/6112, elapsed: 0s, ETA:/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/functional.py:2952: UserWarning: nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.
  warnings.warn("nn.functional.upsample is deprecated. Use nn.functional.interpolate instead.")
/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/torch/nn/functional.py:3063: UserWarning: Default upsampling behavior when mode=bilinear is changed to align_corners=False since 0.4.0. Please specify align_corners=True if the old behavior is desired. See the documentation of nn.Upsample for details.
  "See the documentation of nn.Upsample for details.".format(mode))
/home/goo/yx/AlignPS/mmdet/core/post_processing/bbox_nms_reid.py:55: UserWarning: This overload of nonzero is deprecated:
  nonzero()
Consider using one of the following signatures instead:
  nonzero(*, bool as_tuple) (Triggered internally at /opt/conda/conda-bld/pytorch_1603729006826/work/torch/csrc/utils/python_arg_parser.cpp:882.)
  labels = valid_mask.nonzero()[:, 1]
[>>>>>>>>>>>>>>>>>>>>>>>>>>>] 6112/6112, 15.0 task/s, elapsed: 406s, ETA: 0s
writing results to work_dirs/prw_base_focal_labelnorm_sub_ldcn_fg15_wd1-3/results_1000.pkl

Traceback (most recent call last):
  File "./tools/test_results_prw.py", line 313, in <module>
    main(det_thresh=0.15, input_path=sys.argv[1])
  File "./tools/test_results_prw.py", line 144, in main
    feat = normalize(det[0][:, 5:], axis=1)
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/sklearn/preprocessing/_data.py", line 1905, in normalize
    estimator='the normalize function', dtype=FLOAT_DTYPES)
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/sklearn/utils/validation.py", line 63, in inner_f
    return f(*args, **kwargs)
  File "/home/goo/anaconda3/envs/open-mmlab/lib/python3.7/site-packages/sklearn/utils/validation.py", line 672, in check_array
    context))
ValueError: Found array with 0 sample(s) (shape=(0, 256)) while a minimum of 1 is required by the normalize function.

daodaofr commented 3 years ago

It seems some of your detection results are empty. Could you please check the detection results in results_1000.pkl?

I also added a check for this case here: https://github.com/daodaofr/AlignPS/blob/c20cf329b2934a8693e2064435d3e3f65c496095/tools/test_results_prw.py#L144

You may try it.
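To see which images produced no detections, the results file can be inspected directly. The sketch below is not the check added in the linked commit; it assumes, based on the `det[0][:, 5:]` expression in the traceback, that the pickle holds one entry per image and that the first element of each entry is the detection array. The function names are hypothetical.

```python
import pickle


def load_results(path):
    """Load the pickled test results (unpickling NumPy arrays requires NumPy)."""
    with open(path, "rb") as fid:
        return pickle.load(fid)


def empty_detection_indices(results):
    """Return indices of images whose first detection array has zero rows.

    Assumption: results[i][0] is the per-image detection array or list.
    """
    return [index for index, det in enumerate(results) if len(det[0]) == 0]
```

Running `empty_detection_indices(load_results(...))` on the path printed after "writing results to" lists the images that would trigger the "Found array with 0 sample(s)" error in `normalize`.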

jiabeiwangTJU commented 3 years ago

I also got a similar problem. My versions (mmcv=1.1.5, pytorch=1.7.0) are correct, but I don't know why it fails.

Traceback (most recent call last):
  File "./tools/test.py", line 226, in <module>
    main()
  File "./tools/test.py", line 167, in main
    checkpoint = load_checkpoint(model, args.checkpoint, map_location='cpu')
  File "/home/ms/wjb/AlignPS/mmcv/mmcv/runner/checkpoint.py", line 247, in load_checkpoint
    checkpoint = _load_checkpoint(filename, map_location)
  File "/home/ms/wjb/AlignPS/mmcv/mmcv/runner/checkpoint.py", line 222, in _load_checkpoint
    raise IOError(f'{filename} is not a checkpoint file')
OSError: work_dirs/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0/24.pth is not a checkpoint file
Traceback (most recent call last):
  File "/home/ms/anaconda3/envs/alignps/lib/python3.7/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/home/ms/anaconda3/envs/alignps/lib/python3.7/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ms/anaconda3/envs/alignps/lib/python3.7/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/ms/anaconda3/envs/alignps/lib/python3.7/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ms/anaconda3/envs/alignps/bin/python', '-u', './tools/test.py', '--local_rank=0', './configs/fcos/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0.py', 'work_dirs/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0/24.pth', '--launcher', 'pytorch', '--out', 'work_dirs/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0/results_1000.pkl']' returned non-zero exit status 1.

Traceback (most recent call last):
  File "./tools/test_results.py", line 75, in <module>
    with open(os.path.join(results_path, 'results_1000.pkl'), 'rb') as fid:
FileNotFoundError: [Errno 2] No such file or directory: '/home/ms/wjb/AlignPS/work_dirs/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0/results_1000.pkl'
fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0

daodaofr commented 3 years ago

@jiabeiwangTJU

Hi, your issue is "work_dirs/fcos_center-normbbox-centeronreg-giou_r50_caffe_fpn_gn-head_dcn_4x4_1x_cuhk_reid_1500_stage1_fpncat_dcn_epoch24_multiscale_focal_x4_bg-2_lconv3dcn_sub_triqueue_dcn0/24.pth is not a checkpoint file".

Could you check your checkpoint path, or try to load the checkpoint manually?
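The "is not a checkpoint file" message typically means that the path does not point to an existing file. A quick way to check this is to test the path and list the `.pth` files that are actually in the directory. This is a minimal sketch with a hypothetical helper name, using only the standard library.

```python
import os


def describe_checkpoint(checkpoint_path):
    """Return (exists, candidates) for a checkpoint path.

    `exists` is True if the path is a regular file; `candidates` lists the
    .pth file names found in the same directory (empty if the directory is missing).
    """
    exists = os.path.isfile(checkpoint_path)
    directory = os.path.dirname(checkpoint_path) or "."
    if not os.path.isdir(directory):
        return exists, []
    candidates = sorted(
        name for name in os.listdir(directory) if name.endswith(".pth")
    )
    return exists, candidates
```

If `exists` is False but `candidates` is not empty, the checkpoint name passed to the test script most likely needs to be changed to one of the listed files.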

jiabeiwangTJU commented 3 years ago

In my checkpoint directory there is a "latest.pth", so I changed TESTNAME='cuhk_alignps.pth' to 'latest.pth', and that works. Thanks.

daodaofr / AlignPS

When I test, I get an error #1