chenhang98 / BPR

code for `Look Closer to Segment Better: Boundary Patch Refinement for Instance Segmentation`
Apache License 2.0

test_float be killed #9

Closed Dinghow closed 3 years ago

Dinghow commented 3 years ago

Thanks for your excellent work on boundary refinement. While reproducing your experiment with your codebase and data, I ran into a problem during inference. I ran the `inference.sh` script as instructed in the README, and `dist_test_float.sh` is always killed once inference finishes, without saving the refined patches. Could you please give some suggestions? Thanks a lot.

# yangdinghao @ dev-yangdinghao in /sensebee2/yangdinghao/BPR on git:main x [23:21:06] C:130
$ IOU_THRESH=0.55 \
IMG_DIR=/sensebee2/data/segmentation/mattingseg/cityscapes/leftImg8bit/val \
GT_JSON=/sensebee2/data/segmentation/mattingseg/cityscapes/annotations/instancesonly_filtered_gtFine_val.json \
BPR_ROOT=. \
GPUS=1 \
sh tools/inference.sh configs/bpr/hrnet18s_128.py ckpts/hrnet18s_128-24055c80.pth data/maskrcnn_val maskrcnn_val_refined
+ GREEN='\033[0;32m'
+ END='\033[0m\n'
+ printf '\033[0;32minference the network ...\033[0m\n'
inference the network ...
+ DATA_ROOT=maskrcnn_val_refined/patches
+ bash ./tools/dist_test_float.sh configs/bpr/hrnet18s_128.py ckpts/hrnet18s_128-24055c80.pth 1 --out maskrcnn_val_refined/refined.pkl
2021-06-23 23:21:23,985 - mmseg - INFO - Loaded 172518 images
libibverbs: Warning: couldn't open config directory '/etc/libibverbs.d'.
libibverbs: Warning: no userspace device-specific driver found for /sys/class/infiniband_verbs/uverbs0
[>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>] 172518/172518, 36.9 task/s, elapsed: 4679s, ETA:     0s
./tools/dist_test_float.sh: line 9: 29810 Killed                  PYTHONPATH="$(dirname $0)/..":$PYTHONPATH python -m torch.distributed.launch --nproc_per_node=$GPUS --master_port=$PORT $(dirname "$0")/test_float.py $CONFIG $CHECKPOINT --launcher pytorch ${@:4}
+ printf '\033[0;32mreassemble ...\033[0m\n'
reassemble ...
+ python ./tools/merge_patches.py maskrcnn_val_refined/coarse.json /sensebee2/data/segmentation/mattingseg/cityscapes/annotations/instancesonly_filtered_gtFine_val.json maskrcnn_val_refined/refined.pkl maskrcnn_val_refined/patches/detail_dir/val maskrcnn_val_refined/refined.json
loading annotations into memory...
Done (t=0.06s)
creating index...
index created!
Traceback (most recent call last):
  File "./tools/merge_patches.py", line 104, in <module>
    start()
  File "./tools/merge_patches.py", line 63, in start
    with open(args.res_pkl, 'rb') as f:
FileNotFoundError: [Errno 2] No such file or directory: 'maskrcnn_val_refined/refined.pkl'
+ printf '\033[0;32mconvert to cityscape format ...\033[0m\n'
convert to cityscape format ...
+ python ./tools/json2cityscapes.py maskrcnn_val_refined/refined.json /sensebee2/data/segmentation/mattingseg/cityscapes/annotations/instancesonly_filtered_gtFine_val.json maskrcnn_val_refined/refined
Traceback (most recent call last):
  File "./tools/json2cityscapes.py", line 67, in <module>
    Fire(main)
  File "/sensebee2/yangdinghao/anaconda3/envs/mmseg/lib/python3.7/site-packages/fire/core.py", line 141, in Fire
    component_trace = _Fire(component, args, parsed_flag_args, context, name)
  File "/sensebee2/yangdinghao/anaconda3/envs/mmseg/lib/python3.7/site-packages/fire/core.py", line 471, in _Fire
    target=component.__name__)
  File "/sensebee2/yangdinghao/anaconda3/envs/mmseg/lib/python3.7/site-packages/fire/core.py", line 681, in _CallAndUpdateTrace
    component = fn(*varargs, **kwargs)
  File "./tools/json2cityscapes.py", line 53, in main
    imgid2dt = load_dt(dt_json)
  File "./tools/json2cityscapes.py", line 28, in load_dt
    dt = json.load(open(dt_json))
FileNotFoundError: [Errno 2] No such file or directory: 'maskrcnn_val_refined/refined.json'
chenhang98 commented 3 years ago

This seems to be caused by running out of (CPU) memory. Can you try a machine with more memory?

chenhang98 commented 3 years ago

We tested this code on a machine with 188 GB of memory.

Dinghow commented 3 years ago

> We tested this code on a machine with 188 GB of memory.

Thanks! The problem is that `mmcv.dump` needs a very large amount of memory because the result data is so large. I solved it by using a compute pod with 120 GB of memory.
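
For anyone who cannot get a larger machine, one possible way to lower the peak memory of the save step is to write results to disk one record at a time instead of serializing the whole list in a single call. The following is only a minimal sketch using the standard `pickle` module. The helper names `dump_stream` and `load_stream` are hypothetical and are not part of BPR or mmcv, and since `tools/merge_patches.py` opens a single `refined.pkl`, it would have to be adapted to read a stream like this.

```python
import pickle
from typing import Any, Iterable, Iterator


def dump_stream(results: Iterable[Any], path: str) -> int:
    """Append each result to `path` as its own pickle record.

    Hypothetical helper (not from BPR or mmcv). Only one item is
    serialized at a time, so the full result list never has to be
    held in memory alongside its serialized copy.
    Returns the number of records written.
    """
    count = 0
    with open(path, "wb") as f:
        for item in results:
            pickle.dump(item, f, protocol=pickle.HIGHEST_PROTOCOL)
            count += 1
    return count


def load_stream(path: str) -> Iterator[Any]:
    """Lazily yield the records written by `dump_stream`, in order."""
    with open(path, "rb") as f:
        while True:
            try:
                yield pickle.load(f)
            except EOFError:
                # Reached the end of the file: no more records.
                return
```

To benefit from this, `results` would need to be a generator that yields each patch prediction as it is produced, rather than a list that has already been built in memory.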

usherbob commented 2 years ago

> > We tested this code on a machine with 188 GB of memory.
>
> Thanks! The problem is that `mmcv.dump` needs a very large amount of memory because the result data is so large. I solved it by using a compute pod with 120 GB of memory.

That is a huge number of generated images. I tried to test on the COCO val2017 data, and it generates about 2309455 images/patches, which my server cannot handle.
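
A rough back-of-the-envelope estimate may help when judging whether a given machine can hold the results. The sketch below rests on assumptions that are not confirmed anywhere in this thread: that each patch result is a 128x128 map (guessed from the config name `hrnet18s_128`), with one channel of 4-byte floats (guessed from the script name `test_float`). The function name `estimate_result_bytes` is hypothetical. The real footprint is likely higher, because serialization can need additional copies of the data.

```python
def estimate_result_bytes(num_patches: int,
                          patch_size: int = 128,    # assumption: from the config name
                          channels: int = 1,        # assumption: one score map per patch
                          bytes_per_value: int = 4  # assumption: float32
                          ) -> int:
    """Raw size of all patch results if they are kept in memory at once."""
    return num_patches * patch_size * patch_size * channels * bytes_per_value


# Patch counts reported in this thread.
runs = [
    ("Cityscapes val", 172518),
    ("COCO val2017", 2309455),
]

for name, num_patches in runs:
    gib = estimate_result_bytes(num_patches) / 2**30
    print(f"{name}: {num_patches} patches -> about {gib:.1f} GiB of raw results")
```

Whatever the true per-patch size is, the total scales linearly with the patch count, so the COCO run would need roughly 13 times the memory of the Cityscapes run.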