Problem with NUM_WORKERS

❓ Questions and Help

Hi Kaihua,

Thank you for sharing this elegantly built framework. I ran into a problem when testing on the VG dataset and on custom images with the example commands (on a single GPU). It is related to NUM_WORKERS, which is 4 by default: testing did not work for me until I set it to 0. With the default value, the program got stuck in the dataloader loop without reporting any errors.

I then suspected the mini-batch size. After testing different values, I found that even with NUM_WORKERS set to 0, the command^* for testing on custom images fails with this error:

  File "/home/feng_ji/Documents/Trial_experiments/Scene-Graph-Benchmark.pytorch/maskrcnn_benchmark/modeling/roi_heads/relation_head/model_motifs.py", line 382, in forward
    obj_dists, obj_preds, obj_ctx, perm, inv_perm, ls_transposed = self.obj_ctx(obj_pre_rep, proposals, obj_labels, boxes_per_cls, ctx_average=ctx_average)
  File "/home/feng_ji/Documents/Trial_experiments/Scene-Graph-Benchmark.pytorch/maskrcnn_benchmark/modeling/roi_heads/relation_head/model_motifs.py", line 321, in obj_ctx
    obj_dists, obj_preds = self.decoder_rnn(
  File "/home/feng_ji/anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/nn/modules/module.py", line 532, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/feng_ji/Documents/Trial_experiments/Scene-Graph-Benchmark.pytorch/maskrcnn_benchmark/modeling/roi_heads/relation_head/model_motifs.py", line 178, in forward
    assert l_batch == 1
AssertionError
Traceback (most recent call last):
  File "/home/feng_ji/anaconda3/envs/scene_graph_benchmark/lib/python3.8/runpy.py", line 194, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "/home/feng_ji/anaconda3/envs/scene_graph_benchmark/lib/python3.8/runpy.py", line 87, in _run_code
    exec(code, run_globals)
  File "/home/feng_ji/anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/distributed/launch.py", line 263, in <module>
    main()
  File "/home/feng_ji/anaconda3/envs/scene_graph_benchmark/lib/python3.8/site-packages/torch/distributed/launch.py", line 258, in main
    raise subprocess.CalledProcessError(returncode=process.returncode,
subprocess.CalledProcessError: Command '['/home/feng_ji/anaconda3/envs/scene_graph_benchmark/bin/python', '-u', 'tools/relation_test_net.py', '--local_rank=0', '--config-file', 'configs/e2e_relation_X_101_32_8_FPN_1x.yaml', 'MODEL.ROI_RELATION_HEAD.USE_GT_BOX', 'False', 'MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL', 'False', 'MODEL.ROI_RELATION_HEAD.PREDICTOR', 'CausalAnalysisPredictor', 'MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE', 'TDE', 'MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE', 'sum', 'MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER', 'motifs', 'TEST.IMS_PER_BATCH', '2', 'DTYPE', 'float16', 'GLOVE_DIR', '/home_local/feng_ji/SGG/glove', 'MODEL.PRETRAINED_DETECTOR_CKPT', '/home_local/feng_ji/SGG/checkpoints/causal-motifs-sgdet', 'OUTPUT_DIR', '/home_local/feng_ji/SGG/checkpoints/causal-motifs-sgdet', 'TEST.CUSTUM_EVAL', 'True', 'TEST.CUSTUM_PATH', '/home_local/feng_ji/SGG/checkpoints/custom_images', 'DETECTED_SGG_DIR', '/home_local/feng_ji/SGG/checkpoints/causal-motifs-sgdet/results']' returned non-zero exit status 1.

However, the test command on VG does work with a larger mini-batch size. Do you have any hints about this? Could the problem be specific to my system?

Best, Jianxiang

^* Command used for testing on custom images:
CUDA_VISIBLE_DEVICES=0 python -m torch.distributed.launch --master_port 10027 --nproc_per_node=1 tools/relation_test_net.py --config-file "configs/e2e_relation_X_101_32_8_FPN_1x.yaml" MODEL.ROI_RELATION_HEAD.USE_GT_BOX False MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL False MODEL.ROI_RELATION_HEAD.PREDICTOR CausalAnalysisPredictor MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE TDE MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE sum MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER motifs TEST.IMS_PER_BATCH 2 DTYPE "float16" GLOVE_DIR /home_local/feng_ji/SGG/glove MODEL.PRETRAINED_DETECTOR_CKPT /home_local/feng_ji/SGG/checkpoints/causal-motifs-sgdet OUTPUT_DIR /home_local/feng_ji/SGG/checkpoints/causal-motifs-sgdet TEST.CUSTUM_EVAL True TEST.CUSTUM_PATH /home_local/feng_ji/SGG/checkpoints/custom_images DETECTED_SGG_DIR /home_local/feng_ji/SGG/checkpoints/causal-motifs-sgdet/results
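To reproduce the sweep over NUM_WORKERS and mini-batch size described in this issue, a small script can generate the test command for each combination instead of editing it by hand. This is only a sketch under stated assumptions: `build_command` is a hypothetical helper (not part of the repository), and it assumes the NUM_WORKERS option is the maskrcnn_benchmark config key `DATALOADER.NUM_WORKERS`, passed as a KEY VALUE override like the other options in the command.

```python
import shlex
from itertools import product

# Launcher and script arguments copied from the command in this issue.
BASE_ARGS = [
    "python", "-m", "torch.distributed.launch",
    "--master_port", "10027", "--nproc_per_node=1",
    "tools/relation_test_net.py",
    "--config-file", "configs/e2e_relation_X_101_32_8_FPN_1x.yaml",
]

# KEY VALUE overrides from the same command, minus TEST.IMS_PER_BATCH,
# which is varied by build_command below.
FIXED_OVERRIDES = {
    "MODEL.ROI_RELATION_HEAD.USE_GT_BOX": "False",
    "MODEL.ROI_RELATION_HEAD.USE_GT_OBJECT_LABEL": "False",
    "MODEL.ROI_RELATION_HEAD.PREDICTOR": "CausalAnalysisPredictor",
    "MODEL.ROI_RELATION_HEAD.CAUSAL.EFFECT_TYPE": "TDE",
    "MODEL.ROI_RELATION_HEAD.CAUSAL.FUSION_TYPE": "sum",
    "MODEL.ROI_RELATION_HEAD.CAUSAL.CONTEXT_LAYER": "motifs",
    "DTYPE": "float16",
    "GLOVE_DIR": "/home_local/feng_ji/SGG/glove",
    "MODEL.PRETRAINED_DETECTOR_CKPT": "/home_local/feng_ji/SGG/checkpoints/causal-motifs-sgdet",
    "OUTPUT_DIR": "/home_local/feng_ji/SGG/checkpoints/causal-motifs-sgdet",
    "TEST.CUSTUM_EVAL": "True",
    "TEST.CUSTUM_PATH": "/home_local/feng_ji/SGG/checkpoints/custom_images",
    "DETECTED_SGG_DIR": "/home_local/feng_ji/SGG/checkpoints/causal-motifs-sgdet/results",
}


def build_command(num_workers, ims_per_batch, fixed=FIXED_OVERRIDES):
    """Return the test command as an argv list (hypothetical helper)."""
    overrides = dict(fixed)
    # Assumption: NUM_WORKERS is the maskrcnn_benchmark key DATALOADER.NUM_WORKERS.
    overrides["DATALOADER.NUM_WORKERS"] = str(num_workers)
    overrides["TEST.IMS_PER_BATCH"] = str(ims_per_batch)
    args = list(BASE_ARGS)
    # Each override becomes two separate arguments: the key, then its value.
    for key, value in overrides.items():
        args += [key, value]
    return args


if __name__ == "__main__":
    # Print one shell command per (NUM_WORKERS, IMS_PER_BATCH) combination.
    for workers, batch in product([0, 4], [1, 2]):
        print("CUDA_VISIBLE_DEVICES=0 " + shlex.join(build_command(workers, batch)))
```

Building the command as a list keeps every override as its own argument, which matches how `torch.distributed.launch` forwards them to `tools/relation_test_net.py` (visible in the `CalledProcessError` output above). `shlex.join` requires Python 3.8 or later.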

KaihuaTang / Scene-Graph-Benchmark.pytorch, issue #107