error when test vcoco - Githubissues

leijue222 commented 2 years ago

I ran python main.py --cache --dataset vcoco --data-root vcoco/ --partitions trainval test --output-dir vcoco-r50 --resume checkpoints/upt-r50-vcoco.pt to generate cache.pkl, but evaluating it reports an error.

The eval code is:

from vsrl_eval import VCOCOeval

vsrl_annot_file = 'data/vcoco/vcoco_val.json'
coco_file = 'data/instances_vcoco_all_2014.json'
split_file = 'data/splits/vcoco_val.ids'

vcocoeval = VCOCOeval(vsrl_annot_file, coco_file, split_file)

det_file = '/media/ming-t/Deng/relation_mppe/HOI-UPT/vcoco-r50/cache.pkl'
vcocoeval._do_eval(det_file, ovr_thresh=0.5)

The error is:

loading annotations into memory...
Done (t=0.74s)
creating index...
index created!
loading vcoco annotations...
Traceback (most recent call last):
  File "test.py", line 14, in <module>
    vcocoeval._do_eval(det_file, ovr_thresh=0.5)
  File "/media/ming-t/Deng/relation_mppe/HOI-UPT/lib/vcoco/vsrl_eval.py", line 194, in _do_eval
    self._do_agent_eval(vcocodb, detections_file, ovr_thresh=ovr_thresh)
  File "/media/ming-t/Deng/relation_mppe/HOI-UPT/lib/vcoco/vsrl_eval.py", line 417, in _do_agent_eval
    assert(np.amax(rec) <= 1)
  File "<__array_function__ internals>", line 180, in amax
  File "/home/ming-t/anaconda3/envs/pocket/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 2793, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/home/ming-t/anaconda3/envs/pocket/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity

How to solve it?

fredzzhang commented 2 years ago

Hi @leijue222,

The error ValueError: zero-size array to reduction operation maximum which has no identity suggests that rec is an empty array. NumPy cannot take the maximum of an empty array, so np.amax(rec) raises before the assertion is even evaluated.
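As a minimal sketch of that failure mode (independent of vsrl_eval.py, and with safe_max being a hypothetical helper name rather than anything in V-COCO), the snippet below reproduces the ValueError on an empty array and shows one way to guard against it:

```python
import numpy as np


def safe_max(values):
    """Return the maximum of values, or None when there is nothing to reduce."""
    arr = np.asarray(values, dtype=float)
    if arr.size == 0:
        return None
    return float(np.amax(arr))


# An empty recall array, similar in spirit to what rec would be with no detections.
empty_rec = np.zeros(0)
try:
    np.amax(empty_rec)
    raised = False
except ValueError:
    # NumPy refuses to reduce a zero-size array with maximum.
    raised = True
```

A guard like this only hides the symptom. An empty rec most likely means no detections reached the evaluator, which is the thing to investigate.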

The most likely reason for that is the detections not being cached correctly. Can you manually load the cache.pkl file and check what's inside? It should contain a list of dictionaries. In particular you need to check if each dictionary is empty.
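A rough way to do that check is sketched below. It assumes only what is stated above (a pickled list of dictionaries); summarize_cache and the toy keys are hypothetical, and the demo writes a small stand-in file rather than reading a real cache.pkl.

```python
import os
import pickle
import tempfile


def summarize_cache(path):
    """Load a pickled list of dicts and report which entries are empty."""
    with open(path, "rb") as f:
        detections = pickle.load(f)
    empty = [i for i, det in enumerate(detections) if len(det) == 0]
    return {
        "type": type(detections).__name__,
        "num_entries": len(detections),
        "empty_indices": empty,
    }


# Toy stand-in for cache.pkl: one populated entry and one empty entry.
# The keys are made up for illustration and are not the real cache format.
toy = [{"image_id": 1, "person_box": [0.0, 0.0, 1.0, 1.0]}, {}]
with tempfile.TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, "toy_cache.pkl")
    with open(toy_path, "wb") as f:
        pickle.dump(toy, f)
    summary = summarize_cache(toy_path)
```

If the real file stores instances of a custom class, unpickling also needs that class to be importable in the environment doing the loading.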

If those results are empty, the checkpoint might have been loaded incorrectly, so check its path. If the checkpoint is loaded correctly, you should see Continue from saved checkpoint ./checkpoints/upt-r50-vcoco.pt.

Fred.

leijue222 commented 2 years ago

The checkpoint seems fine. Testing UPT-R50 on HICO-DET gave the right results.

The default_factory is None, but I have no idea why.

After debugging, I found that default_factory is None after CacheTemplate processing, even though the parameter passed in is not None. I'm a bit stuck.
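For background, a default_factory of None is not necessarily a bug in plain Python: a defaultdict with no factory behaves like a regular dict, while a subclass that overrides __missing__ can still supply defaults. The sketch below shows only that general behaviour. FillMissing is a hypothetical class, and whether CacheTemplate works this way is not established in this thread.

```python
from collections import defaultdict


class FillMissing(defaultdict):
    """Hypothetical subclass: supplies a default without using default_factory."""

    def __missing__(self, key):
        # Called on lookup of an absent key; returns a value without storing it.
        return 0.0


# A defaultdict with no factory raises KeyError like an ordinary dict.
plain = defaultdict(None)
plain_factory = plain.default_factory
try:
    plain["absent"]
    plain_raised = False
except KeyError:
    plain_raised = True

# The subclass also has default_factory None, yet lookups of absent keys succeed.
custom = FillMissing()
custom_factory = custom.default_factory
value = custom["absent"]
```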

leijue222 commented 2 years ago

I also tried caching detection results from UPT-R101 for evaluation on V-COCO, but the resulting cache.pkl still triggers the zero-size error during evaluation.

$ python main.py --cache --dataset vcoco --data-root vcoco/ --partitions trainval test --backbone resnet101 --output-dir vcoco-r101 --resume checkpoints/upt-r101-vcoco.pt
Namespace(alpha=0.5, aux_loss=True, backbone='resnet101', batch_size=2, bbox_loss_coef=5, box_score_thresh=0.2, cache=True, clip_max_norm=0.1, data_root='vcoco/', dataset='vcoco', dec_layers=6, device='cuda', dilation=False, dim_feedforward=2048, dropout=0.1, enc_layers=6, eos_coef=0.1, epochs=20, eval=False, fg_iou_thresh=0.5, gamma=0.2, giou_loss_coef=2, hidden_dim=256, lr_drop=10, lr_head=0.0001, max_instances=15, min_instances=3, nheads=8, num_queries=100, num_workers=2, output_dir='vcoco-r101', partitions=['trainval', 'test'], port='1234', position_embedding='sine', pre_norm=False, pretrained='', print_interval=500, repr_dim=512, resume='checkpoints/upt-r101-vcoco.pt', sanity=False, seed=66, set_cost_bbox=5, set_cost_class=1, set_cost_giou=2, weight_decay=0.0001, world_size=1)
Downloading: "https://download.pytorch.org/models/resnet101-63fe2227.pth" to /home/ming-t/.cache/torch/hub/checkpoints/resnet101-63fe2227.pth
100%|████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 171M/171M [00:20<00:00, 8.72MB/s]
=> Rank 0: continue from saved checkpoint checkpoints/upt-r101-vcoco.pt
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 4532/4532 [04:01<00:00, 18.76it/s]

fredzzhang commented 2 years ago

Hi @leijue222,

The error originates from the method _do_agent_eval, which is not needed for this task anyway, so you can remove that step. Specifically, in vsrl_eval.py, line 192 computes the action recognition accuracy, while lines 193 and 194 compute the HOI mAP under scenarios 1 and 2. So just comment out line 192 and see if it works.

192    self._do_agent_eval(vcocodb, detections_file, ovr_thresh=ovr_thresh)
193    self._do_role_eval(vcocodb, detections_file, ovr_thresh=ovr_thresh, eval_type='scenario_1')
194    self._do_role_eval(vcocodb, detections_file, ovr_thresh=ovr_thresh, eval_type='scenario_2')

Fred.

leijue222 commented 2 years ago

Thanks, Fred. I removed the _do_agent_eval call on line 192, but I get the same error:

loading vcoco annotations...
Traceback (most recent call last):
  File "test.py", line 14, in <module>
    vcocoeval._do_eval(det_file, ovr_thresh=0.5)
  File "/media/ming-t/Deng/relation_mppe/HOI-UPT/lib/vcoco/vsrl_eval.py", line 195, in _do_eval
    self._do_role_eval(vcocodb, detections_file, ovr_thresh=ovr_thresh, eval_type='scenario_1')
  File "/media/ming-t/Deng/relation_mppe/HOI-UPT/lib/vcoco/vsrl_eval.py", line 318, in _do_role_eval
    assert(np.amax(rec) <= 1)
  File "<__array_function__ internals>", line 180, in amax
  File "/home/ming-t/anaconda3/envs/pocket/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 2793, in amax
    return _wrapreduction(a, np.maximum, 'max', axis, None, out,
  File "/home/ming-t/anaconda3/envs/pocket/lib/python3.8/site-packages/numpy/core/fromnumeric.py", line 86, in _wrapreduction
    return ufunc.reduce(obj, axis, dtype, out, **passkwargs)
ValueError: zero-size array to reduction operation maximum which has no identity

fredzzhang commented 2 years ago

The problem could come from one of two places: the UPT caching code or the V-COCO evaluation code. I've run the caching script myself and attached the cached detections via a Google Drive link. Try this file and see if it causes the same error.

Fred.

leijue222 commented 2 years ago

We can tentatively conclude that the problem is not in the UPT caching code. This image shows my vcoco-r50/cache.pkl and your cache_fred.pkl.

The error is still there:

fredzzhang commented 2 years ago

I was able to run evaluation on the cached detections I sent you, so the v-coco repo probably wasn't set up correctly on your end. I'd recommend doing a clean installation and seeing whether the error persists.

Fred.

leijue222 commented 2 years ago

I reinstalled v-coco. The *.so, bbox.c, _mask.c, etc. files compiled normally and the installation had no problems, but it still reports the zero-size error.

helpless...

I have uploaded the pkl file here.

fredzzhang commented 2 years ago

Perhaps it's the environment for the V-COCO. Since the repository itself was developed using Python2, you could try to create a new environment in Python2 and install V-COCO that way.

leijue222 commented 2 years ago

Figured it out! Following https://github.com/fredzzhang/upt/discussions/14#discussion-3718325, adding a new utils.py made it work, although I don't understand why the utils.py under UPT, made visible through sys.path.append, doesn't work.
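One possible explanation, which is a guess and not something confirmed in this thread: sys.path.append puts a directory at the end of the search path, so a module of the same name found earlier on the path wins. The sketch below demonstrates that general Python behaviour with two temporary directories and a made-up module name (shadow_demo_utils).

```python
import importlib
import os
import sys
import tempfile


def make_module(directory, module_name, body):
    """Write a one-file module into directory."""
    with open(os.path.join(directory, module_name + ".py"), "w") as f:
        f.write(body)


def resolve_module_dir(module_name, early_dir, late_dir):
    """Import module_name with early_dir at the front of sys.path and late_dir appended."""
    saved_path = list(sys.path)
    try:
        sys.path.insert(0, early_dir)   # searched first
        sys.path.append(late_dir)       # searched last, like sys.path.append in a script
        importlib.invalidate_caches()
        sys.modules.pop(module_name, None)
        module = importlib.import_module(module_name)
        return module.ORIGIN
    finally:
        # Restore the interpreter state so the demo has no side effects.
        sys.path[:] = saved_path
        sys.modules.pop(module_name, None)


with tempfile.TemporaryDirectory() as early, tempfile.TemporaryDirectory() as late:
    make_module(early, "shadow_demo_utils", "ORIGIN = 'early'\n")
    make_module(late, "shadow_demo_utils", "ORIGIN = 'late'\n")
    origin = resolve_module_dir("shadow_demo_utils", early, late)
```

Here the copy in the appended directory is never imported. Printing utils.__file__ after the import would show which file was actually picked up in the real setup.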

Thanks, Fred.

leijue222 commented 2 years ago

One more question about the evaluation results.

With the cache_fred.pkl you provided, I get scenario_1 AP = 56.61 | scenario_2 AP = 61.90.
With the UPT-R50-VCOCO model I downloaded from the Model Zoo, I get scenario_1 AP = 56.58 | scenario_2 AP = 61.88.

Both results are lower than the scenario 1 AP = 59.0 and scenario 2 AP = 64.5 reported in the paper. Is this because the Model Zoo model is not the best model?

Loading cached results from ./cache_fred.pkl.
loading annotations into memory...
Done (t=0.76s)
creating index...
index created!
loading vcoco annotations...
---------Reporting Role AP (%)------------------
           hold-obj: AP = 50.27 (#pos = 3608)
          sit-instr: AP = 38.27 (#pos = 1916)
         ride-instr: AP = 75.36 (#pos = 556)
           look-obj: AP = 48.04 (#pos = 3347)
          hit-instr: AP = 80.62 (#pos = 349)
            hit-obj: AP = 51.65 (#pos = 349)
            eat-obj: AP = 52.23 (#pos = 521)
          eat-instr: AP = 9.95 (#pos = 521)
         jump-instr: AP = 55.56 (#pos = 635)
          lay-instr: AP = 42.89 (#pos = 387)
talk_on_phone-instr: AP = 53.28 (#pos = 285)
          carry-obj: AP = 49.17 (#pos = 472)
          throw-obj: AP = 55.33 (#pos = 244)
          catch-obj: AP = 53.29 (#pos = 246)
          cut-instr: AP = 55.77 (#pos = 269)
            cut-obj: AP = 55.48 (#pos = 269)
work_on_computer-instr: AP = 76.40 (#pos = 410)
          ski-instr: AP = 55.55 (#pos = 424)
         surf-instr: AP = 85.28 (#pos = 486)
   skateboard-instr: AP = 92.40 (#pos = 417)
        drink-instr: AP = 64.33 (#pos = 82)
           kick-obj: AP = 77.54 (#pos = 180)
        point-instr: AP = 0.00 (#pos = 31)
           read-obj: AP = 51.61 (#pos = 111)
    snowboard-instr: AP = 84.93 (#pos = 277)
Average Role [scenario_1] AP = 56.61
---------------------------------------------
---------Reporting Role AP (%)------------------
           hold-obj: AP = 62.05 (#pos = 3608)
          sit-instr: AP = 49.87 (#pos = 1916)
         ride-instr: AP = 77.62 (#pos = 556)
           look-obj: AP = 58.54 (#pos = 3347)
          hit-instr: AP = 84.71 (#pos = 349)
            hit-obj: AP = 58.85 (#pos = 349)
            eat-obj: AP = 63.13 (#pos = 521)
          eat-instr: AP = 20.03 (#pos = 521)
         jump-instr: AP = 56.14 (#pos = 635)
          lay-instr: AP = 43.42 (#pos = 387)
talk_on_phone-instr: AP = 58.26 (#pos = 285)
          carry-obj: AP = 51.38 (#pos = 472)
          throw-obj: AP = 56.82 (#pos = 244)
          catch-obj: AP = 58.84 (#pos = 246)
          cut-instr: AP = 61.83 (#pos = 269)
            cut-obj: AP = 61.98 (#pos = 269)
work_on_computer-instr: AP = 79.45 (#pos = 410)
          ski-instr: AP = 64.57 (#pos = 424)
         surf-instr: AP = 87.56 (#pos = 486)
   skateboard-instr: AP = 95.10 (#pos = 417)
        drink-instr: AP = 64.64 (#pos = 82)
           kick-obj: AP = 85.19 (#pos = 180)
        point-instr: AP = 0.00 (#pos = 31)
           read-obj: AP = 61.39 (#pos = 111)
    snowboard-instr: AP = 86.21 (#pos = 277)
Average Role [scenario_2] AP = 61.90
--------------------------------------------
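As a quick arithmetic check on the two tables above, the reported averages are plain means over all 25 rows, including point-instr at AP = 0.00. Averaging over the other 24 rows gives roughly 59.0 and 64.5. This is only an observation on the printed numbers; whether it accounts for the gap to the paper is not confirmed here.

```python
# Per-class role APs copied from the two tables above, in row order.
scenario_1 = [
    50.27, 38.27, 75.36, 48.04, 80.62, 51.65, 52.23, 9.95, 55.56, 42.89,
    53.28, 49.17, 55.33, 53.29, 55.77, 55.48, 76.40, 55.55, 85.28, 92.40,
    64.33, 77.54, 0.00, 51.61, 84.93,
]
scenario_2 = [
    62.05, 49.87, 77.62, 58.54, 84.71, 58.85, 63.13, 20.03, 56.14, 43.42,
    58.26, 51.38, 56.82, 58.84, 61.83, 61.98, 79.45, 64.57, 87.56, 95.10,
    64.64, 85.19, 0.00, 61.39, 86.21,
]
POINT_INDEX = 22  # zero-based row of point-instr in both tables


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def mean_without(values, index):
    """Arithmetic mean after dropping the entry at index."""
    kept = values[:index] + values[index + 1:]
    return mean(kept)


for name, aps in (("scenario_1", scenario_1), ("scenario_2", scenario_2)):
    print(f"{name}: all rows = {mean(aps):.1f}, "
          f"without point-instr = {mean_without(aps, POINT_INDEX):.1f}")
```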

fredzzhang commented 2 years ago

Hi @leijue222,

I'm glad you solved the problem. Regarding the performance disparity, refer to step 4 of the same thread.

Fred.

leijue222 commented 2 years ago

I got it. Thank you~

fredzzhang / upt

error when test vcoco #59