truetone2022 closed this issue 4 years ago
Hello, thanks for this excellent catch. For the V-COCO dataset, the first object is always [0,0,0,0], which represents the no-object case; V-COCO requires a missing object to be reported as [0,0,0,0]. There is no point in including a no-object node in the graph structure, so the first entry was sliced off. The HICO-DET dataset has no such restriction, so no slicing is needed there. I overlooked this part of the code while cleaning up the repo, and the leftover slice forces the network to ignore the first object in the graph structure. I have pushed a quick fix for the issue and will clean it up properly when I get the chance.
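For readers following along, the dataset-dependent slicing described above can be sketched roughly as follows. This is only an illustration: the function name, the dataset strings, and the list-based features are assumptions made for the example and are not taken from the VSGNet repository.

```python
# Illustrative sketch only: names below are hypothetical, not from the VSGNet code.
NO_OBJECT_BOX = [0, 0, 0, 0]


def graph_object_nodes(object_boxes, object_feats, dataset):
    """Return the (boxes, feats) that should become object nodes in the graph."""
    if dataset == "vcoco":
        # V-COCO: index 0 is the [0,0,0,0] no-object placeholder, so skip it.
        if not object_boxes or list(object_boxes[0]) != NO_OBJECT_BOX:
            raise ValueError("expected the no-object placeholder at index 0")
        return object_boxes[1:], object_feats[1:]
    if dataset == "hicodet":
        # HICO-DET: there is no placeholder, so every object is a real node.
        return object_boxes, object_feats
    raise ValueError(f"unknown dataset: {dataset}")


# V-COCO style input: placeholder first, then two real objects.
vcoco_boxes = [[0, 0, 0, 0], [10, 20, 50, 60], [30, 40, 80, 90]]
vcoco_feats = ["f_none", "f_cup", "f_chair"]
vcoco_nodes = graph_object_nodes(vcoco_boxes, vcoco_feats, "vcoco")

# HICO-DET style input: real objects only. Slicing here would drop "f_cup",
# which is the kind of bug discussed in this thread.
hico_boxes = [[10, 20, 50, 60], [30, 40, 80, 90]]
hico_feats = ["f_cup", "f_chair"]
hico_nodes = graph_object_nodes(hico_boxes, hico_feats, "hicodet")
```

In both cases the graph ends up with the same two real objects; only the V-COCO input needs its first entry removed.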
Just to add: intuitively, if we retrain the network with this fix, we should get slightly better results than those reported in the paper. I just ran inference with the model reported in the paper and got a similar result:
Full: 19.79, Rare: 16.19, Non-Rare: 20.87
I will try to report the new results once I retrain the whole network with the bug fixed.
Thanks for your very helpful reply! After reading your great work, I have a few questions about VSGNet:
Thanks for your interest in our work,
1) We tried adding pose estimation as a 3rd channel in our spatial map, but the small improvement in the results did not justify the large overhead. I think there are other works that use pose in a different way. I personally have some reservations about adding more computationally expensive backends: most existing methods already run object detectors offline, and adding pose makes the pipeline more complicated. But yes, pose information should help on the HOI task.
2) I am not familiar with TreeLSTM but Gated GCN can certainly help.
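To make point 1) more concrete, here is a rough sketch of what a spatial map with a third pose channel could look like. The thread does not say how the pose was actually encoded, so everything here is an assumption: the function name, the 8x8 grid size, the convention that boxes and keypoints are already given in grid-cell coordinates, and the choice to mark each keypoint as a single cell.

```python
# Hypothetical sketch: one possible encoding, not the authors' implementation.
def spatial_map_with_pose(human_box, object_box, keypoints, size=8):
    """Build a 3-channel binary map: human box, object box, pose keypoints.

    Boxes are (x1, y1, x2, y2) and keypoints are (x, y), all assumed to be
    integer grid-cell coordinates. Each channel is a size x size list of 0/1.
    """

    def blank():
        return [[0] * size for _ in range(size)]

    def fill_box(grid, box):
        x1, y1, x2, y2 = box
        # Clamp to the grid so boxes touching the border stay valid.
        for y in range(max(0, y1), min(size, y2 + 1)):
            for x in range(max(0, x1), min(size, x2 + 1)):
                grid[y][x] = 1
        return grid

    human = fill_box(blank(), human_box)
    obj = fill_box(blank(), object_box)

    pose = blank()
    for x, y in keypoints:
        # Keypoints that fall outside the grid are ignored.
        if 0 <= x < size and 0 <= y < size:
            pose[y][x] = 1

    return [human, obj, pose]


example_map = spatial_map_with_pose(
    human_box=(1, 1, 3, 4),
    object_box=(4, 2, 6, 5),
    keypoints=[(2, 1), (2, 3), (9, 9)],  # the last keypoint is out of bounds
)
```

The extra channel itself is cheap to build; the overhead mentioned above comes from running a separate pose estimator to obtain the keypoints in the first place.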
@ASMIftekhar Hello, I recently tried running some experiments with your project. Which model did you use to extract the pose information you mentioned? Please help, thank you.
Alpha Pose: https://github.com/MVIG-SJTU/AlphaPose
The op in the script_hico/model.py, for batch_num,l in enumerate(pairs_info):
Slicing