Hi,
Why did you use
"img_emb_inst = img_emb[2]
cap_emb_inst = cap_emb[2]"
at lines 96-97 of evaluation.py?
In my opinion, the instance-level features should be img_emb[0] and cap_emb[0],
because in Model.py the code is "emb_v = torch.stack((instance_emb_v, consensus_emb_v, fused_emb_v), dim=0)".
This confused me a lot. I am looking forward to your reply.
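For reference, here is a minimal sketch (with hypothetical batch and embedding sizes) of what that torch.stack call implies for the indexing: along dim=0, index 0 holds the instance-level embeddings, index 1 the consensus embeddings, and index 2 the fused embeddings.

```python
import torch

batch, dim = 4, 8  # hypothetical sizes, for illustration only
instance_emb_v = torch.randn(batch, dim)
consensus_emb_v = torch.randn(batch, dim)
fused_emb_v = torch.randn(batch, dim)

# Same stacking as in Model.py: the result has shape (3, batch, dim).
emb_v = torch.stack((instance_emb_v, consensus_emb_v, fused_emb_v), dim=0)

assert torch.equal(emb_v[0], instance_emb_v)   # index 0 -> instance-level
assert torch.equal(emb_v[1], consensus_emb_v)  # index 1 -> consensus
assert torch.equal(emb_v[2], fused_emb_v)      # index 2 -> fused
```

So indexing with [2] in evaluation.py selects the fused embeddings rather than the instance-level ones.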
Hi, this was set empirically: in my subsequent experiments I found that replacing the instance-level features with the fused features brought a slight performance improvement, so evaluation.py indexes the stacked embeddings with [2] instead of [0].
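If you want to compare the two variants yourself, a hypothetical refactor of the selection step (not the repository's actual code) could make the choice explicit; select_embeddings and FEATURE_INDEX below are names introduced here only for illustration.

```python
# Map feature-level names to their index in the stacked embeddings
# (3, n_samples, dim), following the torch.stack order in Model.py.
FEATURE_INDEX = {"instance": 0, "consensus": 1, "fused": 2}

def select_embeddings(img_emb, cap_emb, level="fused"):
    """Pick one feature level from the stacked image/caption embeddings."""
    idx = FEATURE_INDEX[level]
    return img_emb[idx], cap_emb[idx]

# level="fused" reproduces the current img_emb[2] / cap_emb[2] behaviour,
# while level="instance" gives the img_emb[0] / cap_emb[0] variant asked
# about above, e.g.:
# img_emb_inst, cap_emb_inst = select_embeddings(img_emb, cap_emb, "fused")
```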