In Section 3.2, the cost between image regions and words is defined as the alignment scores S = WR⊤. The bipartite matching problem can then be efficiently solved by the off-the-shelf Hungarian algorithm.
The input to the Hungarian algorithm is a cost matrix, which it minimizes. Does that mean the alignment score matrix S is used directly as the cost matrix? If so, why is WR⊤ called an alignment score, given that a higher score would normally indicate a better match rather than a higher cost?
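
To make the question concrete, here is a minimal sketch of why the distinction matters. It is not the paper's implementation: the 3x3 matrix `S` is made-up data, `best_matching` is a hypothetical helper, and an exhaustive search over permutations stands in for the Hungarian algorithm (it finds the same optimum on tiny inputs, just less efficiently).

```python
from itertools import permutations


def best_matching(matrix, maximize):
    """Brute-force one-to-one assignment of rows (words) to columns (regions).

    Returns a tuple `cols` where row i is matched to column cols[i].
    Stand-in for the Hungarian algorithm; only practical for tiny matrices.
    """
    n_rows, n_cols = len(matrix), len(matrix[0])
    pick = max if maximize else min
    return pick(
        permutations(range(n_cols), n_rows),
        key=lambda cols: sum(matrix[i][j] for i, j in enumerate(cols)),
    )


# Hypothetical alignment scores S = W R^T (assumed: higher = better aligned).
S = [
    [0.9, 0.1, 0.2],
    [0.2, 0.8, 0.3],
    [0.1, 0.4, 0.7],
]

# Treating S as a score to maximize pairs each word with its best region.
as_score = best_matching(S, maximize=True)

# Treating S directly as a cost to minimize picks the worst-aligned pairs.
as_cost = best_matching(S, maximize=False)

# Minimizing the negated scores recovers the maximizing assignment.
neg_cost = best_matching([[-s for s in row] for row in S], maximize=False)

print(as_score, as_cost, neg_cost)
```

If S really is a similarity, I would expect the cost passed to the solver to be -S (or the solver to be run in maximization mode, which some off-the-shelf implementations such as SciPy's `linear_sum_assignment` support through a `maximize` flag). Could you confirm which of these is done?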