aimagelab / show-control-and-tell

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019
https://arxiv.org/abs/1811.10652
BSD 3-Clause "New" or "Revised" License
282 stars 61 forks

Controllability through a set of detections #25

Open atg93 opened 4 years ago

atg93 commented 4 years ago

Hi, could you please give more information about Figure 4 in the paper? As I understand it, for controllability through a sequence of detections, you choose the regions based on the ground-truth captions in the dataset. In the experiment of Figure 4, how do you choose the set of regions for an image?

amirunxiayang commented 3 years ago

I have the same question.