aimagelab / show-control-and-tell

Show, Control and Tell: A Framework for Generating Controllable and Grounded Captions. CVPR 2019
https://arxiv.org/abs/1811.10652
BSD 3-Clause "New" or "Revised" License

some questions regarding the paper #2

Closed fawazsammani closed 5 years ago

fawazsammani commented 5 years ago

Hi. I've read through your paper, and it's very interesting. Congrats on the amazing work! I have a few doubts and would appreciate your help. I haven't gone through the code yet, so if questions 2 and 3 are related to the implementation, please ignore them.

1. In equation 11 (the objective), is there a typo in the chunk-level probability term? As I understand it, your switching gate is Boolean (0 or 1), so this term is equivalent to a binary cross-entropy loss. I assume it should be log(1-p) rather than 1-log(p)?

2. In equation 6, you take the normalized exponential of z_t. The denominator is the sum over the elements of z_t^r, added to the vector z_t^c. Shouldn't it be added to the sum over the elements of z_t^c, rather than to the vector z_t^c itself?

Thank you, and I wish you all the best for your CVPR presentation!
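For reference, the binary cross-entropy form that question 1 refers to can be sketched as follows. This is a generic illustration, not the paper's Eq. 11 (which is not reproduced in this thread); the function name `gate_nll` and the example values are hypothetical.

```python
import math

def gate_nll(p, g):
    """Binary cross-entropy for a gate label g in {0, 1}.

    p is the predicted probability that the gate is 1.
    Hypothetical helper for illustration only.
    """
    return -(g * math.log(p) + (1 - g) * math.log(1 - p))

p = 0.9
loss_when_gate_is_1 = gate_nll(p, 1)  # equals -log(p)
loss_when_gate_is_0 = gate_nll(p, 0)  # equals -log(1 - p)

# The two expressions contrasted in the question are different quantities:
log_one_minus_p = math.log(1 - p)   # log-probability of the gate being 0
one_minus_log_p = 1 - math.log(p)   # not a log-probability
```

When the gate label is 0, the log-likelihood term is log(1-p), which is the form the question proposes.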

baraldilorenzo commented 5 years ago

Hi @fawazsammani, thanks for your interest!

1. Right, thanks a lot for spotting this! We will correct the pre-print asap.

2. While z_t^r is indeed a vector (attention scores for each region set), z_t^c is a single scalar (the compatibility score between the hidden state and the sentinel vector). So we simply add the scalar's term to the sum in Eq. 6.
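The normalization described in point 2 can be sketched in NumPy as below. This is a minimal sketch that assumes Eq. 6 is a softmax over the region scores together with the single sentinel score, with the exponential applied to both; the exact equation is not reproduced in this thread, and the function name `sentinel_softmax` and the example values are hypothetical.

```python
import numpy as np

def sentinel_softmax(z_r, z_c):
    """Softmax over region scores z_r (vector) plus one sentinel score z_c (scalar).

    Returns (region_weights, sentinel_weight). Assumed form, for illustration:
    the denominator is sum_i exp(z_r[i]) + exp(z_c).
    """
    z = np.append(np.asarray(z_r, dtype=float), float(z_c))
    z = z - z.max()              # subtract the max for numerical stability
    e = np.exp(z)
    w = e / e.sum()
    return w[:-1], w[-1]

z_r = np.array([1.0, 2.0, 3.0])  # one attention score per region set
z_c = 0.5                        # single scalar score for the sentinel
region_weights, sentinel_weight = sentinel_softmax(z_r, z_c)
```

Because z_c is a scalar, it contributes exactly one term to the denominator, so the region weights and the sentinel weight together sum to 1.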

fawazsammani commented 5 years ago

Thank you, @baraldilorenzo! And good luck with your presentation.