Closed fawazsammani closed 5 years ago
Hi @fawazsammani, thanks for your interest!
1 - Right, thanks a lot for spotting this! We will correct the pre-print asap.
2 - While $z_t^r$ is indeed a vector (attention scores for each region set), $z_t^c$ is a single scalar (compatibility score between the hidden state and the sentinel vector). So we simply add the scalar to the sum in Eq. 6.
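A minimal sketch of the normalization described in point 2, assuming the exponential is applied to both the region scores and the sentinel score. The function name `attention_with_sentinel` and the sample values are hypothetical and not taken from the paper or its code:

```python
import math

def attention_with_sentinel(z_r, z_c):
    """Jointly normalize region scores z_r (a list) and one sentinel score z_c (a scalar)."""
    # Subtract the maximum before exponentiating for numerical stability.
    m = max(max(z_r), z_c)
    exp_r = [math.exp(z - m) for z in z_r]
    exp_c = math.exp(z_c - m)
    # z_c is a scalar, so it contributes exactly one term to the denominator.
    denom = sum(exp_r) + exp_c
    region_weights = [e / denom for e in exp_r]
    sentinel_weight = exp_c / denom
    return region_weights, sentinel_weight

# Illustrative values only.
alpha, beta = attention_with_sentinel([0.5, 1.0, -0.3], 0.2)
```

Because the sentinel score enters the denominator once, the region weights and the sentinel weight together sum to 1.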
Thank you @baraldilorenzo! And good luck with your presentation.
Hi. I've read through your paper, and it's very interesting. Congrats on that amazing work! I have a few doubts and would appreciate your kind help. I haven't gone through the code yet, so if question 2 and 3 are related to the implementation aspect, please ignore them.

1- In equation 11 (objective), is there a typo in the chunk-level probability? According to my understanding, your switching gate is Boolean (0 or 1). This is equivalent to a binary cross-entropy loss, so I assume it should be $\log(1-p)$ rather than $1-\log(p)$?

2- For equation 6, you are taking the normalized exponential of $z_t$. You are dividing by the sum of the elements of $z_t^r$ plus the vector $z_t^c$. Shouldn't the sum of the elements of $z_t^c$ be added, rather than the vector $z_t^c$ itself?

Thank you, and I wish you all the best for your CVPR presentation!
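To illustrate question 1, here is a small sketch of the standard binary cross-entropy for a 0/1 gate. It is not the paper's Eq. 11, whose exact form is not reproduced in this thread; the names `gate_nll`, `p`, and `g` are hypothetical:

```python
import math

def gate_nll(p, g):
    """Negative log-likelihood of a binary gate g in {0, 1}, given predicted probability p."""
    # When g == 1 only log(p) contributes; when g == 0 only log(1 - p) contributes.
    return -(g * math.log(p) + (1 - g) * math.log(1 - p))

p = 0.8  # illustrative value only
loss_gate_on = gate_nll(p, 1)
loss_gate_off = gate_nll(p, 0)

# The two expressions compared in the question are different quantities.
log_of_complement = math.log(1 - p)
one_minus_log = 1 - math.log(p)
```

For any `p` strictly between 0 and 1, `math.log(1 - p)` is negative while `1 - math.log(p)` is greater than 1, so the two cannot be used interchangeably.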