Hi, thanks for your great work! I have two questions about the loss function in your code. First, the original paper says the authors used a logistic regression loss, but your code seems to compute only a positive/negative pair loss between the sentence and the image. Second, I wonder which task your code focuses on. The original paper focuses on phrase grounding, but your code doesn't seem to handle the phrases in the caption and instead treats the caption as a whole. Could you explain this a little?
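To make the first question concrete, here is a minimal sketch of what I understand by a logistic regression loss over image-text pairs. It is only the generic form, not necessarily the paper's exact formulation and not taken from this repo: the function name `logistic_loss`, the `scores` array of pair similarities, and the `labels` array with +1 for matching and -1 for non-matching pairs are all my own assumptions.

```python
import numpy as np

def logistic_loss(scores, labels):
    """Mean logistic loss log(1 + exp(-y * s)) over pairs.

    scores: similarity score s for each image-text pair (hypothetical input).
    labels: +1 for a matching pair, -1 for a non-matching pair (assumed convention).
    """
    scores = np.asarray(scores, dtype=np.float64)
    labels = np.asarray(labels, dtype=np.float64)
    # logaddexp(0, x) evaluates log(1 + exp(x)) without overflowing for large x.
    return np.mean(np.logaddexp(0.0, -labels * scores))

# Two matching pairs and two non-matching pairs with made-up scores.
print(logistic_loss([2.0, 0.5, -1.0, 0.3], [1, 1, -1, -1]))
```

If the loss in the code is meant to be equivalent to something like this, a pointer to where that happens would be very helpful.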
Hi. Thank you for the excellent work. Could I ask a question about your loss function? The code runs without errors, but the loss value is always NaN. Is there something wrong?