We first compute the differences between our predicted length and the lengths of the five candidate captions, then take the minimum of those differences when computing the accuracy.
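In code, a minimal sketch of this evaluation might look like the following; the function and variable names are illustrative, not taken from the repo, and the final aggregation into an accuracy (counting exact matches) is an assumption:

```python
# Sketch of the length-accuracy evaluation described above (illustrative names).
# For each image, compare the predicted length against the lengths of the
# five reference captions and keep the smallest absolute difference.

def min_length_difference(pred_len, ref_captions):
    """Minimum |predicted length - reference length| over the reference captions."""
    ref_lens = [len(c.split()) for c in ref_captions]  # token counts of the references
    return min(abs(pred_len - l) for l in ref_lens)

# Example aggregation (assumption): accuracy = fraction of images whose
# closest reference length matches the prediction exactly.
# diffs = [min_length_difference(p, refs) for p, refs in zip(pred_lens, all_refs)]
# accuracy = sum(d == 0 for d in diffs) / len(diffs)
```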
Thanks, now I'm clear on that.
I would like to further confirm: during training, is the CE loss computed with the length of the currently selected sequence (1 out of 5), as in
loss_len = nnf.cross_entropy(len_out, mask.sum(dim=-1).to(torch.long) - 1)
in train.py?
Yes, during training the ground truth is the length of the current caption, and the loss is the CE loss.
If the length variance of your dataset is very large, you can try to predict the mean and variance of your dataset instead. Since the mean length of COCO captions is 11, we predict the length directly.
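For completeness, here is a self-contained sketch of the length loss discussed above, built around the loss_len line quoted from train.py. The tensor shapes, the synthetic mask, and the mean/variance normalization noted in the comments are illustrative assumptions, not the repository's exact code:

```python
import torch
import torch.nn.functional as nnf

batch, max_len, num_len_classes = 4, 20, 20

# len_out: per-image logits over possible caption lengths 1 .. max_len (assumed shape).
len_out = torch.randn(batch, num_len_classes)

# mask: 1 for real tokens, 0 for padding; its row sum is the caption length.
# Here we fabricate captions of lengths 11, 7, 15, 9 for illustration.
lengths = torch.tensor([[11], [7], [15], [9]])
mask = (torch.arange(max_len).unsqueeze(0) < lengths).float()

# Ground truth is the length of the *current* caption (1 of the 5 references),
# shifted by -1 so lengths 1..max_len map to class indices 0..max_len-1.
target_len = mask.sum(dim=-1).to(torch.long) - 1
loss_len = nnf.cross_entropy(len_out, target_len)

# If your dataset's length distribution is very spread out, one option
# (per the suggestion above) is to predict a normalized length instead,
# e.g. regress (length - mean) / std with an MSE loss rather than
# classifying raw lengths.
```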
Thanks for your prompt reply! This is exactly my concern, since my dataset has a rather tricky length distribution. I will try this approach on my datasets.
Thank you for your great work.
I have a question regarding the length prediction. In your paper, you report the length prediction accuracy of your model. However, there are five candidate captions per image in COCO, and they mostly differ in length. How did you evaluate the length prediction accuracy (e.g., selecting only one target candidate, treating the average length as the target length, or something else)?
Thanks for your reply.