buxiangzhiren / DDCap

MIT License

On length prediction accuracy results in your paper #31

Closed (xipq closed this issue 1 year ago)

xipq commented 1 year ago

Thank you for your great work.

I have a question regarding the length prediction. In your paper, you report the length prediction accuracy of your model. However, each image in COCO has five candidate captions, which mostly differ in length. How did you evaluate the length prediction accuracy? (For example, by selecting only one target candidate, by treating the average length as the target length, or in some other way?)

Thanks for your reply.

buxiangzhiren commented 1 year ago

We first compute the differences between our predicted length and the lengths of the five candidate captions. Then we choose the minimum length difference as the accuracy.
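For readers who want to reproduce this evaluation, here is a minimal sketch of the minimum-difference idea described above. It is not taken from the DDCap code. The function names `min_length_gap` and `length_accuracy` are hypothetical, and the `tolerance` parameter (a prediction counts as correct when its smallest gap is within the tolerance) is an assumption, since the reply does not say how the difference is turned into an accuracy figure.

```python
from typing import Sequence


def min_length_gap(predicted_len: int, reference_lens: Sequence[int]) -> int:
    """Smallest absolute gap between the predicted length and any reference length."""
    return min(abs(predicted_len - ref) for ref in reference_lens)


def length_accuracy(
    predicted_lens: Sequence[int],
    reference_lens_per_image: Sequence[Sequence[int]],
    tolerance: int = 0,  # assumption: 0 means the prediction must match one reference exactly
) -> float:
    """Fraction of images whose predicted length is within `tolerance` of the closest reference."""
    gaps = [
        min_length_gap(pred, refs)
        for pred, refs in zip(predicted_lens, reference_lens_per_image)
    ]
    hits = sum(gap <= tolerance for gap in gaps)
    return hits / len(gaps)


if __name__ == "__main__":
    # Two images, five reference caption lengths each (made-up numbers).
    preds = [10, 8]
    refs = [[9, 11, 12, 10, 14], [10, 11, 12, 13, 14]]
    print(length_accuracy(preds, refs))
```

With `tolerance=0` only the first image counts as a hit, because the second prediction is two tokens away from its closest reference.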

xipq commented 1 year ago

Thanks, now I'm clear on that.

I would like to further confirm that during training, the CE loss is computed with the length of the currently selected sequence (1 out of 5), as in loss_len = nnf.cross_entropy(len_out, mask.sum(dim=-1).to(torch.long) - 1) in train.py. Is that correct?

buxiangzhiren commented 1 year ago

Yes, in the process of training, the ground truth is the length of the current caption and the loss is CE loss.
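To make the training target concrete, the following is a small pure-Python sketch of what the quoted loss line computes for a batch. It is an illustration, not the repository's code: the helper names are hypothetical, and it assumes `mask` holds 1 for real caption tokens and 0 for padding, so that `sum(mask) - 1` is the length class index. The real `train.py` uses `nnf.cross_entropy` on tensors, which averages over the batch by default.

```python
import math
from typing import Sequence


def length_target(mask: Sequence[int]) -> int:
    """Class index for one caption: number of unmasked tokens minus one."""
    return sum(mask) - 1


def cross_entropy(logits: Sequence[float], target: int) -> float:
    """Negative log-softmax probability of the target class for one example."""
    peak = max(logits)  # subtract the max for numerical stability
    log_norm = peak + math.log(sum(math.exp(x - peak) for x in logits))
    return log_norm - logits[target]


def length_loss(
    batch_logits: Sequence[Sequence[float]],
    batch_masks: Sequence[Sequence[int]],
) -> float:
    """Mean cross-entropy between length logits and the current caption's length."""
    losses = [
        cross_entropy(logits, length_target(mask))
        for logits, mask in zip(batch_logits, batch_masks)
    ]
    return sum(losses) / len(losses)


if __name__ == "__main__":
    # One caption with 3 real tokens and 2 padding slots, uniform logits over 4 length classes.
    print(length_loss([[0.0, 0.0, 0.0, 0.0]], [[1, 1, 1, 0, 0]]))
```

Because the target comes from the mask of whichever caption is sampled for that step, the same image can contribute up to five different length targets over the course of training.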

buxiangzhiren commented 1 year ago

If the length variance of your dataset is very large, you can try to predict the mean and variance of your dataset. Since the mean length of COCO is 11, we predict the length directly.

xipq commented 1 year ago

Thanks for your prompt reply! This is exactly my concern, since my dataset has a tricky length distribution. I will try this approach on my dataset.