SydCaption / SAAT

MIT License

Seq and pos are not consistent? #31

Closed RyanLiut closed 3 years ago

RyanLiut commented 3 years ago

Hi,

I found that seq and pos (the GT caption and its corresponding GT pos) are not aligned one-to-one within each batch. In the dataloader, the number of captions and the number of pos entries for the same video even differ. Shouldn't the two counts be equal, with each caption matched to exactly one pos entry?

Thank you!

SydCaption commented 3 years ago

There are multiple GT captions for each video, and any of them can be used for training, so I don't think a video has to be limited to exactly one caption. But the alignment problem you point out does exist.
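
For readers hitting the same problem, here is a minimal sketch of one way to detect the mismatch and keep each sampled caption paired with its own pos entry. It is not code from this repository: the dictionary layout, the function names `check_alignment` and `sample_pair`, and the toy data are all hypothetical, and it assumes each pos entry holds one tag per caption token.

```python
import random

# Hypothetical toy data: video id -> list of tokenized GT captions,
# and video id -> list of pos sequences (assumed one tag per token).
captions = {
    "video0": [["a", "man", "runs"], ["a", "dog", "barks", "loudly"]],
    "video1": [["a", "cat", "sleeps"]],
}
pos_tags = {
    "video0": [["DET", "NOUN", "VERB"], ["DET", "NOUN", "VERB", "ADV"]],
    "video1": [],  # deliberately missing, to show what the check reports
}


def check_alignment(captions, pos_tags):
    """Return the ids of videos whose captions and pos entries do not line up."""
    mismatched = []
    for vid, caps in captions.items():
        tags = pos_tags.get(vid, [])
        same_count = len(caps) == len(tags)
        same_lengths = all(len(c) == len(t) for c, t in zip(caps, tags))
        if not (same_count and same_lengths):
            mismatched.append(vid)
    return mismatched


def sample_pair(captions, pos_tags, vid, rng=random):
    """Pick one caption and its pos entry using a single shared index."""
    caps, tags = captions[vid], pos_tags[vid]
    if len(caps) != len(tags):
        raise ValueError(f"{vid}: {len(caps)} captions but {len(tags)} pos entries")
    i = rng.randrange(len(caps))  # one index for both lists keeps the pair aligned
    return caps[i], tags[i]


print(check_alignment(captions, pos_tags))
print(sample_pair(captions, pos_tags, "video0", random.Random(0)))
```

Drawing a single index and using it for both lists still lets any of a video's captions be chosen for training, while guaranteeing that the pos entry in the batch belongs to that caption.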