YunseokJANG / tgif-qa

Repository for our CVPR 2017 and IJCV: TGIF-QA
https://arxiv.org/abs/1704.04497

Mismatched Q&A pairs and GIF datasets #19

Closed: Fly2flies closed this issue 3 years ago.

Fly2flies commented 3 years ago

Hello,

Thank you for your excellent work!

I downloaded the TGIF-QA dataset, which includes approximately 124 GB of GIF files (9 zip splits) and several CSV files of question-answer pairs. Some gif_name values in the CSV files cannot be found in the GIF dataset, for example tumblr_nk172bbdPI1u1lr18o1_250 in Test_action_question.csv.

Conversely, some GIF files do not appear in any CSV file, for example tumblr_l5zke1pg6r1qzzqaxo1_500.gif.

Have you encountered this problem? Did I download the wrong dataset?
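One way to check both kinds of mismatch is to compare the gif_name column of a question CSV against the file listing of tgif_full.zip. The sketch below makes several assumptions. It assumes the CSV is tab-separated with a header row that contains gif_name; check the actual files before relying on this. It assumes each row stores the name without the .gif extension, as in the tumblr_nk172bbdPI1u1lr18o1_250 example. It also assumes the GIFs sit under a gifs/ folder inside the archive. The helper names zip_gif_names, csv_gif_names, and find_mismatches are hypothetical.

```python
import csv
import zipfile


def zip_gif_names(zip_path, folder="gifs/"):
    """Return GIF base names (folder prefix and .gif suffix removed) stored under `folder`."""
    names = set()
    with zipfile.ZipFile(zip_path) as zf:
        for entry in zf.namelist():
            if entry.startswith(folder) and entry.lower().endswith(".gif"):
                names.add(entry[len(folder):-len(".gif")])
    return names


def csv_gif_names(csv_path, delimiter="\t", column="gif_name"):
    """Return the set of values in the `column` column of a question CSV.

    Assumption: tab-separated file with a header row naming the column.
    """
    with open(csv_path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f, delimiter=delimiter)
        return {row[column].strip() for row in reader}


def find_mismatches(csv_names, zip_names):
    """Return (names the CSV references but the zip lacks, names in the zip the CSV never uses)."""
    missing_from_zip = csv_names - zip_names
    unused_in_csv = zip_names - csv_names
    return missing_from_zip, unused_in_csv
```

For example, `find_mismatches(csv_gif_names("Test_action_question.csv"), zip_gif_names("tgif_full.zip"))` returns both sets. Expect a non-empty second set, because a single CSV covers only one task and split. Only a non-empty first set suggests a download or extraction problem.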

YunseokJANG commented 3 years ago

Hi @Fly2flies, thanks for your interest in our project, and sorry for the delay. To give a clearer answer, I just redid the whole process, starting from downloading each file from the Box link.

First, fortunately or unfortunately, I can clearly see "gifs/tumblr_l5zke1pg6r1qzzqaxo1_500.gif" under the "gifs" folder in the final tgif_full.zip file:

$ unzip -l tgif_full.zip | grep tumblr_l5zke1pg6r1qzzqaxo1_500
> 511488  2010-07-22 21:06   gifs/tumblr_l5zke1pg6r1qzzqaxo1_500.gif

If you could not find this file, I can imagine three possible scenarios:

(a) You mistakenly unzipped only the original 'tgif.zip' file. If so, please check the readme.txt file for how to handle it properly.
(b) The MD5 checksum of one of your downloaded chunks (md5sum on Linux, md5 on macOS) does not match the one listed in the readme.txt at that link. If so, please re-download the mismatched chunk.
(c) Your GIF reader failed to read or process the file correctly and raised an exception. This happens especially with GIFs that have unexpected headers. In this case, you can either modify the GIF reader or handle the GIF in a different format (e.g., as a set of frames).
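Scenarios (b) and (c) can also be checked with a script. The sketch below uses Python's hashlib to compute each chunk's MD5 and compares it to expected values. You would copy those values by hand from the readme.txt; the `{path: md5_hex}` dictionary is a hypothetical format, not the readme's layout. The sketch also flags files whose first six bytes are not one of the two standard GIF signatures, GIF87a or GIF89a. This is one simple way to spot unusual headers before a reader raises an exception. It is an assumption about what "unexpected headers" means here, not the authors' procedure, and all helper names are hypothetical.

```python
import hashlib
from pathlib import Path

# The two signatures defined by the GIF specification (first 6 bytes of a file).
GIF_SIGNATURES = (b"GIF87a", b"GIF89a")


def md5_of(path, block_size=1 << 20):
    """Stream the file in 1 MiB blocks so multi-GB zip chunks need not fit in memory."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(block_size), b""):
            digest.update(block)
    return digest.hexdigest()


def bad_chunks(expected):
    """Return chunk paths that are missing or whose MD5 differs from `expected`.

    `expected` is a hypothetical {path: md5_hex} dict typed in from readme.txt.
    """
    return [
        path
        for path, want in expected.items()
        if not Path(path).is_file() or md5_of(path) != want.strip().lower()
    ]


def has_gif_signature(path):
    """True if the file starts with a standard GIF header (GIF87a or GIF89a)."""
    with open(path, "rb") as f:
        return f.read(6) in GIF_SIGNATURES


def suspicious_gifs(folder):
    """List .gif files under `folder` (recursively) whose header is not a standard signature."""
    return sorted(str(p) for p in Path(folder).rglob("*.gif") if not has_gif_signature(p))
```

Re-download any chunk that bad_chunks reports before extracting. A file that suspicious_gifs lists may still be readable by a more tolerant decoder, or it can be converted to a set of frames as suggested in (c).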

As for your second point, you are correct. We did not use ALL the GIFs (as noted, we obtained the GIF files from the TGIF paper), and some GIFs are used in more than one question.
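To see how often each GIF is reused, you can count its rows across the question CSVs. Like the other sketches, this assumes tab-separated files with a gif_name header column. The helpers questions_per_gif and reused_gifs are hypothetical names, not part of the repository.

```python
import csv
from collections import Counter


def questions_per_gif(csv_paths, delimiter="\t", column="gif_name"):
    """Count how many question rows reference each GIF across the given CSV files."""
    counts = Counter()
    for path in csv_paths:
        with open(path, newline="", encoding="utf-8") as f:
            for row in csv.DictReader(f, delimiter=delimiter):
                counts[row[column].strip()] += 1
    return counts


def reused_gifs(counts, min_questions=2):
    """Return GIF names that appear in at least `min_questions` question rows."""
    return {name for name, n in counts.items() if n >= min_questions}
```

Passing every question CSV in the download, rather than a single file, shows reuse across the whole dataset.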

I hope this answer addresses your concerns. Again, thank you for your interest in our work.

Regards,