Dataset download issue - GitHub Issues

bhattg commented 1 year ago

Hi,

I am trying to reproduce the 1k and 4k numbers for the ImageReward function accuracy, as mentioned in the paper. To do so, I downloaded the data and modified it slightly so that it could be loaded using the script make_dataset.py. However, some file IDs in the training set have null images, that is, files with a size of 0K.

The following is the list of the affected IDs.

005050-0024
005389-0008
005795-0038
006272-0041
006756-0071
005165-0028
005332-0172
005356-0019
006011-0030
006167-0087
006758-0099
005179-0097
005444-0063
005434-0068
005459-0003
005344-0055
006174-0048
006190-0114
006214-0021
006787-0015
006857-0073
006830-0003
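
For anyone who wants to regenerate a list like this locally, here is a minimal sketch. It assumes the images were downloaded into a single directory tree and that each ID is the file name without its extension; the function name `find_empty_files` and that layout are hypothetical, not something taken from the repository.

```python
from pathlib import Path


def find_empty_files(image_dir, pattern="*"):
    """Return the sorted stems of all zero-byte files under image_dir.

    Assumption: an image ID such as 005050-0024 corresponds to the file
    name without its extension. Adjust if your local layout differs.
    """
    root = Path(image_dir)
    return sorted(
        p.stem
        for p in root.rglob(pattern)        # walk the whole directory tree
        if p.is_file() and p.stat().st_size == 0  # keep only empty regular files
    )
```

Calling `find_empty_files("path/to/images", "*.png")` would then restrict the scan to PNG files; the path and extension here are placeholders.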

xujz18 commented 1 year ago

Thanks for pointing this out. We do have a small number of image files in our dataset that don't exist, and we'll be fixing this in the next version. You can skip these invalid images for now.
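
One possible way to skip the invalid images before building the dataset is sketched below. It is not the repository's own code: the helper names, the flat directory layout, and the `.png` extension are all assumptions made for illustration.

```python
import os


def is_usable_image(path):
    """Return True if path is an existing, non-empty regular file."""
    return os.path.isfile(path) and os.path.getsize(path) > 0


def keep_valid_ids(image_ids, image_dir, ext=".png"):
    """Filter image IDs down to those whose file exists and is non-empty.

    Assumption: each image is stored as <image_dir>/<id><ext>; this naming
    scheme is hypothetical and may not match the released dataset.
    """
    return [
        image_id
        for image_id in image_ids
        if is_usable_image(os.path.join(image_dir, image_id + ext))
    ]
```

The check only covers missing and zero-byte files, which matches the problem reported here; it does not detect images that are non-empty but corrupted.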

muse1998 commented 11 months ago

Hello, I am also working on reproducing the training results, but the 'train.json' file on Hugging Face does not seem to be directly usable with make_dataset.py. Could you share the processed train.json file? Many thanks!

THUDM / ImageReward

Dataset download issue #53