Closed arushk1 closed 8 years ago
So, the qa dict is not able to find a key that should have been present.
Try this script from the same folder, and let me know what output you get:
(Note that I have not tested it, so correct any minor error that you might encounter)
import operator
import argparse
import sys
import os
import progressbar
from spacy.en import English

if os.path.isdir('../3rdParty/VQA/PythonHelperTools/vqaTools/'):
    sys.path.insert(0, '../3rdParty/VQA/PythonHelperTools/vqaTools/')
else:
    print 'Please download the VQA tools and put them in the 3rdParty folder'
    sys.exit(1)

from vqa import VQA

def getModalAnswer(answers):
    # Count how often each answer string occurs among the 10 annotations
    candidates = {}
    for i in xrange(10):
        candidates[answers[i]['answer']] = candidates.get(answers[i]['answer'], 0) + 1
    # Return the most frequent answer
    return max(candidates.iteritems(), key=operator.itemgetter(1))[0]

def getAllAnswer(answers):
    answer_list = []
    for i in xrange(10):
        answer_list.append(answers[i]['answer'])
    return ';'.join(answer_list)
def main():
    parser = argparse.ArgumentParser()
    parser.add_argument('-isTrain', type=int, default=1)
    args = parser.parse_args()
    nlp = English()  # used for counting number of tokens

    if args.isTrain == 1:
        annFile = '../data/Annotations_Train_mscoco/mscoco_train2014_annotations.json'
        quesFile = '../data/Questions_Train_mscoco/OpenEnded_mscoco_train2014_questions.json'
        questions_file = open('../data/preprocessed/questions_train2014.txt', 'w')
        questions_lengths_file = open('../data/preprocessed/questions_lengths_train2014.txt', 'w')
        answers_file = open('../data/preprocessed/answers_train2014.txt', 'w')
        coco_image_id = open('../data/preprocessed/images_train2014.txt', 'w')
        trainval = 'training data'
    else:
        annFile = '../data/Annotations_Val_mscoco/mscoco_val2014_annotations.json'
        quesFile = '../data/Questions_Val_mscoco/OpenEnded_mscoco_val2014_questions.json'
        questions_file = open('../data/preprocessed/questions_val2014.txt', 'w')
        questions_lengths_file = open('../data/preprocessed/questions_lengths_val2014.txt', 'w')
        answers_file = open('../data/preprocessed/answers_val2014.txt', 'w')
        coco_image_id = open('../data/preprocessed/images_val2014.txt', 'w')
        trainval = 'validation data'

    # initialize VQA api for QA annotations
    vqa = VQA(annFile, quesFile)
    questions = vqa.questions
    ques = questions['questions']
    qa = vqa.qa

    try:
        print qa[1]['answers']
    except KeyError:
        print qa[1 + 248349]['answers']

if __name__ == "__main__":
    main()
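As a side note, the modal-answer logic in the script above can be written more compactly. This is a Python 3 sketch using collections.Counter; the toy answers list below is invented purely for illustration and is shaped like a VQA 'answers' entry:

```python
from collections import Counter

def get_modal_answer(answers):
    """Return the most frequent answer string among the annotations."""
    counts = Counter(a['answer'] for a in answers)
    return counts.most_common(1)[0][0]

# Toy annotation list shaped like a VQA 'answers' entry:
answers = [{'answer': 'yes'}] * 6 + [{'answer': 'no'}] * 4
print(get_modal_answer(answers))  # -> yes
```

Counter handles ties and counting in one pass, which avoids the manual dict bookkeeping in the original helper.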
Traceback (most recent call last):
File "dumpText.py", line 68, in
How many answers are there in the dataset?
Ten answers for every question, and about 240K questions (in the train set).
Try printing the key values for the qa dict:

for key, value in qa.iteritems():
    print key, value
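If printing all ~248K entries is too noisy, a small sketch like this shows just the first few keys and whether a given key exists before indexing (Python 3; the qa dict here is a made-up stand-in for vqa.qa):

```python
from itertools import islice

# Toy stand-in for vqa.qa -- the real keys are question IDs
qa = {1: 'yes', 2: 'no', 248350: '2'}

# Print only the first few (sorted) keys instead of the whole dict
for key in islice(sorted(qa), 5):
    print(key, qa[key])

# A membership test before indexing avoids the KeyError entirely
print(1 in qa, 999 in qa)  # -> True False
```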
Also, have you downloaded the json files manually from the VQA website (or have you used my script)? Try downloading those files manually and putting them into your data folder, and see what you get. There might be an issue with my dataset download script, since I have not tested it on my system yet.
P.S. I re-cloned the repo to my system, and had no issues with the dumpText.py script.
There was a problem with the downloaded files not being in their respective folders, so I just put them in the folders. They are the same as those from the website, so I don't think that should be a problem.
Also, there are 248,349 questions according to the website, so it should work with that index. Is there a reason you've added 1 + 248,349?
Checked the length: qa is 248,349 questions long. Changed the index, and it still gives me the same error.
For the train set, the questions go from 1 to 248349. For the val set, they go from 248349+1 to something. I'm downloading the dataset and the VQAtools code once again. There's a chance that they might have changed something in their VQAtools code, or the json files since I last worked with it.
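A quick way to verify that ID range against whatever file is actually loaded (Python 3 sketch; the qa dict below is a toy stand-in for the real vqa.qa):

```python
# Toy stand-in for vqa.qa; swap in the real dict to check your file
qa = {1: 'a', 2: 'b', 248349: 'c'}

keys = sorted(qa)
print('min key:', keys[0])
print('max key:', keys[-1])
# True only if the IDs form an unbroken keys[0]..keys[-1] range
print('contiguous:', keys == list(range(keys[0], keys[-1] + 1)))
```

If the minimum key is not 1 for the train set (or not 248350 for the val set), the json files on disk are not the ones the indexing assumes.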
Val set is a different json file, isn't it?
Yes, it's a different file, but I just wanted to make sure that they hadn't been mixed up (due to accidental renaming).
BTW, I have reproduced the error on my system (it's caused by the new json files on the VQA website being somewhat different). I will post a fix soon.
Also, once I train the model, where will the file be located? And how can I get results from my own images using that?
Let me know what your experience is after the patch. I've changed the directory names such that they work with the download script.
P.S. I would request you to ask other questions in a separate issue.
loading VQA annotations and questions into memory... 0:00:16.625574
creating index... index created!
Dumping questions, answers, imageIDs, and question lengths to text files...
Traceback (most recent call last):
  File "dumpText.py", line 82, in <module>
    main()
  File "dumpText.py", line 74, in main
    answers_file.write(getModalAnswer(qa[i]['answers']).encode('utf8'))
KeyError: 1
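The robust fix for a KeyError like this is to iterate over the question IDs the loaded dict actually contains, rather than assuming they run from 1 upward. A minimal Python 3 sketch (the qa dict and the IDs below are invented for illustration):

```python
# Invented stand-in for vqa.qa; note the IDs do not start at 1
qa = {501: {'answers': [{'answer': 'yes'}]},
      907: {'answers': [{'answer': 'no'}]}}

dumped = []
for ques_id in sorted(qa):   # iterate only the keys that actually exist
    dumped.append(qa[ques_id]['answers'][0]['answer'])

print(dumped)  # -> ['yes', 'no']
```

In the real script, dumped would be replaced by writes to answers_file, keyed by ques_id instead of a hard-coded 1..N loop index.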