bhpfelix / Compositional-Attention-Networks-for-Machine-Reasoning-PyTorch

A PyTorch implementation of Compositional Attention Networks
MIT License

A mistake in train_questions.hdf5 #3

Open junliangliu opened 6 years ago

junliangliu commented 6 years ago

In "collate_fn" function of dataloader.py, the input args: data.question's shape shoud be (S, 300), but when i load train_questions.hdf5 the shape is (699989,). data.question in train_questions.hdf5 is a int list, not a 2-dim tensor.

Is it a bug?

bhpfelix commented 6 years ago

Hi, sorry for the confusion. I haven't had a chance to maintain the project lately, and some of the documentation may be outdated. 699989 is the number of training questions (the size of the training set). After loading the question file with f = h5py.File('train_questions.hdf5', 'r') and questions = f['questions'], questions[str(idx)][:] gives the question at index idx. Each question is represented by a 1-dim int array, with each word represented by its index in tokens.json. This 1-dim int array is later used by the InputUnit to look up the word embedding as follows: embeds = self.char_embeds(questions).
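Putting the above together, a minimal sketch of loading one question (assuming the HDF5 layout described here, i.e. one int array per question keyed by its string index; the variable names are illustrative):

```python
import h5py
import torch

# Open the question file and fetch the question at a given index.
with h5py.File('train_questions.hdf5', 'r') as f:
    questions = f['questions']
    idx = 0
    question = questions[str(idx)][:]  # 1-dim int array of token indices into tokens.json

# Convert to a LongTensor so it can index an embedding table.
question_tensor = torch.from_numpy(question).long()
# Inside the InputUnit, each token index is then mapped to its word embedding,
# e.g. embeds = self.char_embeds(question_tensor)
```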

junliangliu commented 6 years ago

Hi, I fully understand what the code does now, and I enjoy the work! :) I want to plot the attention at every reasoning step as shown in the paper. Can you give me some advice?

[image: attention visualization figure from the paper] If it is convenient, could I get your source code (for plotting the image and question attention)? I am a newbie at attention visualization. :) Great thanks!
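For reference, a minimal, hypothetical sketch of how per-step question attention could be drawn with matplotlib. The attn array (shape: num_steps x seq_len) and words list are assumptions; extracting the attention weights from the model itself is not shown here:

```python
import matplotlib.pyplot as plt
import numpy as np

def plot_question_attention(attn, words):
    """Draw a heatmap of attention weights: one row per reasoning step,
    one column per question word. `attn` is a (num_steps, seq_len) array."""
    fig, ax = plt.subplots(figsize=(len(words), attn.shape[0]))
    ax.imshow(attn, cmap='Blues', aspect='auto')
    ax.set_xticks(np.arange(len(words)))
    ax.set_xticklabels(words, rotation=45, ha='right')
    ax.set_yticks(np.arange(attn.shape[0]))
    ax.set_yticklabels([f'step {i + 1}' for i in range(attn.shape[0])])
    ax.set_xlabel('question words')
    fig.tight_layout()
    plt.show()
```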