Closed evanmiltenburg closed 8 years ago
Some other additions on my branch that I've found really useful are:
' '.join([x for x
in itertools.takewhile(
lambda n: n != "<E>",
complete_sentences[i])])
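For context, `itertools.takewhile` yields items only until its predicate first fails, so the snippet above drops the end-of-sentence marker and everything after it. A minimal self-contained sketch (the token list and the `"<E>"` marker here are illustrative):

```python
import itertools

# Tokens as a decoder might emit them: real words, then an
# end-of-sentence marker followed by padding.
tokens = ["a", "dog", "on", "grass", "<E>", "<E>", "<E>"]

# takewhile stops at the first "<E>", truncating the padding.
sentence = ' '.join(itertools.takewhile(lambda n: n != "<E>", tokens))
print(sentence)  # → a dog on grass
```

Since `str.join` accepts any iterable, the wrapping list comprehension in the original snippet is optional.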
yield {'text':arrays[0], 'img': arrays[1],
'output': targets, 'ident': ident}
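In context, that `yield` would sit inside a batch generator. A hedged sketch of the shape (the function name, the batch tuples, and the array contents are placeholders, not the repo's actual code):

```python
def data_generator(batches):
    """Yield batch dictionaries; 'ident' carries image IDs so
    generated sentences can be matched back to their images."""
    for arrays, targets, ident in batches:
        yield {'text': arrays[0], 'img': arrays[1],
               'output': targets, 'ident': ident}

# Illustrative batch: (arrays, targets, ident)
batch = (([["<S>", "a"]], [[0.1, 0.2]]), [["a", "dog"]], ["img_001"])
first = next(data_generator([batch]))
print(first['ident'])  # → ['img_001']
```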
The first problem has already been fixed in the experimental_datagen branch although perhaps I had not pushed the code to the repository. Sorry for that!
I like the idea of yielding extra key:value pairs in the data generator. But how does the model behave when fit_generator() and predict() receive an input that is never used in the model?
Not sure, and I don't have time at the moment to test this. I'd hope that it just ignores unused keys (that would be the best way to deal with cases like this anyway: just use the keys it's specified to use and don't make any additional assumptions). It works very well for generating descriptions, though! Using the IDs makes comparing the generated sentences easier and more reliable (it's much harder to mix up the images).
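Whether Keras actually ignores unused keys depends on the version, so it would be risky to treat that as guaranteed. The safe pattern described here, consuming only the keys the model declares and ignoring the rest, can be sketched independently of Keras (`select_inputs` and the batch contents are illustrative, not part of any API):

```python
def select_inputs(batch, input_names):
    """Keep only the keys the model declares as inputs;
    extra bookkeeping keys such as 'ident' are ignored."""
    return {name: batch[name] for name in input_names}

batch = {'text': [1, 2], 'img': [3, 4], 'output': [5], 'ident': 'img_042'}
model_inputs = select_inputs(batch, ['text', 'img'])
print(sorted(model_inputs))  # → ['img', 'text']
```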
@evanmiltenburg logically, it sounds like model.predict()
can safely ignore the {... , 'ident': ident}
pair in your dictionary because it doesn't seem to be complaining about it.
Perhaps we could look at including this type of additional information in a batch dictionary but mark it as experimental to signify that it might break at any time.
Fixed in 38be3ed05cd30497db5b53f20e38079f848ec3ef
This kept me up yesterday: I was getting 507 sentences for the test set, rather than 1000. Looking at
generate.py
I found the problem was this part: I fixed it by changing line 107 from
for data in generator:
to this: and lines 242-245 (the 'hacky way') to:
This solution should now work for any part of the dataset you want to test on :)
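The replacement code isn't quoted above, but the symptom (507 of 1000 sentences) is the classic one of a loop stopping when it trusts the generator rather than the split size. As a general, hypothetical illustration of that pattern, not the actual fix, one can cap iteration at the split size with `itertools.islice` (all names and numbers here are illustrative):

```python
import itertools

def batches():
    """Stand-in for the data generator: cycles endlessly over the split."""
    i = 0
    while True:
        yield {'ident': f'img_{i % 1000}'}
        i += 1

test_size = 1000
# Take exactly test_size batches from the (endless) generator,
# instead of stopping whenever the generator happens to pause.
seen = [b['ident'] for b in itertools.islice(batches(), test_size)]
print(len(seen))  # → 1000
```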