emanjavacas opened this issue 7 years ago
Yes, I think we can conclude that better memory is the single greatest challenge left if we ever want to reach novel-length generation. The simpler suggestions are nice because they give authors a more direct sense that the model is learning from them, but they are also somewhat hacky and sidestep the real challenge.
Giphart has raised a number of points during the process that we should eventually take into account. One of the open questions is how to make the text-generation system aware of longer and longer contexts. I think a good place to start is memory networks:
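At their core these models all do the same thing: attend over a set of stored memory slots and read back a weighted sum. A minimal sketch of one read "hop" in plain numpy (all names and shapes are mine, not taken from any particular paper):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def memory_read(query, memory_keys, memory_values):
    """One memory-network 'hop': attend over stored slots and
    return a weighted sum of their values.

    query:         (d,)   current hidden state
    memory_keys:   (n, d) one key per stored context item
    memory_values: (n, d) the content to be retrieved
    """
    scores = memory_keys @ query      # inner-product addressing
    weights = softmax(scores)         # attention over slots
    return weights @ memory_values    # retrieved memory vector

# Toy usage: 5 memory slots of dimension 8
rng = np.random.default_rng(0)
q = rng.standard_normal(8)
keys = rng.standard_normal((5, 8))
vals = rng.standard_normal((5, 8))
print(memory_read(q, keys, vals).shape)  # (8,)
```

Full memory networks stack several such hops and learn how to write to the slots; the sketch only shows the read side.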
More recent, less ambitious, and easier to implement are variants of pointer networks, which would allow the model to pick up on recently introduced words (even if they are not in the vocabulary):
The last one in particular is remarkably effective for such a small addition.
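For concreteness: these pointer variants all boil down to mixing the usual vocabulary softmax with an attention distribution over the recent context, gated by a scalar, so a recently seen word can be copied even when the softmax alone could not produce it. A rough sketch of that mixture (the gating scheme and all names are my own simplification, not any one paper's exact formulation):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def pointer_mixture(hidden, context_hiddens, context_token_ids,
                    W_vocab, w_gate):
    """Blend the normal vocabulary softmax with a pointer (copy)
    distribution over recent context tokens.

    hidden:            (d,)   current decoder state
    context_hiddens:   (n, d) hidden states of the last n tokens
    context_token_ids: (n,)   their ids (assumed < V here; real copy
                              models extend the vocabulary per example)
    W_vocab:           (V, d) output projection
    w_gate:            (d,)   weights for the copy/generate gate
    """
    p_vocab = softmax(W_vocab @ hidden)          # generate from vocab
    attn = softmax(context_hiddens @ hidden)     # point into context
    g = 1.0 / (1.0 + np.exp(-w_gate @ hidden))   # sigmoid gate in (0, 1)

    # Scatter the copy probabilities back onto vocabulary ids;
    # the result still sums to 1 since (1 - g) + g = 1.
    p = (1.0 - g) * p_vocab
    np.add.at(p, np.asarray(context_token_ids), g * attn)
    return p
```

The papers differ mainly in how the gate and the pointer scores are parameterized; the mixture itself is the cheap part, which is why the addition is so small.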
Of course, all these models are formulated for word-level systems...
UPDATE (this one should work for both word- and char-level models):
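Whichever model this is, the reason a cache-style memory can be unit-agnostic is that it only ever touches hidden states and integer ids; it never needs to know whether those ids index words or characters. A continuous-cache-style sketch under that reading (class name and hyperparameter values are hypothetical):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

class UnitAgnosticCache:
    """Cache of (hidden state, next unit id) pairs; 'units' can be
    words or characters, since only vectors and ids are stored."""

    def __init__(self, capacity, theta=0.3, lam=0.1):
        self.capacity = capacity  # how far back the cache reaches
        self.theta = theta        # sharpness of cache matching
        self.lam = lam            # mixture weight for the cache
        self.keys, self.ids = [], []

    def update(self, hidden, next_id):
        """Remember which unit followed this hidden state."""
        self.keys.append(hidden)
        self.ids.append(next_id)
        if len(self.keys) > self.capacity:
            self.keys.pop(0)
            self.ids.pop(0)

    def mix(self, hidden, p_model):
        """Blend the model's distribution with cache probabilities."""
        if not self.keys:
            return p_model
        sims = softmax(self.theta * (np.stack(self.keys) @ hidden))
        p_cache = np.zeros_like(p_model)
        np.add.at(p_cache, np.asarray(self.ids), sims)
        return (1 - self.lam) * p_model + self.lam * p_cache
```

Since nothing here depends on the unit size, the same object can sit on top of either of our word- or char-level decoders unchanged.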