IDSIA / brainstorm

Fast, flexible and fun neural networks.

Retaining context across forward passes #57

Open flukeskywalker opened 9 years ago

flukeskywalker commented 9 years ago

We should have a `context_reset_rate` parameter (subject to renaming) in the trainer, which is set by the train function. Using this, the context should be reset (cleared) if `current_update_nr % context_reset_rate == 0`, and retained otherwise.
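A minimal sketch of what that could look like inside the trainer's update loop. All names here (`clear_context`, `stepper.run`) are placeholders for illustration, not existing brainstorm API:

```python
class Trainer(object):
    """Hypothetical trainer fragment; only the context-reset logic is shown."""

    def __init__(self, stepper, context_reset_rate=1):
        self.stepper = stepper
        self.context_reset_rate = context_reset_rate  # set by train()
        self.current_update_nr = 0

    def _update(self, net, data):
        # Clear the stored context every context_reset_rate updates;
        # otherwise it carries over from the previous forward pass.
        if self.current_update_nr % self.context_reset_rate == 0:
            net.clear_context()  # placeholder method name
        self.stepper.run(net, data)
        self.current_update_nr += 1
```

With `context_reset_rate=1` this degenerates to the current behaviour of discarding the context on every update.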

flukeskywalker commented 9 years ago

This information also needs to be provided to anything else that calls forward passes on the network, including the evaluate tool and the hooks that use it.

Qwlouse commented 9 years ago

What would be the goal of this? It seems like a hack that only works in certain cases. I think if we tackle the issue of retaining context, we should do it properly, such that you can specify exactly when it should be reset.

Additionally, the trainer never actually calls the forward pass of the network. That is done by the steppers and the hooks. So the trainer would only distribute the information.

Maybe it would be better to make the network responsible. We could have a special input like the mask (say `reset = ('B', 1)`), and the network would only reset the context if that input contains at least one 1 in the current minibatch.
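A rough sketch of that check, assuming a per-minibatch `reset` input and the same placeholder `clear_context` method as above (neither exists in brainstorm as-is):

```python
import numpy as np

def handle_reset_input(net, reset):
    # reset has shape ('B', 1): one binary flag per batch entry.
    # Clear the stored context if any sequence in the current
    # minibatch asks for a reset.
    if np.any(reset == 1):
        net.clear_context()  # placeholder method name
```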

The issue gets more complicated if we allow steppers that call the forward pass multiple times, and it obviously doesn't play well with shuffling.

flukeskywalker commented 9 years ago

This is a pretty basic requirement for language modeling (or modeling of any kind of data), so we need this feature ASAP.

Having a special input for such a mundane case is a bit annoying (but I wouldn't rule it out).

It's true that the trainer (or evaluate()) would simply pass around this information. I thought of storing this in the network, but having the network essentially count how many forward passes have been done on it also seems kinda hacky.

Qwlouse commented 9 years ago

How about putting the network in a special keep-context mode, and then having a hook call clear-context on it when needed? That would work for training, but not so well for evaluation (which possibly happens inside another hook). Hmm, now that I think about it: maybe not. So back to putting it alongside the data...

IMHO the default behaviour should remain to always discard context though.

flukeskywalker commented 9 years ago

This is why I thought that, from the user's perspective, it makes sense to give this info to the train and evaluate functions (actually, to the hooks that call evaluate). Internally it is just passed along, but it seemed clearer, with little chance of confusion and no extra data required.
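From the user's side it could look something like this (hypothetical signatures, just to illustrate where the parameter would live):

```python
# Reset the context every 100 updates during training; the trainer
# just forwards context_reset_rate to whoever runs the forward passes.
trainer.train(net, train_getter, context_reset_rate=100)

# The same information is handed to evaluate() by the hooks that call it.
results = evaluate(net, valid_getter, context_reset_rate=100)
```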

I agree about the default behavior.