cazala / synaptic

architecture-free neural network library for node.js and the browser
http://caza.la/synaptic

Min mean square error back propagation for output neurons #33

Open Sleepwalking opened 9 years ago

Sleepwalking commented 9 years ago

At https://github.com/cazala/synaptic/blob/master/src/neuron.js#L119, the error responsibility for an output-layer neuron is computed as the cross-entropy derivative, which may not be optimal for regression tasks, where the squared error is what should be minimized.

// output neurons get their error from the environment
if (isOutput)
  this.error.responsibility = this.error.projected = target - this.activation; // Eq. 10

Under the MMSE (minimum mean square error) criterion, the code should be changed to:

// output neurons get their error from the environment
if (isOutput)
  this.error.responsibility = this.error.projected = (target - this.activation) * this.derivative; // Eq. 10

However, I ran some tests and found that the difference between MMSE and MCE (minimum cross-entropy) training is barely observable. I am opening this issue to see whether anyone finds MMSE better than MCE, or vice versa.
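To make the two rules concrete, here is a minimal sketch that checks both closed-form error responsibilities against a finite-difference gradient of the corresponding cost. It assumes a logistic squashing function on the output neuron and a squared error carrying a factor of 1/2; the function names (`numericGradient`, `halfSquaredError`, and so on) are illustrative only and are not part of synaptic.

```javascript
// Logistic squashing function (assumed here for the output neuron).
function sigmoid(x) {
  return 1 / (1 + Math.exp(-x));
}

// Costs written as functions of the neuron's pre-activation state x.
function crossEntropy(x, target) {
  const y = sigmoid(x);
  return -(target * Math.log(y) + (1 - target) * Math.log(1 - y));
}

function halfSquaredError(x, target) {
  const y = sigmoid(x);
  return 0.5 * (target - y) * (target - y);
}

// Central finite difference of cost(x, target) with respect to x.
function numericGradient(cost, x, target, h = 1e-6) {
  return (cost(x + h, target) - cost(x - h, target)) / (2 * h);
}

// Closed-form error responsibilities (negative gradients of the costs).
function deltaCrossEntropy(x, target) {
  return target - sigmoid(x); // no derivative factor
}

function deltaSquaredError(x, target) {
  const y = sigmoid(x);
  return (target - y) * y * (1 - y); // y * (1 - y) is the sigmoid derivative
}

// Each line prints the closed form next to its numerical estimate.
const state = 0.3;
const desired = 1;
console.log(deltaCrossEntropy(state, desired), -numericGradient(crossEntropy, state, desired));
console.log(deltaSquaredError(state, desired), -numericGradient(halfSquaredError, state, desired));
```

On each printed line the two values should agree to several decimal places: `target - activation` is the negative gradient of cross-entropy, and the extra derivative factor is what squared error adds.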

cazala commented 9 years ago

At some point we changed the way we compute the error responsibility for the output layer only (link), and that dramatically improved performance on the DSR task: before the change it took over 100k iterations to solve it, and afterwards we cut that down to 15k. The XOR task (which uses MSE, not cross-entropy, as its cost function) also improved, from 3k-5k iterations to 100-200. Maybe we should run some tests to determine whether NOT multiplying by the derivative is better only when using cross-entropy (Eq. 10 from the paper uses CE and states that the error responsibility for the output layer is computed simply as target - activation), but as far as I remember it performed better in every scenario.
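One possible reading of these numbers, offered as an assumption rather than something established in this thread: when a logistic output neuron saturates on the wrong side of its target, its derivative is close to zero, so multiplying by the derivative shrinks the error signal exactly when the neuron is most wrong. The sketch below only illustrates that effect for a single neuron; the names are hypothetical and it does not reproduce the DSR or XOR experiments.

```javascript
// Logistic squashing function (assumed for the output neuron).
function logistic(x) {
  return 1 / (1 + Math.exp(-x));
}

// Error responsibility without the derivative factor (current synaptic rule).
function responsibilityPlain(state, target) {
  return target - logistic(state);
}

// Error responsibility with the derivative factor (proposed MMSE rule).
function responsibilityWithDerivative(state, target) {
  const activation = logistic(state);
  const derivative = activation * (1 - activation);
  return (target - activation) * derivative;
}

// A saturated, badly wrong neuron: activation near 0 while the target is 1.
const saturatedState = -6;
const wantedOutput = 1;
console.log("plain:          ", responsibilityPlain(saturatedState, wantedOutput));
console.log("with derivative:", responsibilityWithDerivative(saturatedState, wantedOutput));
```

With these inputs the plain rule yields a value close to 1, while the derivative-scaled rule yields a value several hundred times smaller, so weight updates driven by it would be correspondingly smaller for the same learning rate.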

Sleepwalking commented 9 years ago

Interesting. I can replicate your observation that MSE-based DSR takes many more iterations than CE-based DSR. On the timing task, however, MSE and CE perform similarly. Let's leave this issue open until I have optimized synaptic further, so that we can test this on some real-world tasks.