The bias nodes exist only in the pre-output layers, yet the weight connected to the hidden layer's bias neuron is called bias_o. This confused me a lot, so I find it easier to read bias_o as bias_h and bias_h as bias_i. Read that way, the code that updates the biases is easier to follow: the gradient computed at the output layer is applied to the bias in the previous layer, which is exactly what the name backpropagation suggests.
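To make that mapping concrete, here is a rough NumPy sketch of a one-hidden-layer network (my own illustration, not the library's actual code; the variable names and shapes are assumptions). It shows the output-layer gradient updating bias_o, the bias that sits in the layer before the output, and the hidden-layer gradient updating bias_h.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical one-hidden-layer network: bias_h is added before the hidden
# activation, bias_o before the output activation.
rng = np.random.default_rng(0)
n_in, n_hidden, n_out = 2, 3, 1
lr = 0.1

w_ih = rng.standard_normal((n_hidden, n_in))   # input  -> hidden weights
w_ho = rng.standard_normal((n_out, n_hidden))  # hidden -> output weights
bias_h = np.zeros((n_hidden, 1))               # bias feeding the hidden layer
bias_o = np.zeros((n_out, 1))                  # bias feeding the output layer

x = np.array([[0.5], [0.2]])
target = np.array([[1.0]])

# Feedforward
hidden = sigmoid(w_ih @ x + bias_h)
output = sigmoid(w_ho @ hidden + bias_o)

# Backpropagation: the gradient computed at the output layer updates bias_o,
# i.e. the bias node that lives in the layer *before* the output.
output_grad = (target - output) * output * (1 - output)
bias_o += lr * output_grad
w_ho += lr * output_grad @ hidden.T

# The gradient propagated back to the hidden layer updates bias_h,
# the bias node sitting one layer earlier still.
hidden_grad = (w_ho.T @ output_grad) * hidden * (1 - hidden)
bias_h += lr * hidden_grad
w_ih += lr * hidden_grad @ x.T
```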