By propagating values from the first layer (the input layer) through all the mathematical functions represented by each node, the network outputs a value. This process is called a forward pass.
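The forward pass described above can be sketched as follows. This is an illustrative example, not MiniFlow's actual API: a tiny fully connected network where each layer applies a weighted sum followed by a sigmoid, with arbitrary placeholder weights.

```python
import numpy as np

def forward_pass(x, weights, biases):
    """Propagate an input through each layer: weighted sum, then sigmoid."""
    a = x
    for W, b in zip(weights, biases):
        z = a @ W + b             # linear combination at this layer
        a = 1 / (1 + np.exp(-z))  # sigmoid activation (squashes to (0, 1))
    return a

# Tiny 2-3-1 network with random placeholder weights (illustrative only)
rng = np.random.default_rng(0)
weights = [rng.normal(size=(2, 3)), rng.normal(size=(3, 1))]
biases = [np.zeros(3), np.zeros(1)]
output = forward_pass(np.array([0.5, -0.2]), weights, biases)
```

The final `output` is a single sigmoid value, so it always lies strictly between 0 and 1.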
Gradient Descent
gradient: the direction in which we should push the ball.
How much force should be applied to the push? This is known as the learning rate; it determines how fast the network learns.
Choosing a learning rate is a bit like a guessing game, but values between 0.1 and 0.0001 generally work well. (The range 0.001 to 0.0001 is popular, as 0.1 and 0.01 are sometimes too large.)
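To make the "push the ball" intuition concrete, here is a minimal gradient descent sketch (my own toy example, not from the course) minimizing f(x) = x², whose gradient is 2x. The ball rolls downhill toward the minimum at x = 0, and the learning rate scales each push:

```python
def gradient_descent(x, learning_rate=0.1, steps=100):
    """Repeatedly step opposite the gradient of f(x) = x**2."""
    for _ in range(steps):
        grad = 2 * x               # gradient tells us which way to push
        x -= learning_rate * grad  # learning rate sets the size of the push
    return x

x_final = gradient_descent(x=5.0, learning_rate=0.1)
```

With learning_rate=0.1, each step multiplies x by 0.8, so x converges to 0. A rate that is too large (try 1.0 here) makes the ball overshoot and bounce back and forth instead of settling.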
MiniFlow
Graphs
Forward Propagation
Gradient Descent