[ ] The recommended preprocessing is to center the data to have a mean of zero and normalize its scale to [-1, 1] along each feature.
[ ] Initialize the weights by drawing them from a Gaussian distribution with standard deviation sqrt(2/n), where n is the number of inputs to the neuron. E.g. in numpy: w = np.random.randn(n) * np.sqrt(2.0/n).
[ ] Use L2 regularization and dropout (the inverted version)
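A minimal NumPy sketch of the three items above, under stated assumptions rather than as a reference implementation. The function names (preprocess, he_init, inverted_dropout, l2_penalty), the fixed seed, the keep probability p = 0.5, and the regularization strength lam = 1e-3 are all illustrative choices that do not come from the notes.

```python
import numpy as np

rng = np.random.default_rng(0)  # fixed seed, chosen only for reproducibility


def preprocess(X):
    """Zero-center each feature (column), then scale it into [-1, 1]."""
    X = X - X.mean(axis=0)
    max_abs = np.abs(X).max(axis=0)
    max_abs[max_abs == 0] = 1.0  # constant features stay at zero
    return X / max_abs


def he_init(n_in, n_out):
    """Gaussian weights with standard deviation sqrt(2 / n_in)."""
    return rng.standard_normal((n_in, n_out)) * np.sqrt(2.0 / n_in)


def inverted_dropout(h, p=0.5, train=True):
    """Keep each unit with probability p and rescale by 1/p while training.

    Because the rescaling happens at training time, the test-time
    forward pass returns the activations unchanged.
    """
    if not train:
        return h
    mask = (rng.random(h.shape) < p) / p
    return h * mask


def l2_penalty(W, lam=1e-3):
    """L2 regularization term 0.5 * lam * sum(W^2), added to the data loss."""
    return 0.5 * lam * np.sum(W * W)
```

In practice the centering and scaling statistics would be computed on the training split only and then reused for the validation and test data.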
https://cs231n.github.io/neural-networks-2/
In summary: