cazala / synaptic

architecture-free neural network library for node.js and the browser
http://caza.la/synaptic

Normalization 101 #177

Closed snowfrogdev closed 7 years ago

snowfrogdev commented 7 years ago

Really liked the Normalization 101 article in the Wiki. The normalization described in the article is a feature-scaling kind of normalization where all values are brought into the range [0,1]. But what about standardization (standard-score normalization)? Can we use that instead with Synaptic?

I'm planning to train a neural network to predict a house's value given inputs like: # of bedrooms, # of bathrooms, square footage of living space, square footage of the land plot, year built, and a bunch of other values. I want to feed the network data for real estate transactions from the past 10 years (2006-2016), and the year of the transaction will be one of the inputs. But what happens if I normalize the years into the range [0,1], finish training, and then start actually using the network and feeding it values outside the upper bound? For instance, let's say it is now 2018 and I want to know how much a certain house is worth; 2018 would normalize to 1.2. Would that be a problem? Wouldn't it be better to standardize, rather than normalize, a value like this?
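For concreteness, here is a small sketch of the two approaches I'm asking about (plain JavaScript for illustration, not synaptic-specific): min-max scaling maps 2018 to 1.2 when the training range is 2006-2016, while a z-score stays unbounded but centered on the training data.

```js
// Illustration only: min-max scaling vs. standardization (z-score)
// for the "year of transaction" feature, trained on 2006-2016.
const years = [2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016];

const min = Math.min(...years);
const max = Math.max(...years);
const mean = years.reduce((a, b) => a + b, 0) / years.length;
const std = Math.sqrt(
  years.reduce((a, y) => a + (y - mean) ** 2, 0) / years.length
);

const minMax = y => (y - min) / (max - min); // feature scaling into [0, 1]
const zScore = y => (y - mean) / std;        // standardization

console.log(minMax(2016)); // 1
console.log(minMax(2018)); // 1.2  <- outside the [0, 1] range the net was trained on
console.log(zScore(2018)); // ~2.21 standard deviations above the training mean
```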

ghost commented 7 years ago

You can prepare your data, for example by converting changes into percentages instead of actual changed values, but in the end everything needs to be in the 0..1 range. When normalizing, define a minimum and maximum value that leave room for future inputs, so you don't need to retrain your network if you have a lot of data. I also recommend avoiding both end values and normalizing within 0.1 and 0.9, which can result in better predictions.
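A minimal sketch of what I mean, assuming you pick the padded bounds by hand (the specific numbers here are just an example):

```js
// Sketch: min-max scaling into [0.1, 0.9], with hand-picked padded bounds
// so future inputs (e.g. years after 2016) still land inside the range
// without retraining. The bounds are assumptions; pick your own.
const LOW = 0.1;
const HIGH = 0.9;

function scale(value, min, max) {
  return LOW + (HIGH - LOW) * (value - min) / (max - min);
}

// pad the year range a few years past the data you actually have
console.log(scale(2006, 2000, 2030)); // 0.26
console.log(scale(2016, 2000, 2030)); // ~0.53
console.log(scale(2018, 2000, 2030)); // 0.58  <- still well inside [0.1, 0.9]
```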

cazala commented 7 years ago

You can feed larger values, even negative ones, the network will squash them in the end using their activation function, it's just that it works better when all these values are normalized to values that makes sense to the type of activation function you are using. Look at this function for instance:

[image: logistic sigmoid activation function]

Any value higher than 4 is pretty much always 1. If your inputs are numbers like 2016, each of them is multiplied by its corresponding weight (usually a value between -1 and 1) and then added up into the neuron's state. If that state ends up being in the order of the hundreds, or thousands, then once you pass it through that squashing function the activation will pretty much always be 1; the network won't produce a very different value whether the input was 2006 or 2016, so it probably won't do a good job at predictions. You need to play a little bit with your values and find the best fit, taking into account your activation function(s). But most of the time the best results come from values normalized to [0, 1], [-1, 1], [-2, 2]... hope that helps.
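To put rough numbers on that, here is a sketch assuming a plain logistic sigmoid and a made-up weight (not synaptic internals):

```js
// Sketch: why raw years saturate a sigmoid while normalized inputs don't.
// The weight value is made up for illustration.
const sigmoid = x => 1 / (1 + Math.exp(-x));
const weight = 0.5;

// Raw years: the weighted state is in the hundreds, so the activation saturates
// and 2006 vs 2016 are indistinguishable after squashing.
console.log(sigmoid(2006 * weight)); // 1
console.log(sigmoid(2016 * weight)); // 1

// Normalized to [0, 1]: the activations actually differ.
console.log(sigmoid(0.0 * weight)); // 0.5
console.log(sigmoid(1.0 * weight)); // ~0.62
```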