Why normalize features?

PolarisRisingWar commented 3 years ago

I found the function normalize_attributes() in preprocessing.py, and I think it is the function that normalizes the features.
I have read about why the adjacency matrix is normalized, but I haven't come across anything about normalizing features. So why normalize the features? Is there a theoretical basis or explanation for this operation?

gasteigerjo commented 3 years ago

Normalizing the attributes can empirically improve performance. Note that there are two ways of normalizing: per node and per attribute. Either one (or no normalization at all) can work best, depending mainly on the type of underlying data.

The reason for normalizing per attribute is the same as anywhere else in machine learning: it is often easier to train a model on standardized data. Otherwise there might be scale issues (one attribute might be orders of magnitude larger than another). Normalizing per node evens out the influence any one node has on the prediction. Otherwise a node with very high attribute values might overwhelm the attributes of its neighboring nodes.
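To make the two options concrete, here is a minimal sketch in plain Python. It is only an illustration and not necessarily what normalize_attributes() in preprocessing.py does: the function names `normalize_per_attribute` and `normalize_per_node`, the choice of standardization for columns and the L1 norm for rows, and the toy matrix are all assumptions made for this example.

```python
import math


def normalize_per_attribute(X):
    """Standardize each column (attribute) to zero mean and unit variance."""
    n, d = len(X), len(X[0])
    out = [[0.0] * d for _ in range(n)]
    for j in range(d):
        col = [row[j] for row in X]
        mean = sum(col) / n
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / n)
        scale = std if std > 0 else 1.0  # constant columns become all zeros
        for i in range(n):
            out[i][j] = (X[i][j] - mean) / scale
    return out


def normalize_per_node(X):
    """Rescale each row (node) so that its absolute values sum to 1 (L1 norm)."""
    out = []
    for row in X:
        norm = sum(abs(v) for v in row)
        scale = norm if norm > 0 else 1.0  # leave all-zero rows unchanged
        out.append([v / scale for v in row])
    return out


# Toy attribute matrix: 3 nodes (rows), 2 attributes (columns)
# whose scales differ by three orders of magnitude.
X = [[1.0, 1000.0],
     [2.0, 3000.0],
     [3.0, 2000.0]]

X_attr = normalize_per_attribute(X)  # both columns now on a comparable scale
X_node = normalize_per_node(X)       # every node's attributes sum to 1
```

Per-attribute normalization removes the scale gap between the two columns, while per-node normalization gives every node the same total attribute mass. For real graph data one would typically do the same with NumPy or SciPy sparse matrices rather than nested lists.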

PolarisRisingWar commented 3 years ago

Thank you very much for your reply! I've learned a lot!
