I could not immediately find the idea behind the Gini impurity index on the internet. The following derivation helped me understand the intuition a little bit better:
The idea is that this captures "how often a randomly selected element is labeled incorrectly if the label is chosen randomly according to the actual distribution (in a leaf)".
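Spelled out as a sketch, assuming p_k denotes the fraction of elements in the leaf that belong to class k, with K classes in total (the notation is my own): an element of class k is drawn with probability p_k and is then given a different label with probability 1 - p_k. Summing over the classes gives

```latex
G \;=\; \sum_{k=1}^{K} p_k \,(1 - p_k)
  \;=\; \sum_{k=1}^{K} p_k \;-\; \sum_{k=1}^{K} p_k^2
  \;=\; 1 - \sum_{k=1}^{K} p_k^2
```

For example, a leaf split evenly between two classes gives G = 1 - (0.25 + 0.25) = 0.5, while a pure leaf gives G = 0.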
In the definition of information gain, it is unclear to me what X_i is exactly. I would have expected Gain(X, i) on the left-hand side and |X| in the denominator of the fraction. Would that make sense? Furthermore, am I correct that the sum from l=1 to L runs over what some call the levels of this feature?
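To make the question concrete, here is a sketch of the form I would have expected. The symbols H (the impurity or entropy measure) and X_l (the subset of X in which feature i takes its l-th level) are my own assumptions and are not taken from the definition in question:

```latex
\mathrm{Gain}(X, i) \;=\; H(X) \;-\; \sum_{l=1}^{L} \frac{|X_l|}{|X|}\, H(X_l)
```

Under this reading, each term weights the impurity of a subset by the share of elements that fall into it, so the weights sum to 1 across the L levels.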