eriklindernoren / ML-From-Scratch

Machine Learning From Scratch. Bare bones NumPy implementations of machine learning models and algorithms with a focus on accessibility. Aims to cover everything from linear regression to deep learning.
MIT License

Linear Regression Regularization bias #105

Open kotlyar-shapirov opened 1 year ago

kotlyar-shapirov commented 1 year ago

Minor suggestion: using all the weights (including the bias) in regularization can end up constraining the bias itself when the training data is not normalized, e.g.:

  class l1_regularization():
      """ Regularization for Lasso Regression """    
      def __call__(self, w):
          return self.alpha * np.linalg.norm(w) # this will constrain the bias too
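To see why this hurts, consider a hypothetical unnormalized target such as y = x + 100, where the optimal intercept is large; the penalty is then dominated by the bias term:

    import numpy as np

    # Hypothetical example: y = x + 100 needs w = [100, 1] (bias first).
    # Penalizing w[0] pushes the intercept toward 0 and shifts every prediction.
    w = np.array([100.0, 1.0])
    alpha = 0.1
    print(alpha * np.linalg.norm(w))      # ~10.0 -- dominated by the bias
    print(alpha * np.linalg.norm(w[1:]))  #  0.1  -- penalty from the slope only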

It's extremely easy to fix, since the bias is added as the zeroth column of the data:

    def fit(self, X, y):
        # Insert constant ones for bias weights
        X = np.insert(X, 0, 1, axis=1)           
        self.training_errors = []
        self.initialize_weights(n_features=X.shape[1])
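For reference, a quick check with toy values confirming that the inserted column of ones lands at index 0, where it pairs with the bias weight w[0]:

    import numpy as np

    X = np.array([[2.0, 3.0],
                  [4.0, 5.0]])
    # Column 0 is all ones, so X.dot(w) uses w[0] as the bias
    print(np.insert(X, 0, 1, axis=1))
    # [[1. 2. 3.]
    #  [1. 4. 5.]]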

The new regularization should exclude the zeroth weight from the norms (and it's less than a one-line fix :)

  class l1_regularization():
      """ Regularization for Lasso Regression """
      def __init__(self, alpha):
          self.alpha = alpha

      def __call__(self, w):
          return self.alpha * np.linalg.norm(w[1:]) # exclude the bias w[0] here

      def grad(self, w):
          return self.alpha * np.sign(w[1:]) # and here

The same fix applies to the l2 and l1_l2 regularization classes (sketch below).
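For instance, a minimal sketch of the change for the l2 class, assuming its penalty matches the repo's quadratic form (0.5 * alpha * w.T.dot(w)); the gradient is zero-padded so its shape matches w:

    class l2_regularization():
        """ Regularization for Ridge Regression, excluding the bias w[0] """
        def __init__(self, alpha):
            self.alpha = alpha

        def __call__(self, w):
            # quadratic penalty over the non-bias weights only
            return self.alpha * 0.5 * w[1:].T.dot(w[1:])

        def grad(self, w):
            grad = np.zeros_like(w)
            grad[1:] = self.alpha * w[1:]  # zero gradient for the bias
            return grad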

adityach007 commented 10 months ago

Your observation about the impact of including the bias in L1 regularization for Lasso regression is correct. The bias term (the zeroth weight) should be excluded from the penalty so the model remains free to fit the data's offset; otherwise unnormalized data with a large intercept gets penalized for no good reason.

Your suggested modification, excluding the bias term from the norm calculation, is a sound approach. The version below additionally passes ord=1 so the penalty is a true L1 norm (np.linalg.norm defaults to the L2 norm for vectors) and zero-pads the gradient so its shape matches w:

    class L1Regularization():
        """ L1 regularization that excludes the bias weight w[0] """
        def __init__(self, alpha):
            self.alpha = alpha

        def __call__(self, w):
            # true L1 norm (ord=1) over the non-bias weights
            return self.alpha * np.linalg.norm(w[1:], ord=1)

        def grad(self, w):
            # zero-padded so the gradient has the same shape as w
            grad = np.zeros_like(w)
            grad[1:] = self.alpha * np.sign(w[1:])
            return grad
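A quick check of the behavior, with values chosen purely for illustration:

    import numpy as np

    w = np.array([5.0, 0.3, -0.2])   # [bias, w1, w2]
    reg = L1Regularization(alpha=0.01)

    print(reg(w))       # 0.01 * (0.3 + 0.2) = 0.005 -- bias not penalized
    print(reg.grad(w))  # [ 0.    0.01 -0.01] -- zero gradient for the bias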