After much thought, I have concluded that the usual R^2 metric is not the correct metric for this problem. We are not performing a bona fide regression, so we are not interested in how much variance of y is left unexplained by a given fit (there is no y!). But we are still interested in how much our sequence of models is "improving" as it approaches saturation. For that, we don't need the perfectly saturated model - we just need the most saturated model we have access to, which is the fit at the last lambda on the path. Concretely, we want to output (assuming there exist X, y such that X^TX = A, X^Ty = r)
Delta(lmda) / Delta(lmda_last)
where Delta(lmda) := ||y||^2 - ||y - Xb(lmda)||^2. Note that Delta is computable using only A and r, since expanding the square gives Delta(lmda) = 2 r^T b(lmda) - b(lmda)^T A b(lmda). A nice property is that the above ratio monotonically increases as a function of lmda along the path lmda_1, ..., lmda_last.
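
For concreteness, here is a minimal NumPy sketch of that computation (function names are just placeholders, not the actual API): it evaluates Delta(lmda) from A and r alone and then normalizes by the value at the last lambda.

```python
import numpy as np

def delta(A, r, beta):
    """Delta(lmda) = ||y||^2 - ||y - X b(lmda)||^2 = 2 r^T b - b^T A b,
    which only requires A = X^T X and r = X^T y."""
    return 2 * (r @ beta) - beta @ (A @ beta)

def relative_improvement(A, r, betas):
    """Delta(lmda_k) / Delta(lmda_last) for a list of solutions along the path."""
    deltas = np.array([delta(A, r, b) for b in betas])
    return deltas / deltas[-1]
```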
It might be worth providing an option that computes the full R^2 rather than only these relative differences (the current implementation).