CU-Boulder-APPM-X720-Fall-2022 / biweekly-report-3-gnorman7

biweekly-report-3-gnorman7 created by GitHub Classroom

Feedback #4

Open yyexela opened 2 years ago

yyexela commented 2 years ago

Feedback on Grant Norman's Biweekly Report 3

Alexey Yermakov

Superpowers

Areas of Improvement

Questions

Grade

95/100

gnorman7 commented 2 years ago

Thank you for the detailed review, Alexey! I appreciate the time you took to carefully go through the report and to write the review.

> In the section stating "The objective above is convex" (in beta), how was this determined? The Wikipedia article linked doesn't mention the word "convex". This could be explained better (I think I understand why intuitively but would like reassurance).

For reference, this is the objective that I said was convex:

$$ \arg \min_\beta || {y - \mathbf{X} \beta} ||_2 ^2 $$

This is a quadratic expression in $\beta$. Quadratics like this come up so often in other areas that I tend to forget it takes a small jump to see why this one is convex. I'll add a brief explanation here (maybe I'll push it to the report after it's graded):

Tl;dr: Think of the 1D case. The objective is a quadratic function with a non-negative coefficient on the quadratic term. The idea generalizes to higher dimensions via positive semi-definiteness.

We can expand the norm into an inner product: $(y - \mathbf{X} \beta)^T (y - \mathbf{X} \beta) = y^T y - 2 y^T \mathbf{X} \beta + \beta^T \mathbf{X}^T \mathbf{X} \beta$, where the two cross terms combine because each is the same scalar. Here we only care about the dependence on $\beta$, so treat $y$ and $\mathbf{X}$ as constants: the first term is constant and the second is linear in $\beta$, and neither can affect convexity. In the 1D case, the objective is a parabola in $\beta$:

$$ y^2 - 2Xy \beta + X^2 \beta ^2 $$

the $y^2$ and $-2X y \beta$ terms just control the location of the parabola's vertex, so we can focus only on the $\beta^T \mathbf{X}^T \mathbf{X} \beta$ term. In 1D, the function is convex exactly when the coefficient in front of $\beta^2$ is non-negative. In the general case, this becomes the requirement that $\mathbf{X}^T \mathbf{X}$ be positive semi-definite. Just as $X^2$ is non-negative for real $X$, the matrix $\mathbf{X}^T \mathbf{X}$ is always positive semi-definite: for any $\beta$, we have $\beta^T \mathbf{X}^T \mathbf{X} \beta = \| \mathbf{X} \beta \|_2^2 \geq 0$.
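To make this concrete, here is a quick numerical sanity check (a minimal sketch I put together for this thread, not code from the report; the matrix sizes and the `objective` helper are purely illustrative). It checks that the eigenvalues of $\mathbf{X}^T \mathbf{X}$ are non-negative and that the objective satisfies the convexity inequality $f(t\beta_1 + (1-t)\beta_2) \le t f(\beta_1) + (1-t) f(\beta_2)$ along a random segment:

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((20, 5))  # illustrative data matrix
y = rng.standard_normal(20)

def objective(beta):
    """Least-squares objective ||y - X beta||_2^2."""
    r = y - X @ beta
    return r @ r

# X^T X is symmetric, so eigvalsh applies; all eigenvalues should be >= 0.
print("min eigenvalue of X^T X:", np.linalg.eigvalsh(X.T @ X).min())

# Convexity along a random segment:
# f(t*b1 + (1-t)*b2) <= t*f(b1) + (1-t)*f(b2) for t in [0, 1].
b1, b2 = rng.standard_normal(5), rng.standard_normal(5)
for t in np.linspace(0.0, 1.0, 11):
    lhs = objective(t * b1 + (1 - t) * b2)
    rhs = t * objective(b1) + (1 - t) * objective(b2)
    assert lhs <= rhs + 1e-9  # small tolerance for round-off
print("convexity inequality holds at all sampled points")
```

Note that floating-point round-off can make the smallest eigenvalue a tiny negative number, which is why the assertion allows a small tolerance.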

> You seem very knowledgeable in scientific computing/numerical analysis, what's your background in it? I'm currently taking undergrad numerical analysis and a lot of what you talk about is familiar, but you also seem to have a deeper understanding of the topics.

Thank you! I'm a graduate student doing research in a numerical area, somewhat related to PDEs, so I've been working on related material for a little over a year now. In terms of courses, I've probably gained 50% of my numerical analysis knowledge from Prof. Stephen Becker (undergrad numerical analysis 1 and theoretical machine learning). A finite elements course in the aerospace department also helped a lot with seeing some of the numerical analysis ideas again (it's basically an APPM class).

Trying hard in numerical analysis will definitely pay big dividends. I think a strong general math background will help too. I've been surprised by how much areas such as optimization and numerically solving PDEs rely on strong mathematical foundations.