It was suggested on /r/statistics that thinking of LDA as an optimization problem as described in this article isn't how statisticians currently think about LDA. This would explain why it is not actually solved as an optimization problem in practice (which is something I skipped over in the article, see https://github.com/OmarShehata/lda-explorable/issues/3):
> While I applaud the use of interactivity, I don't actually think this is the best way to go about thinking about LDA.
>
> Firstly, you're talking about Fisher's original formulation of LDA (wiki). Nowadays we usually use the generative model version of LDA, and I think that is actually very intuitive.
>
> Essentially, you assume that your data is generated from normal distributions with a common covariance structure (if the covariances differ, you get QDA). That is, each class has its own normal distribution. Then, it's a little work to show that if you assume those distributions, the (Bayes) optimal way to classify new data points corresponds to linear separations (intuitively, you're just checking which density is higher, and that is your classification).
I think reformulating this explanation would essentially be a different article, but it could still re-use most of the code and visualization here. Happy to support anyone who wants to explore this path.
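As a concrete starting point, here is a rough numpy sketch of the generative view described in the comment (the toy means, shared covariance, and variable names are illustrative assumptions, not taken from the article's code): fit one Gaussian per class with a pooled covariance estimate, then compare log-densities. Because the quadratic term is shared between the classes, it cancels, and the resulting decision rule is linear in x.

```python
import numpy as np

# Toy 2-D data: two classes drawn from Gaussians that share one covariance matrix
# (these particular means and covariance are made up for illustration).
rng = np.random.default_rng(0)
shared_cov = np.array([[1.0, 0.6], [0.6, 1.0]])
mean_a, mean_b = np.array([0.0, 0.0]), np.array([2.5, 1.0])
X_a = rng.multivariate_normal(mean_a, shared_cov, size=200)
X_b = rng.multivariate_normal(mean_b, shared_cov, size=200)

# Plug-in estimates: class means, pooled covariance, class priors.
mu_a, mu_b = X_a.mean(axis=0), X_b.mean(axis=0)
pooled = ((X_a - mu_a).T @ (X_a - mu_a) + (X_b - mu_b).T @ (X_b - mu_b)) / (len(X_a) + len(X_b) - 2)
prior_a = len(X_a) / (len(X_a) + len(X_b))
prior_b = 1.0 - prior_a

# Comparing the two Gaussian log-densities: the x^T Sigma^-1 x terms cancel because
# Sigma is shared, so the Bayes rule reduces to the sign of a linear score w.x + b.
inv_cov = np.linalg.inv(pooled)
w = inv_cov @ (mu_b - mu_a)
b = -0.5 * (mu_b @ inv_cov @ mu_b - mu_a @ inv_cov @ mu_a) + np.log(prior_b / prior_a)

def classify(x):
    """Class B when the linear score is positive, class A otherwise."""
    return "B" if x @ w + b > 0 else "A"

print(classify(np.array([2.0, 1.0])))  # "B" for this toy setup: the point sits near mean_b
```

If each class kept its own covariance, the quadratic terms would no longer cancel, which is exactly the QDA case mentioned in the comment.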