Starsa / Mentorship

Projects for Mentorship

Statistical and Machine Learning concepts #28

Closed Starsa closed 3 years ago

Starsa commented 3 years ago

Memorize statistical and machine learning concepts, each with a one-sentence answer.

Starsa commented 3 years ago

p-value:

The probability of getting results at least as extreme as what we observed, given that the null hypothesis is true.
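As a rough illustration of that definition, here is a minimal sketch using a coin-flip example that is not from the thread: the null hypothesis is a fair coin, and "at least as extreme" is taken one-sided (that many heads or more). The function name `binomial_p_value` and the numbers are assumptions made for the example.

```python
from math import comb

def binomial_p_value(heads, flips, p_null=0.5):
    """One-sided p-value: P(X >= heads) if the null (coin has P(heads) = p_null) is true."""
    return sum(
        comb(flips, k) * p_null**k * (1 - p_null) ** (flips - k)
        for k in range(heads, flips + 1)
    )

# Hypothetical observation: 9 heads in 10 flips of a supposedly fair coin.
p = binomial_p_value(9, 10)
print(f"p-value = {p:.4f}")  # 11/1024, about 0.0107
```

The small value says only that 9 or more heads would be rare if the coin were fair; it is not the probability that the coin is fair.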

Starsa commented 3 years ago

R²

The proportion of variance in the target vector that is explained by the features. A measure of goodness of fit.
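A minimal sketch of how R-squared is usually computed, as one minus the ratio of residual to total sum of squares. The function name `r_squared` and the toy numbers are made up for illustration.

```python
def r_squared(y_true, y_pred):
    """R-squared = 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - p) ** 2 for y, p in zip(y_true, y_pred))  # unexplained variation
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)             # total variation
    return 1 - ss_res / ss_tot

# Toy data (hypothetical): predictions close to the targets.
y_true = [1.0, 2.0, 3.0, 4.0]
y_pred = [1.1, 1.9, 3.2, 3.8]
r2 = r_squared(y_true, y_pred)
print(round(r2, 4))  # 0.98
```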

Starsa commented 3 years ago

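Adjusted R²

A modified version of R² that imposes a penalty for each additional feature, so it only increases when a new feature improves the fit more than would be expected by chance.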
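The penalty can be sketched with the standard adjusted R-squared formula, 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of samples and p the number of features. The function name and the example values below are assumptions for illustration.

```python
def adjusted_r_squared(r2, n_samples, n_features):
    """Adjusted R-squared: shrinks R-squared as the feature count grows."""
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_features - 1)

# Same raw R-squared and sample size, different numbers of features (hypothetical values).
few = adjusted_r_squared(0.90, n_samples=50, n_features=5)
many = adjusted_r_squared(0.90, n_samples=50, n_features=20)
print(round(few, 4), round(many, 4))  # the 20-feature model is penalized more
```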

Starsa commented 3 years ago

AUC

AUC (Area under the ROC curve) is an aggregate measure of performance across all possible classification thresholds.
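One way to make this concrete: the area under the ROC curve equals the probability that a randomly chosen positive example gets a higher score than a randomly chosen negative one (ties counted as half). The sketch below computes AUC that way by brute force over all pairs; the function name `auc` and the toy scores are assumptions, and this is meant for intuition rather than speed.

```python
def auc(labels, scores):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))

# Toy example (hypothetical scores): 3 of the 4 positive/negative pairs are ordered correctly.
labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(auc(labels, scores))  # 0.75
```

Note that no classification threshold appears anywhere in the computation, which is what "across all possible thresholds" amounts to.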

Starsa commented 3 years ago

Precision

The fraction of predicted positives that are actually positive: True Positives / (True Positives + False Positives)

Recall

True positive rate: True Positives / (True Positives + False Negatives)

If we want to find every positive, whatever the cost in false positives, maximize recall.
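A small sketch of both formulas computed from binary labels. The function name `precision_recall` and the toy labels are made up for the example; returning 0.0 when a denominator is zero is a convention chosen here, not something the thread specifies.

```python
def precision_recall(y_true, y_pred):
    """Return (precision, recall) for binary labels where 1 is the positive class."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if (tp + fp) else 0.0  # assumed convention for 0/0
    recall = tp / (tp + fn) if (tp + fn) else 0.0     # assumed convention for 0/0
    return precision, recall

# Hypothetical labels: TP = 2, FP = 1, FN = 2.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
precision, recall = precision_recall(y_true, y_pred)
print(round(precision, 3), recall)  # 0.667 0.5
```

Predicting positive for everything would push recall to 1.0 on this data while precision drops to 0.5, which is the cost of maximizing recall alone.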

Starsa commented 3 years ago

Bias

Error: the expected error introduced by using a simplified model to approximate a real-world function

Variance

Sensitivity to noise: the error that comes from the model reacting to small fluctuations in the training set

Bias Variance Trade-off

A simple model tends to have high bias and low variance, while a complex model tends to have high variance and low bias. The trade-off is finding a balance that neither overfits nor underfits the data.
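The trade-off can be seen in a small simulation. Everything in this sketch is an assumption chosen for illustration: the true function x², the noise level, a "simple" model that always predicts the mean of the training targets, and a "complex" model that copies the nearest training point (1-nearest-neighbour). Each model is refit on many noisy training sets and its predictions at one test point are compared.

```python
import random
from statistics import mean, pvariance

def true_f(x):
    return x * x  # assumed "real-world" function

XS = [0.0, 0.25, 0.5, 0.75, 1.0]  # fixed training inputs
NOISE_SD = 0.3                    # assumed noise level
X0 = 1.0                          # test point

def simulate(n_trials=2000, seed=0):
    rng = random.Random(seed)
    simple_preds, complex_preds = [], []
    for _ in range(n_trials):
        # Draw a fresh noisy training set.
        ys = [true_f(x) + rng.gauss(0, NOISE_SD) for x in XS]
        # Simple model: ignore x and predict the mean target.
        simple_preds.append(mean(ys))
        # Complex model: copy the target of the nearest training input.
        nearest = min(range(len(XS)), key=lambda i: abs(XS[i] - X0))
        complex_preds.append(ys[nearest])
    out = {}
    for name, preds in (("simple", simple_preds), ("complex", complex_preds)):
        bias = mean(preds) - true_f(X0)
        out[name] = {"bias_sq": bias**2, "variance": pvariance(preds)}
    return out

results = simulate()
for name, stats in results.items():
    print(name, stats)
```

In this setup the constant model is far from the truth on average but barely moves between training sets, while the nearest-neighbour model is right on average but inherits the full noise of a single point.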

CLuiz commented 3 years ago

These are all great. Hopefully you won't get a grumpy statistician asking about p values

CLuiz commented 3 years ago

AUC

AUC (Area under the ROC curve) is an aggregate measure of performance across all possible classification thresholds.

Can you work out why I don't like AUC for model evaluation from this definition?

Starsa commented 3 years ago

Precision

The fraction of predicted positives that are actually positive: True Positives / (True Positives + False Positives)

Recall

True positive rate: True Positives / (True Positives + False Negatives)

If we want to find every positive, whatever the cost in false positives, maximize recall.

According to Cassie Kozyrkov:

Precision: "Don't waste my time. Missed opportunities are okay"

Recall: "Collect 'em all. Duds are okay"

Starsa commented 3 years ago

Accuracy

"All mistakes are equally bad"

F-measure

"I can't choose between precision and recall"
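For reference, a minimal sketch of the formulas behind those two slogans, computed from confusion-matrix counts. The function names and the counts are made up for the example; the F-measure is written in its general F-beta form, where the default beta = 1 gives the usual F1 score (the harmonic mean of precision and recall).

```python
def accuracy(tp, fp, fn, tn):
    """Fraction of all predictions that are correct; every mistake counts the same."""
    return (tp + tn) / (tp + fp + fn + tn)

def f_measure(tp, fp, fn, beta=1.0):
    """F-beta score; beta > 1 weights recall more, beta < 1 weights precision more."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    b2 = beta**2
    return (1 + b2) * precision * recall / (b2 * precision + recall)

# Hypothetical counts: TP = 2, FP = 1, FN = 2, TN = 3.
acc = accuracy(tp=2, fp=1, fn=2, tn=3)
f1 = f_measure(tp=2, fp=1, fn=2)
print(acc, round(f1, 4))  # 0.625 0.5714
```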