Outliers not detected properly in new code

VincentWtrs commented 5 years ago

When running models with a higher proportion of outliers (say 10% or more), the algorithm fails to detect the outliers. The original version does a much better job, although not perfectly. I need to track down the root cause of the issue.

Potential causes:

Reversal of the regularization path: previously we started with a heavily regularized model to determine the best subset (?). I suspected this might hide outliers, but since the outliers are only in the informative subset, this might not be the case (under those assumptions).
...

VincentWtrs commented 5 years ago

I thought I had successfully tackled the issue by addressing cause 1 and reversing the order. However, it turned out I had reloaded the package from the unchanged original development branch, not the one introducing the fix. At first I tested with 10% outliers and mu_outlier = 10, and the tests were actually successful. Testing with 20% outliers, they were not: TPR on outliers was very low and FPR was very high.
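For reference, the TPR and FPR mentioned above can be computed from the known outlier labels of the simulation and the flags returned by the fit. This is a minimal sketch in Python, not code from the package; the function name `outlier_detection_rates` and the example vectors are hypothetical.

```python
def outlier_detection_rates(is_outlier, is_flagged):
    """Return (TPR, FPR) given true outlier labels and detection flags (both boolean)."""
    if len(is_outlier) != len(is_flagged):
        raise ValueError("label and flag sequences must have the same length")
    pairs = list(zip(is_outlier, is_flagged))
    tp = sum(1 for o, f in pairs if o and f)          # outliers that were flagged
    fn = sum(1 for o, f in pairs if o and not f)      # outliers that were missed
    fp = sum(1 for o, f in pairs if not o and f)      # clean points wrongly flagged
    tn = sum(1 for o, f in pairs if not o and not f)  # clean points left alone
    tpr = tp / (tp + fn) if (tp + fn) else float("nan")
    fpr = fp / (fp + tn) if (fp + tn) else float("nan")
    return tpr, fpr


# Hypothetical example: 10 observations, the first 2 are true outliers.
truth = [True, True] + [False] * 8
flags = [True, False, True] + [False] * 7
tpr, fpr = outlier_detection_rates(truth, flags)
print(f"TPR = {tpr:.3f}, FPR = {fpr:.3f}")
```

Reporting both rates per run makes it easier to compare the 10% and 20% settings side by side.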

Two new tangential issues:

When introducing more than 12.5% outliers, there might be issues (since del = 0.125 by default), so let's test with 10%.
When there are a lot of outliers, there might be a need for more separation between outliers and non-outliers (i.e. mu_outlier = 5 might not provide enough distinction between outliers and non-outliers).
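The two points above suggest a small grid of test settings over the outlier fraction and mu_outlier. Below is a rough Python sketch of such a contamination setup, assuming a simple mean shift on the response; the function `contaminate`, the sample size, and the check against del are illustrative assumptions, and the exact role of del in the package is not verified here.

```python
import random


def contaminate(n, outlier_frac, mu_outlier, seed=0):
    """Simulate n standard-normal responses and shift the first round(outlier_frac * n) by mu_outlier."""
    rng = random.Random(seed)
    n_out = round(outlier_frac * n)
    is_outlier = [i < n_out for i in range(n)]
    y = [rng.gauss(0.0, 1.0) + (mu_outlier if o else 0.0) for o in is_outlier]
    return y, is_outlier


DEL = 0.125  # value quoted in the thread; treated here only as a threshold to compare against

# Settings discussed in the thread: (outlier fraction, mu_outlier)
for frac, mu in [(0.10, 5), (0.10, 10), (0.20, 10)]:
    y, is_outlier = contaminate(n=200, outlier_frac=frac, mu_outlier=mu)
    print(f"frac={frac:.2f} mu_outlier={mu} n_outliers={sum(is_outlier)} exceeds_del={frac > DEL}")
```

Keeping the seed fixed across settings means differences between runs come from the contamination level and shift rather than from the noise draw.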

Also, I am testing with the IC option turned on (e.g. EBIC) but with the default hyperparameter grid construction (a large grid). This clearly takes too long for useful testing, but it minimizes the chance of issues due to missing good lambda values.

VincentWtrs commented 5 years ago

Update: Re-training with outlier_mu = 10 and 10% outliers. The standard procedure using IC (EBIC) works well when no lambda and alpha sequences are supplied. When I specify my own, it seems to fail.

Things to investigate further

Is my grid not fine enough or not covering a wide enough space?
Is my grid ordered differently when I specify it myself?
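Both questions can be checked mechanically by comparing the user-supplied grid with the one the default construction produces. The following is a minimal Python sketch under the assumption that both grids are available as lists of strictly positive values (e.g. lambdas); `describe_grid`, `compare_grids`, and the example grids are hypothetical and not part of the package.

```python
def describe_grid(values):
    """Summarise range, ordering, and largest ratio between neighbours of a positive grid."""
    if len(values) < 2:
        raise ValueError("need at least two grid values")
    steps = list(zip(values, values[1:]))
    if all(a > b for a, b in steps):
        order = "decreasing"
    elif all(a < b for a, b in steps):
        order = "increasing"
    else:
        order = "unsorted"
    s = sorted(values)
    # Largest multiplicative gap between adjacent sorted values (coarseness on a log scale).
    max_ratio = max(b / a for a, b in zip(s, s[1:]))
    return {"min": s[0], "max": s[-1], "order": order, "max_ratio": max_ratio}


def compare_grids(reference, custom):
    """Check whether the custom grid covers the reference range, shares its order, and is coarser."""
    ref, cus = describe_grid(reference), describe_grid(custom)
    return {
        "covers_range": cus["min"] <= ref["min"] and cus["max"] >= ref["max"],
        "same_order": cus["order"] == ref["order"],
        "coarser": cus["max_ratio"] > ref["max_ratio"],
    }


# Hypothetical grids for illustration only.
reference_lambdas = [1.0, 0.5, 0.25, 0.125, 0.0625]
custom_lambdas = [0.1, 0.4, 1.6]
result = compare_grids(reference_lambdas, custom_lambdas)
print(result)
```

If the orders differ, sorting the custom grid to match the default before fitting is a cheap way to rule out ordering as the cause.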

VincentWtrs / enetLTS

Outliers not detected properly in new code #2