GuansongPang / deviation-network

Source code of the KDD19 paper "Deep anomaly detection with deviation networks", weakly/partially supervised anomaly detection, few-shot anomaly detection, semi-supervised anomaly detection
GNU General Public License v3.0

Calculating metrics #2

Closed jakubkarczewski closed 4 years ago

jakubkarczewski commented 4 years ago

Hey,

I have a question concerning the way you calculate metrics:

Predictions of trained model can vary from close to 0 to over 60 for some experiments that I performed using Kaggle Credit Card Fraud Dataset.

My question is: why do you use the predictions as if they were probabilities (even though they are not within the [0, 1] range) when calculating AUC-ROC and AUC-PR?

I know that both sklearn functions support this type of mixed input (binary and continuous vectors), but doesn't that give a misleading result?

[attached image: Screenshot from 2020-04-23 13-16-29]

This is a bit similar to what is happening in your implementation. Also, I think that the confidence threshold you mention in the paper should depend on the value of the margin used in the deviation loss, not only on the probit and normal distribution parameters.

Thanks for the reproducible paper :) Kuba

GuansongPang commented 4 years ago

Hi Kuba,

Thanks for your interest.

For anomaly detection, we often focus on the quality of the ranking w.r.t. the anomaly scores, and this ranking quality is normally evaluated via AUC-ROC and AUC-PR on the raw anomaly scores. This also implies that the range of the anomaly scores does not matter.
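To illustrate that AUC is rank-based, here is a small sketch with made-up labels and unbounded scores (the data here is hypothetical, not from the Credit Card Fraud experiments): any strictly monotonic rescaling of the scores leaves both metrics unchanged.

```python
import numpy as np
from sklearn.metrics import roc_auc_score, average_precision_score

# Hypothetical ground-truth labels and raw, unbounded anomaly scores
y_true = np.array([0, 0, 1, 0, 1, 0, 1, 0])
scores = np.array([0.2, 1.5, 42.0, 0.9, 60.3, 0.1, 12.7, 2.4])

auc_raw = roc_auc_score(y_true, scores)
ap_raw = average_precision_score(y_true, scores)

# Min-max rescaling into [0, 1] is strictly monotonic, so the
# ranking -- and therefore AUC-ROC and AUC-PR -- is unchanged
rescaled = (scores - scores.min()) / (scores.max() - scores.min())
auc_scaled = roc_auc_score(y_true, rescaled)
ap_scaled = average_precision_score(y_true, rescaled)

assert auc_raw == auc_scaled
assert ap_raw == ap_scaled
```

Because all three anomalies happen to rank above every normal point in this toy data, both metrics come out at 1.0, whether or not the scores are squashed into [0, 1].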

Yes, binary class labels can be produced with a decision threshold, but that introduces many ties compared with the anomaly ranking, so performance measured via, e.g., AUC-ROC will differ from the one computed on the raw anomaly scores. If binary class labels are what you need, you may use the F-score to evaluate performance instead.
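One common way to pick such a decision threshold is to sweep the candidate thresholds on held-out data and keep the one that maximizes F1. A minimal sketch with hypothetical data (this is a generic sklearn recipe, not code from this repository):

```python
import numpy as np
from sklearn.metrics import f1_score, precision_recall_curve

# Hypothetical held-out labels and raw anomaly scores
y_true = np.array([0, 0, 1, 0, 1, 0, 1, 0])
scores = np.array([0.2, 1.5, 42.0, 0.9, 60.3, 0.1, 12.7, 2.4])

# precision/recall have one more entry than thresholds; drop the
# final (1, 0) point so F1 values align with the thresholds
precision, recall, thresholds = precision_recall_curve(y_true, scores)
f1 = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
best_threshold = thresholds[np.argmax(f1[:-1])]

# Binarize the scores with the selected threshold
y_pred = (scores >= best_threshold).astype(int)
print(f1_score(y_true, y_pred))
```

The threshold should be chosen on a validation split rather than on the test set, otherwise the reported F-score is optimistically biased.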

You are also right that the confidence threshold is dependent on the margin used in the deviation loss.

Cheers, Guansong

jakubkarczewski commented 4 years ago

Thank you for prompt answer! It's all clear now :+1:

Aml-Hassan-Abd-El-hamid commented 1 year ago

Hi there @GuansongPang @jakubkarczewski

I'm interested in getting predictions as binary class labels, and I wanted to know how I could set a good decision threshold to get a good F1-score.