dawnvince / EasyTSAD

A framework for easy running and evaluating your TSAD algorithm.
GNU General Public License v3.0

How is point-wise F1 score being calculated? #6

Open Yr-Nemsis opened 3 weeks ago

Yr-Nemsis commented 3 weeks ago

Dear author, great work on creating such a comprehensive benchmark for time series anomaly detection! I was wondering whether the mechanism for calculating the point-wise F1 score has been fully provided, as I am trying to replicate the results presented in the online leaderboard. In the GitHub repository, I see that the PointF1 class is defined as follows:

class PointF1(EvalInterface):
    """
    Using Traditional F1 score to evaluate the models.
    """
    def __init__(self) -> None:
        super().__init__()
        self.name = "point-wise f1"

    def calc(self, scores, labels, margins) -> type[MetricInterface]:
        '''
        Returns:
         A F1class (Evaluations.Metrics.F1class), including:\n
            best_f1: the value of best f1 score;\n
            precision: corresponding precision value;\n
            recall: corresponding recall value;
        '''
        print("scores: ", scores)
        print("labels: ", labels)
        prec = precision_score(labels, scores)
        rec = recall_score(labels, scores)
        f1 = f1_score(labels, scores)

        return F1class(
            name=self.name,
            p=prec,
            r=rec,
            f1=f1
        )

However, no threshold is defined for converting the scores into binary anomaly predictions, which makes it impossible to calculate precision and recall from the scores output by the models. Have I simply not found the right function for calculating the point-wise F1 score, or is it still under development? Thanks!

dawnvince commented 2 weeks ago

That's quite a problem. We are sorry for using the wrong function; the bug has now been fixed. The point-wise F1 now reports the highest F1 score found by trying all possible thresholds. Thanks for your comments!
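
For readers who want to see what "trying all possible thresholds" can look like, here is a minimal sketch. It is not the code from the EasyTSAD repository: the function name `best_point_f1`, the rule that a point is flagged when `score >= threshold`, and the choice of every distinct score as a candidate threshold are all assumptions made for illustration.

```python
def best_point_f1(scores, labels):
    """Return (best_f1, precision, recall, threshold) over all candidate thresholds.

    Hypothetical helper, not EasyTSAD's implementation.
    Assumption: a point is predicted anomalous when score >= threshold,
    and every distinct score value is tried as a threshold.
    """
    if len(scores) != len(labels):
        raise ValueError("scores and labels must have the same length")

    total_pos = sum(1 for y in labels if y == 1)
    # Sort indices by descending score, so lowering the threshold
    # adds points to the predicted-anomaly set one at a time.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)

    best = (0.0, 0.0, 0.0, None)  # returned unchanged if no true positive exists
    tp = fp = 0
    for rank, i in enumerate(order):
        if labels[i] == 1:
            tp += 1
        else:
            fp += 1
        # Evaluate only after all points tied at this score are counted.
        last_of_tie = rank + 1 == len(order) or scores[order[rank + 1]] != scores[i]
        if not last_of_tie or tp == 0:
            continue
        precision = tp / (tp + fp)
        recall = tp / total_pos
        f1 = 2 * precision * recall / (precision + recall)
        if f1 > best[0]:
            best = (f1, precision, recall, scores[i])
    return best


print(best_point_f1([0.1, 0.9, 0.8, 0.2, 0.7], [0, 1, 1, 0, 0]))
# -> (1.0, 1.0, 1.0, 0.8)
```

Because the threshold is chosen with knowledge of the labels, the result is an upper bound on the point-wise F1 a fixed threshold could reach, which is worth keeping in mind when comparing against methods that select their threshold without labels.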

It is also worth noting that the results on the leaderboard are based on PointF1PA rather than PointF1, because PointF1 seems too strict and impractical for most real-time anomaly detection tasks. PointF1 might be a good choice only if your scenario has extremely high requirements for precise detection.
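
To illustrate why PointF1 is the stricter of the two, the sketch below shows the point-adjustment step as it is commonly described in the TSAD literature: if any point inside a ground-truth anomaly segment is flagged, the whole segment counts as detected. This assumes the "PA" in PointF1PA refers to that protocol; the thread does not spell it out, and the helper `point_adjust` is hypothetical rather than taken from EasyTSAD.

```python
def point_adjust(preds, labels):
    """Return a copy of preds with point adjustment applied.

    Hypothetical helper illustrating the commonly used protocol,
    not EasyTSAD's implementation. Both inputs are sequences of 0/1.
    """
    if len(preds) != len(labels):
        raise ValueError("preds and labels must have the same length")

    adjusted = list(preds)
    n = len(labels)
    i = 0
    while i < n:
        if labels[i] != 1:
            i += 1
            continue
        # Find the end (exclusive) of this contiguous anomaly segment.
        j = i
        while j < n and labels[j] == 1:
            j += 1
        # If any point in the segment was flagged, mark the whole segment.
        if any(preds[k] == 1 for k in range(i, j)):
            for k in range(i, j):
                adjusted[k] = 1
        i = j
    return adjusted


labels = [0, 1, 1, 1, 0, 1, 1, 0]
preds = [0, 0, 1, 0, 1, 0, 0, 0]
print(point_adjust(preds, labels))
# -> [0, 1, 1, 1, 1, 0, 0, 0]
```

In this example the first segment is hit once and becomes fully detected, the second segment is missed entirely and stays undetected, and the false positive at index 4 is left untouched. Without the adjustment, the same predictions would count only one true positive out of five anomalous points.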

Feel free to contact us if there is any other issue😉