The performance of the baselines differs from the paper

Thank you for sharing this interesting work!

I ran the baseline code on P12; however, the reproduced baseline scores are higher than the scores reported for the proposed method in the original paper.

The results I reproduced for mTAND were 85.1 AUROC and 52.4 AUPRC on the P12 dataset. In the paper, the performance of the proposed ViTST on P12 was reported as 85.1 AUROC and 51.1 AUPRC.

I also reproduced Transformer-mean, which reached 85.3 AUROC and 51.7 AUPRC.

The AUPRC of both Transformer-mean and mTAND was higher than that of ViTST.

Is there anything I missed when reproducing the results?

Thank you!

[Reproduced mTAND results]

스크린샷 2024-03-07 오후 2 04 30

[Reproduced Transformer-mean results]

스크린샷 2024-03-07 오후 4 34 59

[The paper results]

[Summarized results (P12)]

Metrics	Transformer (Paper)	Transformer (Reproduced)
AUROC	83.3 ± 0.7	85.1 ± 1.1
AUPRC	47.9 ± 3.6	51.2 ± 2.4

Metrics	Transformer-mean (Paper)	Transformer-mean (Reproduced)
AUROC	82.6 ± 2.0	85.3 ± 0.8
AUPRC	46.3 ± 4.0	51.7 ± 3.1

Metrics	mTAND (Paper)	mTAND (Reproduced)
AUROC	84.2 ± 0.8	85.1 ± 1.0
AUPRC	48.2 ± 3.4	52.4 ± 3.0

Metrics	ViTST (Paper)	ViTST (Reproduced)
AUROC	85.1 ± 0.8	85.2 ± 1.0
AUPRC	51.1 ± 4.1	51.1 ± 3.7

[Summarized results (P19)]

Metrics	Transformer (Paper)	Transformer (Reproduced)
AUROC	80.7 ± 3.8	81.0 ± 3.4
AUPRC	42.7 ± 7.7	42.8 ± 8.4

Metrics	Transformer-mean (Paper)	Transformer-mean (Reproduced)
AUROC	83.7 ± 1.8	84.1 ± 1.5
AUPRC	45.8 ± 3.2	44.9 ± 2.1

Metrics	mTAND (Paper)	mTAND (Reproduced)
AUROC	84.4 ± 1.3	79.1 ± 1.3
AUPRC	50.6 ± 2.0	27.5 ± 2.5

Metrics	DGM-O (Paper)	DGM-O (Reproduced)
AUROC	86.7 ± 3.4	88.1 ± 2.5
AUPRC	44.7 ± 11.7	53.0 ± 5.1

Metrics	Raindrop (Paper)	Raindrop (Reproduced)
AUROC	87.0 ± 2.3	86.3 ± 2.7
AUPRC	51.8 ± 5.5	50.2 ± 6.6
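As a rough sanity check, each gap between the paper and the reproduced mean in the tables above can be compared with the reported standard deviations. The following is a minimal sketch, not a statistical test: the numbers are copied from the tables, while `gap_in_stds`, `flag_large_gaps`, and the threshold of 2.0 are hypothetical choices made only for illustration. It also assumes the ± values are standard deviations over splits, which the tables do not state.

```python
import math

# (paper_mean, paper_std, reproduced_mean, reproduced_std), copied from the tables above.
RESULTS = {
    ("P12", "Transformer", "AUROC"): (83.3, 0.7, 85.1, 1.1),
    ("P12", "Transformer", "AUPRC"): (47.9, 3.6, 51.2, 2.4),
    ("P12", "Transformer-mean", "AUROC"): (82.6, 2.0, 85.3, 0.8),
    ("P12", "Transformer-mean", "AUPRC"): (46.3, 4.0, 51.7, 3.1),
    ("P12", "mTAND", "AUROC"): (84.2, 0.8, 85.1, 1.0),
    ("P12", "mTAND", "AUPRC"): (48.2, 3.4, 52.4, 3.0),
    ("P12", "ViTST", "AUROC"): (85.1, 0.8, 85.2, 1.0),
    ("P12", "ViTST", "AUPRC"): (51.1, 4.1, 51.1, 3.7),
    ("P19", "Transformer", "AUROC"): (80.7, 3.8, 81.0, 3.4),
    ("P19", "Transformer", "AUPRC"): (42.7, 7.7, 42.8, 8.4),
    ("P19", "Transformer-mean", "AUROC"): (83.7, 1.8, 84.1, 1.5),
    ("P19", "Transformer-mean", "AUPRC"): (45.8, 3.2, 44.9, 2.1),
    ("P19", "mTAND", "AUROC"): (84.4, 1.3, 79.1, 1.3),
    ("P19", "mTAND", "AUPRC"): (50.6, 2.0, 27.5, 2.5),
    ("P19", "DGM-O", "AUROC"): (86.7, 3.4, 88.1, 2.5),
    ("P19", "DGM-O", "AUPRC"): (44.7, 11.7, 53.0, 5.1),
    ("P19", "Raindrop", "AUROC"): (87.0, 2.3, 86.3, 2.7),
    ("P19", "Raindrop", "AUPRC"): (51.8, 5.5, 50.2, 6.6),
}


def gap_in_stds(paper_mean, paper_std, repro_mean, repro_std):
    """Signed gap (reproduced - paper) in units of the combined spread.

    The combined spread is the root sum of squares of the two reported
    standard deviations. This is a heuristic, not a significance test.
    """
    combined = math.sqrt(paper_std ** 2 + repro_std ** 2)
    return (repro_mean - paper_mean) / combined


def flag_large_gaps(results, threshold=2.0):
    """Return (key, gap) pairs whose absolute gap exceeds the threshold."""
    flagged = []
    for key, values in results.items():
        gap = gap_in_stds(*values)
        if abs(gap) > threshold:
            flagged.append((key, gap))
    return flagged


if __name__ == "__main__":
    for (dataset, model, metric), gap in flag_large_gaps(RESULTS):
        print(f"{dataset} {model} {metric}: gap = {gap:+.2f} combined std")
```

A positive gap means the reproduced score is higher than the paper's; the threshold only controls which rows are printed and can be adjusted freely.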
