HelenGuohx / logbert

log anomaly detection via BERT
MIT License
240 stars 102 forks

The experimental data of the paper cannot be reproduced #24

Open chinahappyking opened 2 years ago

chinahappyking commented 2 years ago

Hi Guo, I have tried many times, and the results below are always the same, far from the results reported in the paper. Is there any difference between the setup in the paper and the code?

Could we chat privately on WeChat?

dataset: hdfs
git branch: main
==================== logbert ====================
best threshold: 0, best threshold ratio: 0.0
TP: 7602, TN: 549880, FP: 3488, FN: 3045
Precision: 68.55%, Recall: 71.40%, F1-measure: 69.95%
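For reference, the reported percentages follow directly from the confusion-matrix counts in the output above, so the evaluation arithmetic itself is not the source of the gap. A quick sanity check (counts copied from the run above):

```python
# Recompute precision/recall/F1 from the confusion-matrix counts
# reported in the output above (TP=7602, FP=3488, FN=3045).
tp, fp, fn = 7602, 3488, 3045

precision = tp / (tp + fp)  # fraction of flagged sequences that are truly anomalous
recall = tp / (tp + fn)     # fraction of anomalous sequences that were flagged
f1 = 2 * precision * recall / (precision + recall)

print(f"Precision: {precision:.2%}, Recall: {recall:.2%}, F1: {f1:.2%}")
# Matches the reported 68.55% / 71.40% / 69.95%
```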

HelenGuohx commented 2 years ago

Can you share your email?

Thanks

hniu1 commented 2 years ago

I have the same issue. Did you end up reproducing the results?

chinahappyking commented 2 years ago

Can you share your email?

Thanks

haydenzhu@163.com

hniu1 commented 2 years ago

Thanks for reaching out!! My email is @.***

Best, Nick


jplasser commented 1 year ago

I just tried and have the same results:

best threshold: 0, best threshold ratio: 0.0
TP: 7643, TN: 549806, FP: 3562, FN: 3004
Precision: 68.21%, Recall: 71.79%, F1-measure: 69.95%

I haven't looked deeply into the code yet, but is the training data really limited to n=4855, as line 122 of data_process.py seems to indicate?

generate_train_test(log_sequence_file, n=4855)

How can I train for better results?
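For readers wondering what that parameter does: the call in data_process.py suggests the number of training sequences is capped. Below is a hypothetical sketch of what such a capped split might look like; the function body, column name `Label`, and `train_ratio` parameter are all assumptions for illustration, not the repo's actual implementation.

```python
# Hypothetical sketch of a capped train/test split like
# generate_train_test(log_sequence_file, n=4855). Assumption: only
# the first n shuffled normal sequences become training data, which
# would explain why removing n exposes far more training data.
import pandas as pd

def generate_train_test(log_sequence_file, n=None, train_ratio=0.8):
    seq = pd.read_csv(log_sequence_file)
    # Shuffle the normal (Label == 0) sequences deterministically.
    normal_seq = seq[seq["Label"] == 0].sample(frac=1, random_state=42)
    if n is not None:
        normal_seq = normal_seq.head(n)  # <-- the cap being discussed
    train_len = int(len(normal_seq) * train_ratio)
    train = normal_seq.iloc[:train_len]
    test_normal = normal_seq.iloc[train_len:]
    test_abnormal = seq[seq["Label"] == 1]
    return train, test_normal, test_abnormal
```

With this reading, dropping `n` simply means every normal sequence (minus the held-out test portion) is available for training.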

jplasser commented 1 year ago

I removed n=4855 from the code line described in my previous comment, and now a lot more training data is available. I'll post the results again.

jplasser commented 1 year ago

Here are my results after applying the above changes:

best threshold: 0, best threshold ratio: 0.0
TP: 6996, TN: 390662, FP: 95, FN: 3651
Precision: 98.66%, Recall: 65.71%, F1-measure: 78.88%

Recall and F1 are still lower than in the paper, which reports P=87.02, R=78.10, and F1=82.32. Caveat: I stopped training after 60 epochs, which could explain the underperforming values.

jplasser commented 1 year ago

One more result, after finishing training on HDFS with a batch size of 512: val loss=0.183, train loss=0.178, 135 epochs. Training takes about 35 minutes on an RTX 3090.

best threshold: 0, best threshold ratio: 0.0
TP: 7583, TN: 390484, FP: 273, FN: 3064
Precision: 96.52%, Recall: 71.22%, F1-measure: 81.97%

19982084685 commented 1 year ago

Here is my result, training on HDFS with a batch size of 512: val loss=0.537, train loss=0.451, 87 epochs. Training takes about 39 minutes on an RTX 3090.

best threshold: 0, best threshold ratio: 0.0
TP: 7908, TN: 389836, FP: 921, FN: 2739
Precision: 89.57%, Recall: 74.27%, F1-measure: 81.21%
elapsed_time: 561.5744488239288