HelenGuohx / logbert

log anomaly detection via BERT
MIT License

About BGL parsing result #1

Closed ericzhou571 closed 3 years ago

ericzhou571 commented 3 years ago

Hi, I'm also working on the BGL dataset. Your new paper about outlier detection is interesting. I just read it and found that you also use Drain for log parsing.

Could you tell me how you deal with the template problem? Parsing the raw BGL dataset with Drain (using the regular expressions from the Drain demo) yields 1000+ templates, but the ground truth is around 400, as your paper mentions.
Maybe you apply some specific regular expressions before parsing?

Thanks a lot,
Wenrui

ComplicatedPhenomenon commented 3 years ago

@ericzhou571 Using multiple regular expressions can indeed bring the template count down to 400 or fewer. To handle the BGL dataset, I modified the data preprocessing step. However, this doesn't do much to improve the F1 score.
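
For anyone landing on this thread, here is a minimal sketch of the general idea: mask variable fields with regular expressions before the messages reach the parser, so that lines differing only in those fields collapse into one template. The patterns, the `<*>` placeholder, and the helper names `mask_variables` and `count_templates` are all assumptions for illustration. They are not the expressions used in this repository or in the Drain demo, and a real BGL setup would likely need more patterns.

```python
import re

# Hypothetical masking patterns (assumptions, not taken from logbert or Drain).
# Order matters: more specific patterns run before the generic integer pattern.
MASK_PATTERNS = [
    re.compile(r"0x[0-9a-fA-F]+"),                           # hex literals such as 0x0012ab34
    re.compile(r"\b\d{1,3}(?:\.\d{1,3}){3}(?::\d+)?\b"),     # IPv4 address with optional port
    re.compile(r"\b\d+\b"),                                  # standalone integers
]


def mask_variables(message, patterns=MASK_PATTERNS, token="<*>"):
    """Replace every match of each pattern with a wildcard token."""
    for pattern in patterns:
        message = pattern.sub(token, message)
    return message


def count_templates(messages):
    """Count distinct messages after masking (a rough proxy for template count)."""
    return len({mask_variables(m) for m in messages})


# Made-up sample messages for illustration only.
sample = [
    "generating core.2275",
    "generating core.862",
    "data TLB error interrupt at 0x0012ab34",
    "data TLB error interrupt at 0x00ff0000",
]

raw_count = len(set(sample))             # distinct messages before masking
masked_count = count_templates(sample)   # distinct messages after masking
```

In practice the masked messages (or the equivalent pattern list) would be handed to the log parser instead of being counted directly. Counting distinct masked strings is only a quick way to see how much a given set of patterns reduces the number of candidate templates.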

HelenGuohx commented 3 years ago

Yeah, check the shell scripts for the BGL dataset.