d0ng1ee / logdeep

log anomaly detection toolkit including DeepLog
MIT License

In HDFS templates count is 28? #30

Open xichie opened 2 years ago

xichie commented 2 years ago

Thanks for your excellent project. I am a little confused, though: I used Drain as the log parser and got 47 templates. Which log parsing method did you use to get the templates?

tongxiao-cs commented 2 years ago

Maybe this can help you (https://github.com/logpai/logparser/blob/master/logs/HDFS/HDFS_templates.csv). But the number is 30 instead of 28.

xichie commented 2 years ago

Thanks for your help. So this template file is the ground truth and was not generated by any parsing method. Am I right?

tongxiao-cs commented 2 years ago

Yes, I think so.

Also, in issue #7 the owner mentioned that the benchmark result is based on the "ground truth" number of templates (28). (https://github.com/donglee-afar/logdeep/issues/7#issuecomment-635044260)

ZhongLIFR commented 1 year ago

I think we should not simply use 28 as the number of templates, because different parsing methods give different numbers of templates (although the author mentioned that 28 is the number of ground-truth templates). Instead, the number of templates should be calculated from the parsed log files. HDFS contains only a relatively small number of templates (28, 30, or 46 in different papers). For datasets such as Spirit or Thunderbird, however, this number will be in the hundreds or even thousands (and the "ground truth" is generally not available).
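
For what it's worth, here is a minimal sketch of computing that number from parser output rather than fixing it in advance. It assumes the parser writes a structured CSV with one row per log line and an `EventId` column identifying the template. The column name, the `count_templates` helper, and the sample rows are assumptions for illustration, not something taken from logdeep.

```python
import csv
import io


def count_templates(rows, column="EventId"):
    """Return the number of distinct non-empty template IDs in `column`."""
    return len({row[column] for row in rows if row.get(column)})


# Hypothetical stand-in for a parser's structured output. For a real file,
# open it with open(path, newline="") and wrap it in csv.DictReader instead.
sample = io.StringIO(
    "LineId,EventId,EventTemplate\n"
    "1,E1,Receiving block <*> src: <*> dest: <*>\n"
    "2,E2,PacketResponder <*> for block <*> terminating\n"
    "3,E1,Receiving block <*> src: <*> dest: <*>\n"
    "4,E3,Verification succeeded for <*>\n"
)

num_templates = count_templates(csv.DictReader(sample))
print(num_templates)
```

The resulting count can then be used wherever the number of templates is configured, so it always matches whichever parser produced the file.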