LogAnalysisTeam / ml4logs

Machine Learning methods for log file processing
MIT License

Experiment with contextual embeddings based on Transformer architectures. #5

Open dhonza opened 3 years ago

dhonza commented 3 years ago

Perform initial experiments with the contextual log line embeddings.

Our current embedding aggregates (averages) per-token fastText embeddings into a single vector per log line. Contextual embeddings are expected to improve performance on the downstream task, as they have in NLP.
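To make the baseline concrete, here is a minimal sketch of averaging per-token vectors into one log line embedding. The lookup table `TOKEN_VECTORS`, its 3-dimensional values, and the function name `average_embedding` are all hypothetical stand-ins for a trained fastText model, not code from this repository.

```python
from typing import Dict, List, Sequence

# Toy 3-dimensional vectors standing in for trained fastText vectors (made-up values).
TOKEN_VECTORS: Dict[str, List[float]] = {
    "block": [1.0, 0.0, 0.0],
    "received": [0.0, 1.0, 0.0],
    "deleted": [0.0, 0.0, 1.0],
}
# Simplification: unknown tokens map to zeros. Real fastText would build
# a vector from subword n-grams instead.
UNKNOWN: List[float] = [0.0, 0.0, 0.0]


def average_embedding(tokens: Sequence[str]) -> List[float]:
    """Return the element-wise mean of the token vectors for one log line."""
    if not tokens:
        return list(UNKNOWN)
    vectors = [TOKEN_VECTORS.get(token, UNKNOWN) for token in tokens]
    dim = len(UNKNOWN)
    return [sum(vec[i] for vec in vectors) / len(vectors) for i in range(dim)]


a = average_embedding("block received".split())
b = average_embedding("received block".split())
print(a == b)  # averaging discards token order
```

Because the mean ignores token order and gives each token the same vector regardless of its neighbours, two lines with the same tokens in a different order get identical embeddings. That limitation is what a contextual model would be expected to address.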

savchart commented 3 years ago

raw logs -> DistilBERT -> F1 score ~ 0% (sliding windows)
Drain logs -> DistilBERT -> F1 score ~ 96% (sliding windows and preprocessing)
Drain logs -> DistilBERT -> F1 score ~ 96% (without sliding windows and preprocessing)
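For readers unfamiliar with the sliding-window setup mentioned above, a minimal sketch follows. The window size, the step, and the function name `sliding_windows` are assumptions for illustration; the comment does not state the values actually used in these runs.

```python
from typing import List, Sequence


def sliding_windows(lines: Sequence[str], size: int, step: int = 1) -> List[List[str]]:
    """Split a sequence of log lines into overlapping windows of `size` lines."""
    if size <= 0 or step <= 0:
        raise ValueError("size and step must be positive")
    if len(lines) <= size:
        # Too short for a full window: return the whole sequence as one window.
        return [list(lines)] if lines else []
    return [list(lines[i:i + size]) for i in range(0, len(lines) - size + 1, step)]


windows = sliding_windows(["l1", "l2", "l3", "l4"], size=3)
print(windows)
```

Each window would then be embedded and classified as a unit, rather than classifying log lines one at a time.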

savchart commented 3 years ago

MLM (Masked Language Modeling, content only) - eval_loss: 0.7880746566936903
CLM (Causal Language Modeling, i.e. next-word prediction, content only) - eval_loss: 3.7285281655768676e-07
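One way to read these numbers is to convert them to perplexity. The sketch below assumes that each `eval_loss` is a mean per-token cross-entropy in natural-log units, which the comment does not confirm; the helper name `perplexity` is hypothetical.

```python
import math


def perplexity(eval_loss: float) -> float:
    """Convert a mean cross-entropy loss (in nats) to perplexity."""
    return math.exp(eval_loss)


mlm_ppl = perplexity(0.7880746566936903)      # MLM figure reported above
clm_ppl = perplexity(3.7285281655768676e-07)  # CLM figure reported above

print(f"MLM perplexity: {mlm_ppl:.3f}")
print(f"CLM perplexity: {clm_ppl:.7f}")
```

Under that assumption the MLM loss corresponds to a perplexity of roughly 2.2, while the CLM loss corresponds to a perplexity barely above 1. The two are not directly comparable, since an MLM loss is computed only over masked positions, but a CLM value this close to zero is low enough to be worth checking against the evaluation setup.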