Open dhonza opened 3 years ago
- raw logs -> distilbert -> F1 score ~ 0% (sliding windows)
- drain logs -> distilbert -> F1 score ~ 96% (sliding windows and preprocessing)
- drain logs -> distilbert -> F1 score ~ 96% (without sliding windows and preprocessing)
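The exact sliding-windows setup is not spelled out here; a minimal sketch of how overlapping windows over a tokenized log sequence might be produced (window size and stride below are illustrative, not the values used in the experiments):

```python
def sliding_windows(tokens, size, stride):
    """Yield fixed-size overlapping windows over a token sequence.

    Sequences shorter than `size` are yielded as a single window.
    Note: a trailing remainder is dropped when (len - size) is not
    a multiple of `stride`.
    """
    if len(tokens) <= size:
        yield tokens
        return
    for start in range(0, len(tokens) - size + 1, stride):
        yield tokens[start:start + size]

# Illustrative usage with integer stand-ins for tokens:
windows = list(sliding_windows(list(range(10)), size=4, stride=2))
print(windows)  # [[0, 1, 2, 3], [2, 3, 4, 5], [4, 5, 6, 7], [6, 7, 8, 9]]
```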
Language modeling eval losses (content only):

- MLM (masked language modeling): eval_loss 0.7880746566936903
- CLM (causal language modeling, i.e. next word prediction): eval_loss 3.7285281655768676e-07
Perform initial experiments with the contextual log line embeddings.
Our current embedding is computed by averaging the per-token fastText embeddings of a log line. Contextual embeddings are expected to improve downstream task performance, similarly to what has been observed in NLP.
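For reference, the current averaging scheme can be sketched as follows. This is a hypothetical illustration: the toy lookup table stands in for a real fastText model, and the tokenizer and dimensionality are assumptions, not the actual pipeline:

```python
import numpy as np

DIM = 4  # illustrative; real fastText vectors are typically 100-300 dims

# Toy stand-in for per-token fastText vectors.
toy_fasttext = {
    "error": np.array([0.9, 0.1, 0.0, 0.2]),
    "disk":  np.array([0.2, 0.8, 0.1, 0.0]),
    "full":  np.array([0.1, 0.7, 0.3, 0.1]),
}

def embed_log_line(tokens, lookup, dim=DIM):
    """Average per-token vectors into one line embedding.

    Tokens missing from the lookup are skipped; an empty or
    fully-unknown line maps to the zero vector.
    """
    vecs = [lookup[t] for t in tokens if t in lookup]
    if not vecs:
        return np.zeros(dim)
    return np.mean(vecs, axis=0)

line_vec = embed_log_line(["error", "disk", "full"], toy_fasttext)
print(line_vec.shape)  # (4,)
```

A contextual model would instead produce token vectors that depend on the surrounding line, rather than a fixed per-token lookup.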