Description of the data preprocessing techniques we plan to use in the project:
1) Normalization of system paths (e.g. ~ and home directories, /opt/bin, /bin/) using heuristics
2) Lowercasing, stripping possessive 's and timestamps, and removing PII such as emails and passwords (use an existing library?)
3) Optional translation
4) Stopword removal (do we need it?)
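A minimal sketch of the path heuristics in item 1), assuming paths are masked with placeholder tokens rather than deleted. The function name normalize_paths, the tokens <HOME> and <BINDIR>, and the exact directories matched are hypothetical choices, not decisions from this plan.

```python
import re

# Hypothetical placeholder tokens and directory lists; tune to the project's data.
# (?<![^\s=:]) means "at the start of the string or right after whitespace, '=' or ':'",
# so only whole path tokens are matched (e.g. both entries in PATH=/usr/bin:/bin).
HOME_RE = re.compile(r"(?<![^\s=:])(?:/home/[^/\s:]+|/Users/[^/\s:]+|~(?=/|\s|:|$))")
BIN_RE = re.compile(r"(?<![^\s=:])(?:/usr/local|/usr|/opt)?/s?bin(?=/|\s|:|$)")


def normalize_paths(text: str) -> str:
    """Mask user home directories and common binary directories with placeholder tokens."""
    text = HOME_RE.sub("<HOME>", text)  # ~, /home/<user>, /Users/<user>
    text = BIN_RE.sub("<BINDIR>", text)  # /bin, /sbin, /usr/bin, /usr/local/bin, /opt/bin, ...
    return text
```

For example, normalize_paths("cd ~/projects && /opt/bin/python /home/alice/run.py") gives "cd <HOME>/projects && <BINDIR>/python <HOME>/run.py". Masking only the directory prefix keeps the executable or file name, which is probably the informative part.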
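Before committing to a library for item 2), a regex-only baseline could look like the sketch below. It assumes PII and timestamps are replaced with placeholder tokens; clean_text, the token names and the patterns (ISO-8601-style timestamps, password=value pairs) are hypothetical and will miss other formats, which a dedicated PII-detection library would likely handle better.

```python
import re

# Placeholder tokens and patterns are illustrative assumptions, not project decisions.
EMAIL_RE = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")
# "password=..." or "pwd: ..." style pairs; keeps the key, masks the value.
PASSWORD_RE = re.compile(r"\b(password|passwd|pwd)\s*[:=]\s*\S+", re.IGNORECASE)
# ISO-8601-style date with optional time and zone, or a bare HH:MM:SS time.
# IGNORECASE because the text is lowercased first, so "T"/"Z" become "t"/"z".
TIMESTAMP_RE = re.compile(
    r"\b\d{4}-\d{2}-\d{2}(?:[T ]\d{2}:\d{2}(?::\d{2}(?:\.\d+)?)?(?:Z|[+-]\d{2}:?\d{2})?)?\b"
    r"|\b\d{2}:\d{2}:\d{2}\b",
    re.IGNORECASE,
)
# Possessive 's with an ASCII or typographic apostrophe; also strips contractions like "it's".
POSSESSIVE_RE = re.compile(r"['\u2019]s\b")


def clean_text(text: str) -> str:
    """Lowercase, mask emails, passwords and timestamps, and strip possessive 's."""
    text = text.lower()  # lowercase first so the uppercase placeholders survive
    text = EMAIL_RE.sub("<EMAIL>", text)
    text = PASSWORD_RE.sub(r"\1=<PASSWORD>", text)
    text = TIMESTAMP_RE.sub("<TIMESTAMP>", text)
    text = POSSESSIVE_RE.sub("", text)
    return text
```

For example, clean_text("User's login at 2024-01-05T10:00:00Z from Bob.Smith@Example.com password=Hunter2") returns "user login at <TIMESTAMP> from <EMAIL> password=<PASSWORD>".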
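For item 4), whether stopword removal helps likely depends on the downstream model: bag-of-words or TF-IDF features often benefit from it, while transformer-based models are usually given the full text. A minimal sketch with a hypothetical, deliberately small stopword list, assuming the text is already lowercased and split on whitespace:

```python
# Hypothetical minimal stopword list; NLTK and spaCy ship much larger ones.
# Negations ("not", "no") are deliberately left out because dropping them can flip meaning.
STOPWORDS = frozenset({
    "a", "an", "the", "and", "or", "of", "to", "in", "on", "at",
    "is", "are", "was", "were", "be",
})


def remove_stopwords(text: str, stopwords: frozenset = STOPWORDS) -> str:
    """Drop whitespace-separated tokens that appear in the stopword set (expects lowercased text)."""
    return " ".join(tok for tok in text.split() if tok not in stopwords)
```

For example, remove_stopwords("the service is not running on port 8080") returns "service not running port 8080". Keeping it behind a separate function makes it easy to compare results with and without this step.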