stephen-hayne opened this issue 2 years ago
Hi, we currently use LogRobust to generate the embedding file. It is not yet included in this repository; we will try to start adding it next week.
I have read "Robust Log-Based Anomaly Detection on Unstable Log Data" and "Log-based Anomaly Detection Without Log Parsing" with interest, as well as several of the other papers in the citations.
Will LogRobust be put on GitHub, or just the data you generated?
We will add the code to generate embeddings in this repository, not only the generated data.
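For readers waiting on that code: the LogRobust paper mentioned in this thread builds one semantic vector per log template by combining pretrained word vectors (fastText) with TF-IDF weights. The following is a minimal, stdlib-only sketch of that TF-IDF-weighted averaging, not the repository's implementation. The function names, the toy vectors, the lowercase/whitespace tokenizer, and the smoothed IDF formula are all assumptions made for illustration.

```python
import math
from collections import Counter


def tokenize(template):
    # Lowercase and split on whitespace, dropping the "<*>" parameter wildcard.
    return [tok.lower() for tok in template.split() if tok != "<*>"]


def template_embeddings(templates, word_vectors, dim):
    """TF-IDF-weighted average of word vectors, one vector per template."""
    docs = [tokenize(t) for t in templates]
    n_docs = len(docs)
    # Document frequency: number of templates that contain each word.
    df = Counter(word for doc in docs for word in set(doc))
    result = []
    for doc in docs:
        tf = Counter(doc)
        vec = [0.0] * dim
        total_weight = 0.0
        for word, count in tf.items():
            if word not in word_vectors:
                continue  # out-of-vocabulary words contribute nothing
            # Smoothed IDF (always > 0); an assumption, not taken from the paper.
            idf = math.log((1 + n_docs) / (1 + df[word])) + 1.0
            weight = (count / len(doc)) * idf
            for i, value in enumerate(word_vectors[word]):
                vec[i] += weight * value
            total_weight += weight
        if total_weight > 0:
            vec = [v / total_weight for v in vec]  # weighted average
        result.append(vec)
    return result
```

For example, `template_embeddings(["Receiving block <*>", "Deleting block <*>"], vectors, 2)` with `vectors = {"receiving": [1.0, 0.0], "deleting": [0.0, 1.0], "block": [1.0, 1.0]}` gives two vectors in which the distinguishing word ("receiving" or "deleting") outweighs the shared word "block", because a word found in every template gets the lowest IDF.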
How can we get this HDFS.log_structured.csv?
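A file named like HDFS.log_structured.csv is typically the output of a log parser such as Drain: the logparser toolkit writes `<log file>_structured.csv` with one row per log line and columns including `EventId` and `EventTemplate`. Assuming that layout (it is not confirmed for this repository), the sketch below reads such a file and numbers its templates, producing an ID-to-template mapping similar in spirit to the `logID2Temp` used by the script later in this thread. The function name is hypothetical.

```python
import csv


def load_structured_templates(csv_file):
    """Read a parser's *_structured.csv and number its templates.

    Returns (id_to_template, event_sequence): a 1-based integer ID for every
    distinct EventTemplate (in order of first appearance) and the template ID
    of each log line, in file order.
    """
    template_to_id = {}
    event_sequence = []
    for row in csv.DictReader(csv_file):
        template = row["EventTemplate"]  # assumed column name (logparser convention)
        if template not in template_to_id:
            template_to_id[template] = len(template_to_id) + 1
        event_sequence.append(template_to_id[template])
    id_to_template = {i: t for t, i in template_to_id.items()}
    return id_to_template, event_sequence
```

Usage would look like `with open("HDFS.log_structured.csv", newline="") as f: id_to_template, events = load_structured_templates(f)`; passing an open file rather than a path keeps the function easy to test with in-memory data.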
@vanhoanglepsa, has the code been updated to generate embeddings for generic log data? @stephen-hayne, were you able to resolve this issue? I am having the same issue.
@souravs17031999 No, this issue is not resolved.
@vanhoanglepsa Can you please help us to reproduce your work?
Hi, have you added the code to generate embeddings to this repository? I haven't found the file. Could you please tell me how to generate the embeddings, or share the embeddings.json? Thank you so much!!
Yes, I haven't been able to find the file you mentioned either...
I successfully generated the embedding.json with this code. Hope it helps! https://github.com/xichie/LogADEmpirical/blob/master/generate_template_embedding.py
-- Dr. Stephen C. Hayne, Professor Emeritus (CSU), Cyber Security and Information Systems Consultant
(In reply to pupuu555's comment of Thu, Apr 6, 2023 at 7:41 PM.)
Hi, the following code is what I used to generate embeddings.json. Hope it helps!
import json
import logging
import os

import numpy as np

from logadempirical.PLELog.data.Embedding import *
from logadempirical.PLELog.data.DataLoader import *


class NumpyEncoder(json.JSONEncoder):
    """Special JSON encoder for NumPy types."""

    def default(self, obj):
        # np.int and np.float no longer exist in recent NumPy releases;
        # the abstract types np.integer / np.floating cover every sized variant.
        if isinstance(obj, np.integer):
            return int(obj)
        elif isinstance(obj, np.floating):
            return float(obj)
        elif isinstance(obj, np.ndarray):
            return obj.tolist()
        return json.JSONEncoder.default(self, obj)


# Specify logger
logger = logging.getLogger('embedding')
logger.setLevel(logging.INFO)

dataset = 'bgl'
save_path = './dataset/bgl'
templatesDir = './dataset/bgl'
log_file = 'BGL_all.log'

logID2Temp, templates = load_templates_from_structured(templatesDir, logger, dataset, log_file=log_file)
# This call is kept for the templates_BGL.vec file read next; its return value is not used.
nlp_emb_mergeTemplateEmbeddings_BGL(save_path, templates, dataset, logger)

# Read every template vector once. The header line is "<vocab size> <embedding size>".
templateVectors = {}
with open(os.path.join(save_path, 'templates_BGL.vec'), 'r', encoding='utf-8') as reader:
    vocabSize, embedSize = [int(x) for x in reader.readline().split()]
    for line in reader:
        items = line.strip().split()
        if len(items) != embedSize + 1:
            continue
        templateVectors[items[0]] = np.asarray(items[1:], dtype=np.float64)

# Several log IDs can share the same (duplicate) template, so assign vectors per log ID.
templateVocab = {}
for logID, temp in logID2Temp.items():
    if temp in templateVectors:
        templateVocab[logID] = templateVectors[temp]
    else:
        logger.warning('No embedding found for log ID %s (template %r)', logID, temp)

with open(os.path.join(save_path, 'embeddings.json'), 'w') as writer:
    json.dump(templateVocab, writer, cls=NumpyEncoder)
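The matching step in the script above does not itself depend on NumPy or PLELog. This is a hedged, stdlib-only sketch of it: it reads the .vec text format (a "count dim" header, then one token followed by its values per line) in a single pass, then assigns vectors per log ID, so log IDs that share a template are all covered. Like the script, it assumes the first token on each .vec line is exactly the template string stored in `logID2Temp`. The function names are hypothetical.

```python
import json


def read_vec(lines):
    """Parse word2vec/fastText text format: a 'count dim' header, then 'token v1 ... vdim'."""
    it = iter(lines)
    _count, dim = (int(x) for x in next(it).split())
    vectors = {}
    for line in it:
        items = line.split()
        if len(items) != dim + 1:
            continue  # skip malformed lines, as the script above does
        vectors[items[0]] = [float(x) for x in items[1:]]
    return vectors


def build_embeddings(log_id_to_template, vectors):
    """Map each log ID to its template's vector; IDs sharing a template share a vector."""
    embeddings, missing = {}, []
    for log_id, template in log_id_to_template.items():
        if template in vectors:
            embeddings[log_id] = vectors[template]
        else:
            missing.append(log_id)
    return embeddings, missing


def write_embeddings(path, embeddings):
    # Plain Python floats need no custom encoder; JSON turns the int keys into strings.
    with open(path, "w") as f:
        json.dump(embeddings, f)
```

Because the vectors are ordinary lists of floats here, `json.dump` works without a `NumpyEncoder`; note that whoever loads embeddings.json will see the log IDs as string keys such as `"1"`.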
Thank you sooooo much!!!!!!
Thank you sooooo much!!!! Bless you!
Hi, when I ran the file you gave me, I hit a new issue: FileNotFoundError: [Errno 2] No such file or directory: 'dataset/nlp-word.vec'. How can I get nlp-word.vec? I can't find any code that generates this file.
Could I add you on WeChat? This project has been driving me up the wall lately. Please, please! My WeChat ID is: RainyloveStatic
You can download nlp-word.vec from: https://dl.fbaipublicfiles.com/fasttext/vectors-english/wiki-news-300d-1M.vec.zip
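That link is a zip archive, while the FileNotFoundError reported above expects a plain file at dataset/nlp-word.vec. A minimal sketch of the missing step follows, assuming the archive's single .vec member is what the code wants and that renaming it to dataset/nlp-word.vec is sufficient (this thread does not confirm either point). The function name is hypothetical.

```python
import os
import shutil
import zipfile


def install_word_vectors(zip_path, target_path="dataset/nlp-word.vec"):
    """Extract the first .vec member of a fastText zip archive to target_path."""
    with zipfile.ZipFile(zip_path) as archive:
        members = [name for name in archive.namelist() if name.endswith(".vec")]
        if not members:
            raise ValueError(f"no .vec file found in {zip_path}")
        os.makedirs(os.path.dirname(target_path) or ".", exist_ok=True)
        # Stream the (large) member to disk instead of reading it into memory.
        with archive.open(members[0]) as src, open(target_path, "wb") as dst:
            shutil.copyfileobj(src, dst)
    return target_path
```

Run it from the directory the script treats as its working directory, e.g. `install_word_vectors("wiki-news-300d-1M.vec.zip")`, since `dataset/nlp-word.vec` in the error message is a relative path.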
I'm trying to reproduce your results (like another poster here)...
Perhaps a silly question, but after downloading the HDFS and BGL datasets and running them through Drain, I'm now getting this error. Can you advise how/where to get your "embeddings.json" file?
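Since each person in this thread has hit a different missing input, a small preflight check can list what is absent before anything runs. This is a hypothetical sketch: the two paths come from the FileNotFoundError and the embedding script posted earlier in this issue, and the list is illustrative, not a complete inventory of what LogADEmpirical needs.

```python
import os

# Hypothetical checklist: (path the code opens, where that file comes from).
PREREQUISITES = [
    ("dataset/nlp-word.vec",
     "unzip the fastText wiki-news-300d-1M.vec.zip download to this path"),
    ("dataset/bgl/embeddings.json",
     "run the embedding-generation script posted earlier in this issue"),
]


def report_missing(prereqs, root="."):
    """Return the (path, hint) pairs whose file does not exist under root."""
    return [(path, hint) for path, hint in prereqs
            if not os.path.isfile(os.path.join(root, path))]


if __name__ == "__main__":
    for path, hint in report_missing(PREREQUISITES):
        print(f"missing {path}: {hint}")
```

Extending `PREREQUISITES` with the HDFS paths your own configuration expects keeps the check aligned with whichever dataset you are running.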