gaiusyu / Brain

Brain: Log Parsing with Bidirectional Parallel Tree
Apache License 2.0

During handling of the above exception, another exception occurred: MemoryError #3

Open 2718455213wcx opened 1 year ago

2718455213wcx commented 1 year ago

I run out of memory when parsing the Thunderbird dataset (29.8 GB) with LOGPAI's Brain. Is there a way to solve this?

Traceback (most recent call last):
  File "..\logparser\Brain.py", line 189, in tuple_generate
    result = number.most_common()
  File "C:\Users\3730.conda\envs\logbert\lib\collections__init__.py", line 610, in most_common

2718455213wcx commented 1 year ago

I'm getting a ValueError when running the Thunderbird dataset through Brain: ValueError: Length of values (2000000) does not match length of index (2000)

2718455213wcx commented 1 year ago

Traceback (most recent call last):
  File "E:\Brain-main\Code\evaluate.py", line 53, in <module>
    df_output, template_set = Brain.parse(sentences, setting['regex'], dataset, setting['theshold'],
  File "E:\Brain-main\Code\Brain\Brain.py", line 319, in parse
    df_example['EventTemplate']=template_

gaiusyu commented 1 year ago

Hello! Thanks for your interest in our work. You can split your large dataset into multiple smaller chunks to avoid the memory issue. I have used Brain as a preprocessing tool for compressing large log files, so I'm confident that Brain performs well with this method.
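The chunking suggestion above could be sketched roughly as follows. This is a minimal illustration using only the Python standard library, not code from the Brain repository: the helper names `iter_line_chunks` and `split_log_file`, the chunk size, and the output file naming are all assumptions made for the example.

```python
from itertools import islice
from pathlib import Path


def iter_line_chunks(path, lines_per_chunk=1_000_000, encoding="utf-8"):
    """Yield successive lists of at most `lines_per_chunk` log lines.

    Hypothetical helper: only one chunk is held in memory at a time.
    """
    with open(path, "r", encoding=encoding, errors="replace") as handle:
        while True:
            chunk = list(islice(handle, lines_per_chunk))
            if not chunk:
                break
            yield chunk


def split_log_file(path, out_dir, lines_per_chunk=1_000_000):
    """Write each chunk to its own file and return the chunk paths.

    Hypothetical helper: the "<stem>_partNNNNN.log" naming is an assumption.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths = []
    for index, chunk in enumerate(iter_line_chunks(path, lines_per_chunk)):
        chunk_path = out_dir / f"{Path(path).stem}_part{index:05d}.log"
        with open(chunk_path, "w", encoding="utf-8") as out:
            out.writelines(chunk)
        chunk_paths.append(chunk_path)
    return chunk_paths
```

Each chunk file can then be parsed on its own. Note that templates are extracted per chunk in this setup, so the results of different chunks may need to be merged afterwards.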

gaiusyu commented 1 year ago

This report is not very clear. Could you please provide more context?

gaiusyu commented 1 year ago

After looking through the relevant code, I understand what went wrong. You are using the code from this repository. The error comes from the design of Brain.parse(): it sets df_example = df_input and then assigns df_example['EventTemplate'] = template_. Here template_ holds the parsed results (2,000,000 entries), while df_input is the ground-truth dataframe (2,000 rows), so the lengths do not match. To resolve the error, create a new dataframe to store the parsed results (i.e., the template and EventID). I designed it this way to make it easier to obtain accuracy results on the 2K benchmark dataset, and I apologize for any errors this may cause 😂😂
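A minimal sketch of that fix, assuming pandas is available: instead of attaching the templates to the 2,000-row ground-truth dataframe, build a fresh dataframe whose length matches the parsed lines. The function name `build_result_frame`, the `Content` and `EventId` column names, and the way the IDs are numbered are illustrative assumptions, not the repository's actual code.

```python
import pandas as pd


def build_result_frame(log_lines, templates):
    """Return a new dataframe with one row per parsed log line.

    Hypothetical helper: column names other than 'EventTemplate'
    are assumptions made for this example.
    """
    if len(log_lines) != len(templates):
        raise ValueError("expected exactly one template per log line")

    # A fresh dataframe takes its length from the parsed data,
    # so it cannot clash with the ground-truth index.
    result = pd.DataFrame({"Content": log_lines, "EventTemplate": templates})

    # Illustrative event IDs: number the distinct templates
    # in order of first appearance.
    codes, _ = pd.factorize(result["EventTemplate"])
    result["EventId"] = [f"E{code + 1}" for code in codes]
    return result
```

Inside Brain.parse(), the same idea would mean replacing the assignment to df_example with a dataframe built from the parsed lines themselves.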

gaiusyu commented 1 year ago

You may refer to the Brain implementation in LOGPAI, where self.df_log is a dataframe that stores the parsed results.

2718455213wcx commented 1 year ago

Thanks, I'll modify the code according to your suggestion.